benedictpaten / marginAlign

UCSC Nanopore
MIT License

marginStats #26

Open JohnUrban opened 7 years ago

JohnUrban commented 7 years ago

Hi,

Thanks for the marginAlign suite. It is great.

I was wondering if there is a way to have marginStats (or some other approach) go through the alignments and spit out the read name and identity for each one instead of summary stats. E.g.:

Read1  0.676691729323   
Read2  0.724295506474   
Read3  0.421308655158   
Read4  0.735033943633   
.
.
.
etc

I tried:

 marginStats --printValuePerReadAlignment --identity output.sam temp.fastq lambdaGenomeSequence.fa

which as you know outputs something like:

AverageIdentity 0.608038446759
MedianIdentity 0.742990654206
MinIdentity 0.015294021428
MaxIdentity 0.809134432064
ValuesIdentity 0.676691729323   0.724295506474  0.421308655158  0.735033943633  0.751218026797  0.750067805804  0.753963593658  0.776012530768  0.757202247895  0.698859626314  0.773887646932  0.809134432064  0.176849567696  0.262163438829  0.740789473684  0.015294021428  0.785557986871  0.787041036717  0.137452502891  0.742990654206  0.356559785803  0.696320949897  0.119196988708  0.0338850369045 0.488094798627  0.787056367432  0.793948126801  0.0889133492326 0.730844793713  0.769590643275  0.756786102063  0.771119294722  0.306422170594  0.069472739201  0.776006539955  0.76849861836   0.714887218045  0.762810678122  0.756357185098  0.769752358491  0.741569492594  0.774193548387  0.770156438026  0.740225118483  0.755319148936  0.745851528384  0.767524401065  0.762261014131  0.787345385347  0.65625 0.704954954955  0.400684931507  0.0273635442305

Are the values here in the same order that reads appear in the SAM file?

best,

John

mitenjain commented 7 years ago

Hi John,

The values are in the same order as reads in the SAM file. We can add a flag to output read names as well if that is useful. Part of the reason for not doing that thus far is to avoid large file sizes (if the read headers were included).

Cheers, Miten

JohnUrban commented 7 years ago

Hi Miten,

If they are in the same order as the SAM file, then it should be easy enough to work with as is.

Nonetheless, if it is easy enough to add an option to get the type of output I described above, I'd gladly use it.

best, John
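
Since the values follow the SAM order, the name/identity table can be rebuilt outside marginStats. Below is a minimal Python sketch of that pairing, not part of marginAlign. The function names are made up for illustration, and it assumes the marginStats output has been saved as text. Whether marginStats emits a value for unmapped records is not stated in this thread, so the `skip_unmapped` switch is an assumption; the length check is there to catch a wrong guess.

```python
def read_names_from_sam(sam_lines, skip_unmapped=True):
    """Collect read names (QNAME, column 1) in file order.

    skip_unmapped is an assumption: it drops records whose FLAG has
    bit 0x4 set, on the guess that they have no identity value.
    """
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if skip_unmapped and int(fields[1]) & 0x4:
            continue
        names.append(fields[0])
    return names


def values_from_stats(stats_lines, key="ValuesIdentity"):
    """Return the per-alignment values from the line starting with key."""
    for line in stats_lines:
        parts = line.split()
        if parts and parts[0] == key:
            return [float(v) for v in parts[1:]]
    raise ValueError("no line starting with %r found" % key)


def pair_names_with_values(sam_lines, stats_lines, skip_unmapped=True):
    """Pair each read name with its value, relying on shared ordering."""
    names = read_names_from_sam(sam_lines, skip_unmapped)
    values = values_from_stats(stats_lines)
    if len(names) != len(values):
        # A mismatch means the ordering assumption cannot be trusted.
        raise ValueError(
            "%d SAM records but %d values" % (len(names), len(values))
        )
    return list(zip(names, values))


# Tiny in-memory demonstration with made-up SAM records.
demo_sam = [
    "@HD\tVN:1.0\n",
    "Read1\t0\tlambda\t1\t60\t4M\t*\t0\t0\tACGT\t*\n",
    "Read2\t0\tlambda\t9\t60\t4M\t*\t0\t0\tACGT\t*\n",
]
demo_stats = ["ValuesIdentity 0.676691729323\t0.724295506474\n"]

for name, value in pair_names_with_values(demo_sam, demo_stats):
    print("%s\t%s" % (name, value))
```

With real data, the two line sources would be open file handles, e.g. `open("output.sam")` and a file holding the marginStats output. If the counts do not match, try `skip_unmapped=False` before trusting the pairing.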