JoseBlanca / seq_crumbs

Little sequence file utilities meant to work within Unix pipelines

sff_extract to support amplicon reads #6

Open fangly opened 10 years ago

fangly commented 10 years ago

When using sff_extract (seq_crumbs 0.1.8) on SFF files that contain amplicon reads, I get the warning:


WARNING: weird sequences in file /srv/whitlam/bio/data/pyrotags/raw/Gasket67/Gasket67.sff After applying left clips too many reads start with: A This does not look sane. [...]


In my case, since the reads are not shotgun but amplicon, I do expect many reads to start with the same nucleotide. Would it be possible to add a flag called --amplicon to inform sff_extract that the input contains amplicon sequences and to not display this warning?

Thanks,

Florent

fangly commented 10 years ago

I now realize that the --max_percentage option does exactly this, though I did not understand its meaning when I first read the help page.

I suggest that you explain exactly what this option does in the help page, and mention that it is judicious to use --max_percentage 100 when processing SFF files containing amplicon reads.

Best,

Florent

JoseBlanca commented 10 years ago

You're right, we should write a manual.

StuntsPT commented 10 years ago

@fangly: Sorry about the poor explanation. I wrote that option myself and submitted it to sff_extract. It was meant to be used with a lower value than the default 50% in shotgun reads. I could never find a good way to explain what it does to someone who has never used sff_extract before... But if you have a suggestion for a better way to explain the option, I would really like to hear it, because I just can't seem to come up with one...
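
For readers trying to picture what --max_percentage controls, here is a minimal Python sketch of the kind of check the warning above describes: count how often each starting nucleotide occurs after left clipping, and flag the file if the most common one exceeds a threshold. This is not the sff_extract implementation. The function names, the `prefix_len` parameter, and the strict `>` comparison are assumptions made for illustration; only the 50% default and the value 100 come from this thread.

```python
from collections import Counter


def most_common_start(reads, prefix_len=1):
    """Return (prefix, percentage) for the most frequent read start.

    Hypothetical helper: `reads` is an iterable of already-clipped
    sequence strings; reads shorter than `prefix_len` are ignored.
    """
    prefixes = Counter(
        read[:prefix_len] for read in reads if len(read) >= prefix_len
    )
    total = sum(prefixes.values())
    if total == 0:
        return None, 0.0
    prefix, count = prefixes.most_common(1)[0]
    return prefix, 100.0 * count / total


def looks_weird(reads, max_percentage=50.0, prefix_len=1):
    """True if too many reads share the same start (assumed strict '>')."""
    _, percentage = most_common_start(reads, prefix_len)
    return percentage > max_percentage


# Amplicon-like toy data: three of four reads start with "A".
amplicon_reads = ["ACGT", "AGGT", "ATTT", "CGTA"]
# Shotgun-like toy data: starts are evenly spread.
shotgun_reads = ["ACGT", "CGTA", "GTAC", "TACG"]

print(looks_weird(amplicon_reads))                      # default 50% threshold
print(looks_weird(amplicon_reads, max_percentage=100))  # check effectively off
print(looks_weird(shotgun_reads))
```

Under these assumptions, a threshold of 100 can never be exceeded, which is why it suits amplicon data where a shared first base is expected, while a value below 50 makes the check stricter for shotgun reads.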