
brian-cleary / LatentStrainAnalysis

Partitioning and analysis methods for large, complex sequence datasets

MIT License


fasta support? #3

Open scottdaniel opened 8 years ago

scottdaniel commented 8 years ago

I e-mailed the senior author of your paper and haven't heard back, so I'm asking here:

1) Is there a way to run LSA on FASTA files instead of FASTQ?
2) Does LSA use the quality information in the FASTQ files?

brian-cleary commented 8 years ago

Hi Scott,

Currently there is no support for FASTA files, and LSA does use the quality information. If you must start from FASTA files, you can convert them to FASTQ by introducing mock quality scores, for example by giving every base the same high quality.
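As a rough illustration of that workaround, here is a minimal sketch of a FASTA-to-FASTQ converter that assigns one constant mock quality to every base. It is not part of LSA: the function names `read_fasta` and `fasta_to_fastq` are hypothetical, and the default quality character `I` (Phred 40 under the Phred+33 encoding) is an assumption, since the thread does not say which quality encoding LSA expects.

```python
import io
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle.

    Multi-line sequences are joined into a single string.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta_handle, fastq_handle, quality_char="I"):
    """Write every FASTA record as a 4-line FASTQ record.

    quality_char is repeated once per base; "I" is Phred 40 in the
    Phred+33 encoding (an assumed, not LSA-confirmed, choice).
    Returns the number of records written.
    """
    count = 0
    for name, seq in read_fasta(fasta_handle):
        fastq_handle.write(f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n")
        count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python fasta_to_fastq.py reads.fasta reads.fastq
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            fasta_to_fastq(src, dst)
    else:
        # Tiny in-memory demonstration.
        demo_out = io.StringIO()
        fasta_to_fastq(io.StringIO(">read1\nACGT\nAC\n"), demo_out)
        print(demo_out.getvalue(), end="")
```

Because every base gets the same score, the quality information carries no signal after conversion; any quality-dependent step simply treats all bases as equally reliable.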

(Replied by email to Scott Daniel's comment of Wed, Oct 21, 2015 at 1:50 AM.)

scottdaniel commented 8 years ago

OK, the reason I want to use FASTA is that I have already done QC on my FASTQ files, split them into smaller FASTA files, and filtered out reads from the host genome and host food (the samples are mouse cecal matter).

Thanks for answering my question.

scottdaniel commented 8 years ago

By the way, how does LSA use the quality scores? Which scripts use them?

brian-cleary commented 8 years ago

The quality scores are incorporated when hashing the reads, so a low-quality base will hash more like an ambiguous character. This is done in fastq_reader.py, I believe.
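To make the idea concrete, here is a small sketch of one way quality scores could enter a k-mer hash. It is not LSA's code and has not been checked against fastq_reader.py. Everything in it is an assumption chosen for illustration: the complex-number base encoding, the weight `1 - 10^(-Q/10)` (the probability that a base call is correct), the Phred+33 offset, the random-hyperplane hash, and all function names.

```python
import random

# Hypothetical encoding: each base is a point on the unit circle,
# and an ambiguous base (N or anything else) sits at the origin.
BASE_TO_COMPLEX = {"A": 1 + 0j, "T": -1 + 0j, "C": 1j, "G": -1j}


def phred_to_weight(qchar, offset=33):
    """Probability that the base call is correct, from its quality character."""
    q = ord(qchar) - offset
    return 1.0 - 10 ** (-q / 10.0)


def encode_kmer(seq, qual):
    """Scale each base by its quality weight.

    A low-quality base shrinks toward 0, the value used for "N".
    """
    return [
        BASE_TO_COMPLEX.get(base, 0j) * phred_to_weight(q)
        for base, q in zip(seq.upper(), qual)
    ]


def make_hyperplanes(k, bits, seed=0):
    """Draw `bits` random complex vectors of length k (seeded for repeatability)."""
    rng = random.Random(seed)
    return [
        [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)]
        for _ in range(bits)
    ]


def hash_kmer(vec, planes):
    """One bit per hyperplane: the sign of the real part of the inner product."""
    h = 0
    for plane in planes:
        dot = sum(v * p.conjugate() for v, p in zip(vec, plane))
        h = (h << 1) | int(dot.real > 0)
    return h


if __name__ == "__main__":
    planes = make_hyperplanes(k=4, bits=8)
    confident = hash_kmer(encode_kmer("ACGT", "IIII"), planes)
    doubtful = hash_kmer(encode_kmer("ACGT", "II!I"), planes)   # G has Phred 0
    ambiguous = hash_kmer(encode_kmer("ACNT", "IIII"), planes)  # N at the same position
    print(confident, doubtful, ambiguous)
```

In this sketch a base with Phred 0 gets weight 0, so it contributes exactly what an `N` would, and `doubtful` and `ambiguous` always hash to the same value; intermediate qualities fall between the confident and ambiguous cases.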

(Replied by email to Scott Daniel's comment of Thu, Oct 22, 2015 at 6:18 PM.)