Genome breadth of coverage

MrOlm / inStrain

Bioinformatics program inStrain

MIT License

Hi.

To determine which genomes are present or absent in my samples, I mapped reads from each sample to reference genomes from a public database.

I infer strain presence from the genome breadth of coverage reported by inStrain quick_profile. If a genome has a breadth of coverage of 30% or more, I consider it "present" in that sample.

Which parameter should I use (--breadth_cutoff 0.3 or --stringent_breadth_cutoff 0.3)? And how do they differ?

However, when I set --breadth_cutoff 0.3 or --stringent_breadth_cutoff 0.3, the inStrain quick_profile output (genomesCoverage.csv) still reports results for all genomes, even those whose breadth of coverage is below 0.3 in a given sample.

Does this mean I can use every genome that appears in genomesCoverage.csv? Or should I filter out the genomes with low breadth (< 0.3) in a given sample?

I am also wondering what the inStrain quick_profile output files mean (genomesCoverage.csv, coverm_raw.tsv, scaffolds.txt).

Thanks.

Hi @Liuyuxinn,

The stringent_breadth_cutoff is used as a pre-filtering step. Very early in the program, a quick estimate guesses at the breadth of each scaffold, and only scaffolds that pass this cutoff are included. It's really only useful for making the program run a little bit faster, but I would strongly recommend keeping it at its default value, since changing it will probably only save a few seconds anyway.

The breadth_cutoff is only useful if you plan on using the scaffolds.txt output file, as it puts all of the scaffolds of genomes with breadth over that threshold in that file.

I recommend leaving both of these values at the default, and then just filtering the output to your preferred breadth cutoff.
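A minimal sketch of that kind of post-hoc filtering, using only the Python standard library. The column names `genome` and `breadth`, the helper names `read_coverage` and `present_genomes`, and the toy values are all assumptions for illustration; check the header of your own coverage file and adjust them before relying on this.

```python
import csv
import io


def read_coverage(handle):
    """Parse a comma-separated coverage table into a list of dicts."""
    return list(csv.DictReader(handle))


def present_genomes(rows, min_breadth=0.3, genome_col="genome", breadth_col="breadth"):
    """Return the names of genomes whose breadth is at least min_breadth.

    genome_col and breadth_col are assumed column names, not confirmed
    against the real quick_profile output.
    """
    return [
        row[genome_col]
        for row in rows
        if float(row[breadth_col]) >= min_breadth
    ]


# Made-up values for illustration only; this is not real inStrain output.
toy_table = io.StringIO(
    "genome,length,breadth,coverage\n"
    "genomeA.fasta,2000000,0.85,12.4\n"
    "genomeB.fasta,3100000,0.12,0.3\n"
    "genomeC.fasta,1800000,0.30,1.1\n"
)

rows = read_coverage(toy_table)
present = present_genomes(rows, min_breadth=0.3)
print(present)
```

For a real run, replace the toy table with `open(path, newline="")` on the coverage file for each sample, and apply the same cutoff to every sample so that presence calls stay comparable.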

The files you listed are the correct outputs.

Best, Matt

Genome breadth of coverage #142