Closed by priscalim00 2 weeks ago
Hi Prisca,
Thank you for your kind words about inStrain! I’m glad to hear you’re enjoying using the tool.
Regarding your question about including low-coverage variants in the inStrain compare outputs:
The pooled_SNV_info.tsv file generated by inStrain compare only includes variants detected at positions with at least 5x coverage by default. This is why variants from regions with coverage < 5 are not included in your output, even though the sample_detections and sample_5x_detections columns differentiate between these categories. To include variants from low-coverage regions in the output, you’ll need to adjust the minimum coverage threshold.
Here’s how you can proceed:
Adjust the coverage threshold in inStrain compare: You can set the minimum coverage parameter (-c / --min_cov) to 1 when running inStrain compare. This will include positions with at least 1x coverage in the pooled SNV analysis, allowing low-coverage variants to be included in the output.
Profile reprocessing: If you haven’t already, ensure that all the inStrain profile runs were performed with a minimum coverage threshold (-c) of 1 for consistency across samples. If some profiles were run with a higher coverage threshold, you may need to rerun those profiles to include low-coverage variants before using them in inStrain compare.
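As a rough sketch of the two steps above, the script below profiles every sample with -c 1 and then runs compare with the same threshold. The sample names, the bams/ and profiles/ directories, reference.fasta, and the compare_out output name are all hypothetical placeholders, and the exact flag spellings (in particular --bams) are assumptions that should be checked against inStrain profile -h and inStrain compare -h for your installed version. By default it only prints the commands (DRY_RUN=1) so they can be reviewed before anything is run.

```shell
#!/bin/sh
set -eu

# Hypothetical sample names and file layout; substitute your own.
SAMPLES="sample01 sample02 sample03"
GENOME="reference.fasta"
MIN_COV=1

# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: profile every sample with the same minimum coverage.
profile_all() {
    for s in $SAMPLES; do
        run inStrain profile "bams/$s.bam" "$GENOME" \
            -o "profiles/$s.IS" -c "$MIN_COV"
    done
}

# Step 2: compare all profiles with the same threshold.
# The .bam files are listed in the same order as the profiles.
compare_all() {
    profiles=""
    bams=""
    for s in $SAMPLES; do
        profiles="$profiles profiles/$s.IS"
        bams="$bams bams/$s.bam"
    done
    # $profiles and $bams are left unquoted on purpose so that each
    # path becomes its own argument.
    run inStrain compare -i $profiles -o compare_out \
        -c "$MIN_COV" --bams $bams
}

profile_all
compare_all
```

Setting DRY_RUN=0 in the environment would execute the commands instead of printing them. Keeping the threshold in a single MIN_COV variable is just one way to make sure profile and compare stay consistent.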
I also just want to point out that inStrain will be less accurate when using these lower thresholds, so be careful when interpreting the results.
Best, Matt
Hi there! Thank you for developing this powerful tool, I've been having a lot of fun with it.
I have a question about the inStrain compare SNV pooling outputs. I've run inStrain profile on 21 samples with default settings; one of the samples had really low coverage (avg 3.78) for my reference genome of interest, so I reran inStrain profile with -c 1 to find low coverage variants and recovered many more variants.
I then ran inStrain compare, providing the original 21 inStrain profiles and their respective .bam files. Based on the expected output, I was under the impression that this would return all pooled SNVs, including low coverage ones, since there are columns for sample_detections and sample_5x_detections. When I looked at the output, I realized that all the positions with coverage < 5 were not included in pooled_SNV_info.tsv and thus their corresponding variants were not included. This confused me because the expected output descriptions seem to imply that positions with at least one read mapping to them would be included.
What's the best way to include these low coverage variants? Should I adjust the min coverage when running the inStrain compare function? Would I need to rerun all my inStrain profiles at a lower min coverage prior to running the inStrain compare command?
Greatly appreciate any input on this!
Best wishes, Prisca