Closed sahilrishav2 closed 3 months ago
Hi,
Please refer to the single sample SNV for the detailed explanation. In short, MIDAS2 only genotype species meeting the filter specified by --select_by
and --select_threhold
. Users can adjust these two parameter to decide the list of species they want you genotype.
If you want to genotype all the strains in your simulated reads, you can also follow build you own genome index to achieve this goal.
Thanks, Chunyu
Hi,
Actually, In building the own genome index, some error persists, so, I downloaded the GTDB customized by midas2 and those 1400 strains are available in GTDB.
What is your error message? Is it from MIDAS2?
Oh really sorry, I sent you a different error message. Actually, the error i got is not available currently
It was days ago, so, i don't have error report. then i decided to download the GTDB customised by MIDAS2 to perform the analysis.
I want to ask if I am using GTDB and those strains are available in the database then why the tool can detect only a few strains?
Sure. Check the first link I sent to you. That answers your question.
Thank you so much for your support
Hi,
sorry to bother you again. I want to say that even after using this command midas2 run_snps --sample_name synthetic_reads_10M -1 synthetic_reads_10M_R1.fastq -2 synthetic_reads_10M_R2.fastq --midasdb_name gtdb --midasdb_dir /home/rishabh/my_midasdb_gtdb --num_cores 40 --select_by median_marker_coverage,unique_fraction_covered --select_threshold=2,0.5 midas2_output_1355_strains_reads_10M
. Only few strains were detected.
cat snps_summary.tsv
species_id genome_length covered_bases total_depth aligned_reads mapped_reads fraction_covered mean_coverage
100071 7532486 6776814 207976720 1603677 1464870 0.900 30.689
100259 6534206 6330743 47322768 361190 333329 0.969 7.475
100148 6150048 5962769 60054377 607189 422886 0.970 10.072
100272 6028589 5665629 39938776 408227 281253 0.940 7.049
100338 6345256 5342492 32115412 325309 226136 0.842 6.011
100057 5962738 5370418 121980244 961247 859046 0.901 22.713
100250 5076188 4935426 64085233 497484 451340 0.972 12.985
100074 5657766 5330351 37765574 353582 265958 0.942 7.085
100078 5710863 5050473 18761658 180068 132137 0.884 3.715
100048 4990672 4792836 314936606 2388216 2218247 0.960 65.710
100115 3777657 3776607 110315594 783730 776975 1.000 29.210
100153 2607257 2504168 74563049 579688 525211 0.960 29.776
@zhaoc1 , Please suggest some solution to it.
--select_by median_marker_coverage,unique_fraction_covered --select_threshold=2,0.5
This mean you only genotype species meeting these two filters based on the species/species_profile.tsv
. And it is expected to be only a few species.
--select_threshold=-1
and --species_list /path/to/the list of species you want to genoytpe
This might be what you want.
Refer to this link for more details: https://midas2.readthedocs.io/en/latest/species_module.html#abundant-species-selection
Thanks. Chunyu
Hi @zhaoc1 ,
Even after using this command midas2 run_snps --sample_name synthetic_reads_10M -1 synthetic_reads_10M_R1.fastq -2 synthetic_reads_10M_R2.fastq --midasdb_name gtdb --midasdb_dir /home/rishabh/my_midasdb_gtdb --num_cores 40 --select_threshold=-1 midas2_output_1355_strains_reads_10M --species_list 100071
to identify the strains of a single species. It was able to identify only one strain:
cat snps_summary.tsv
species_id genome_length covered_bases total_depth aligned_reads mapped_reads fraction_covered mean_coverage
100071 7532486 6790769 213971382 1760066 1507106 0.902 31.509
Hi,
You specify species-list 100071, and this means you only want to run snps for 100071.
Get a txt file with the list of the species ids you want to genotype and pass the file name to species-list.
Chunyu
Hi @zhaoc1 ,
I was just checking how many strains of species list 100071, tool was able to detect. In my metagenome reads, I have 236 strains of this species list 100071 and those 236 strains are also available in GTDB but the tool could identify only one strain.
MIDAS2 only do genotyping the SNPs profile. If you want to infer strains, it is out of scope of MIDAS2. Try https://github.com/bsmith89/StrainFacts
Ok, thank you for the clarification. I thought that MIDAS2 could detect strains through snv method as in the paper this statement claims so
MIDAS2 performs strain-level metagenomics analysis through a SNV module and a CNV module
Thank you very much for your time
Hello,
I am performing benchmarking of some tools in which I have included MIDAS2. I have performed analysis on my simulated datasets (10M paired-end reads of 1400 strains) out of which it could detect only 40 strains. Could you explain to me the reason behind this? The database I used was GTDB.
Thank you