Closed ramnageena11 closed 10 months ago
Hello Ram, I am not sure what is happening here. Do you have a final summary file generated? Could you please provide the data you are trying to analyse? How complete/contaminated are these genomes?
Thanks, Pierre
Hi pchaumeil, I tried to send the files by email but it failed, and the files are not attaching here either.
How should I share them? Thanks, Ram
Do you have access to a cloud storage service (Dropbox, OneDrive, Google Drive...)? You can upload your data there and send me the link to download them.
Hi, Yes, I do have Google Drive and will upload and share.
Thanks
Hi, please find the link with files and folders: https://drive.google.com/drive/folders/18o8vaa3VzF2ZAynFuw4E_jUP4p8Y91Jg?usp=drive_link
I have a total of 4 metagenome assemblies to be analyzed. Let me know if you need more information.
Thanks rgds Ram
Hello Ram, Looking at the high contamination in the CheckM output file and the GTDB-Tk output files, it seems you are trying to analyse a complete assembly for the entire sample; it hasn't been binned out into MAGs. This file will not give you any results in GTDB-Tk because all markers are duplicated. I would recommend running a tool like MetaBAT2 or MaxBin2 to bin your data before running GTDB-Tk and CheckM.
Hi, Thanks, I'll look into it. For MetaBAT2 or MaxBin2: this assembly was generated from Nanopore long reads, and I could not fit it to the Illumina-oriented parameters needed to bin the sample. Regarding duplicate markers: do you think it will work if I run a single genome file? I checked a single genome in CheckM and it resulted in no marker genes, maybe due to the short length of the MAGs.
Thanks rgds
Ram
Hi Ram,
CheckM expects prokaryotic genomes (MAGs, SAGs, or isolates) as input. It will estimate the quality of these genomes using single-copy marker genes. Extremely poor quality genomes can result in no marker genes being identified.
Cheers, Donovan
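To make the point about single-copy markers concrete, here is a toy sketch of how marker counts translate into completeness and contamination. It is not CheckM's actual algorithm (CheckM uses lineage-specific, collocated marker sets); the function name `estimate_quality` and the marker IDs `m1` to `m4` are made up for illustration.

```python
from collections import Counter

def estimate_quality(marker_hits, marker_set):
    """Toy completeness/contamination estimate from single-copy marker hits.

    marker_hits: iterable of marker IDs, one entry per gene hit in the input.
    marker_set:  markers expected exactly once per genome (hypothetical IDs).
    """
    counts = Counter(m for m in marker_hits if m in marker_set)
    n = len(marker_set)
    # A marker seen at least once counts toward completeness.
    present = sum(1 for m in marker_set if counts[m] >= 1)
    # Every copy beyond the first counts toward contamination.
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    return 100.0 * present / n, 100.0 * extra / n

markers = {"m1", "m2", "m3", "m4"}
single_genome = ["m1", "m2", "m3", "m4"]
# An unbinned assembly mixes several organisms, so markers appear many times.
unbinned = ["m1", "m1", "m1", "m2", "m2", "m3", "m3", "m4", "m4", "m4"]
# A short contig may carry no marker genes at all.
short_contig = []

for name, hits in [("single genome", single_genome),
                   ("unbinned assembly", unbinned),
                   ("short contig", short_contig)]:
    completeness, contamination = estimate_quality(hits, markers)
    print(f"{name}: completeness={completeness:.1f}% contamination={contamination:.1f}%")
```

Under this simplification, an unbinned assembly shows contamination well above 100% because every marker is duplicated, while a short contig without markers scores zero on both measures.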
Hi Donovan, Regarding no marker genes: does it also depend on genome length?
I was not able to bin the metagenome because the assembly was generated from Nanopore sequencing data and I found no software available for binning Nanopore data.
I have extracted 16S rRNA genes from the metagenome contigs (using barrnap) and used them for BLAST, but the same contigs did not get identified by GTDB-Tk, even though the 16S rRNA gene is present. What could be the reason?
Thanks Ram
Hi Ram,
Identifying genes and thus marker genes depends on the contigs provided to CheckM, but not directly on genome length. Both CheckM and GTDB-Tk assume the input is a genome. Providing individual contigs (unless they represent an appreciable portion of a genome) or full assemblies of a metagenomic sample is NOT recommended.
GTDB-Tk does not consider 16S rRNA sequences. I encourage you to read the GTDB-Tk manuscript and documentation to better understand how it operates.
Cheers, Donovan
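One quick way to check whether the files in an input directory look like genomes rather than individual contigs is to total the sequence length per FASTA file. The sketch below is only illustrative: the helper names and the `min_bases` cutoff of 500,000 are assumptions, not values used by CheckM or GTDB-Tk.

```python
from pathlib import Path

def fasta_stats(path):
    """Return (number of sequences, total bases) for one FASTA file."""
    n_seqs, n_bases = 0, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_seqs += 1           # header line starts a new sequence
            else:
                n_bases += len(line)  # sequence line
    return n_seqs, n_bases

def flag_small_inputs(genome_dir, extension="fasta", min_bases=500_000):
    """List files whose total length is below min_bases (hypothetical cutoff)."""
    flagged = []
    for path in sorted(Path(genome_dir).glob(f"*.{extension}")):
        n_seqs, n_bases = fasta_stats(path)
        if n_bases < min_bases:
            flagged.append((path.name, n_seqs, n_bases))
    return flagged
```

Calling `flag_small_inputs("path/to/genome_dir")` returns the files that are probably too small to represent an appreciable portion of a genome; what counts as "too small" depends on the organisms involved.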
Hi Donovan,
Yes, my point was the same: if the provided contig does not have marker genes, it will not be identified/classified. Since I have limitations in binning the Nanopore assembly, what do you suggest? As I mentioned, I have extracted 16S rRNA genes as a strategy and used an alternative method to classify the MAGs.
Thanks Ram
Hi,
Please suggest. Thanks, Ram
gtdbtk classify_wf --genome_dir /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/s2mags_contigs_16S/ --out_dir /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify --keep_intermediates --cpu 32 -x fasta --mash_db /home/majorram/anaconda3/envs/gtdb-tk-1/share/gtdbtk-2.3.2/db/mash_db.msh
[2023-07-27 10:25:52] INFO: GTDB-Tk v2.3.2
[2023-07-27 10:25:52] INFO: gtdbtk classify_wf --genome_dir /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/s2mags_contigs_16S/ --out_dir /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify --keep_intermediates --cpu 32 -x fasta --mash_db /home/majorram/anaconda3/envs/gtdb-tk-1/share/gtdbtk-2.3.2/db/mash_db.msh
[2023-07-27 10:25:52] INFO: Using GTDB-Tk reference data version r214: /home/majorram/anaconda3/envs/gtdb-tk-1/share/gtdbtk-2.3.2/db
[2023-07-27 10:25:53] INFO: Loading reference genomes.
[2023-07-27 10:25:53] INFO: Using Mash version 2.3
[2023-07-27 10:25:53] INFO: Creating Mash sketch file: /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2023-07-27 10:26:01] INFO: Completed 49 genomes in 7.85 seconds (6.24 genomes/second).
[2023-07-27 10:26:01] INFO: Loading data from existing Mash sketch file: /home/majorram/anaconda3/envs/gtdb-tk-1/share/gtdbtk-2.3.2/db/mash_db.msh
[2023-07-27 10:26:06] INFO: Calculating Mash distances.
[2023-07-27 10:26:17] INFO: Calculating ANI with FastANI v1.32.
[2023-07-27 10:26:28] INFO: Completed 196 comparisons in 10.95 seconds (17.90 comparisons/second).
[2023-07-27 10:26:33] INFO: Summary of results saved to: /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify/classify/ani_screen/gtdbtk.bac120.ani_summary.tsv
[2023-07-27 10:26:33] INFO: 2 genome(s) have been classified using the ANI pre-screening step.
[2023-07-27 10:26:33] INFO: Done.
[2023-07-27 10:26:33] INFO: Identifying markers in 47 genomes with 32 threads.
[2023-07-27 10:26:33] TASK: Running Prodigal V2.6.3 to identify genes.
[2023-07-27 10:26:45] INFO: Completed 47 genomes in 12.07 seconds (3.89 genomes/second).
[2023-07-27 10:26:45] TASK: Identifying TIGRFAM protein families.
[2023-07-27 10:26:55] INFO: Completed 47 genomes in 9.36 seconds (5.02 genomes/second).
[2023-07-27 10:26:55] TASK: Identifying Pfam protein families.
[2023-07-27 10:26:56] INFO: Completed 47 genomes in 0.67 seconds (70.11 genomes/second).
[2023-07-27 10:26:56] INFO: Annotations done using HMMER 3.3.2 (Nov 2020).
[2023-07-27 10:26:56] TASK: Summarising identified marker genes.
[2023-07-27 10:26:56] INFO: Completed 47 genomes in 0.16 seconds (294.50 genomes/second).
[2023-07-27 10:26:56] INFO: Done.
[2023-07-27 10:26:56] INFO: Aligning markers in 47 genomes with 32 CPUs.
[2023-07-27 10:26:56] INFO: Processing 47 genomes identified as bacterial.
[2023-07-27 10:27:03] INFO: Read concatenated alignment for 80,789 GTDB genomes.
[2023-07-27 10:27:03] TASK: Generating concatenated alignment for each marker.
[2023-07-27 10:27:06] INFO: Completed 47 genomes in 0.04 seconds (1,168.61 genomes/second).
[2023-07-27 10:27:06] TASK: Aligning 116 identified markers using hmmalign 3.3.2 (Nov 2020).
[2023-07-27 10:27:11] INFO: Completed 116 markers in 1.58 seconds (73.23 markers/second).
[2023-07-27 10:27:11] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2023-07-27 10:28:48] INFO: Completed 80,816 sequences in 1.61 minutes (50,227.78 sequences/minute).
[2023-07-27 10:28:48] INFO: Masked bacterial alignment from 41,084 to 5,035 AAs.
[2023-07-27 10:28:48] INFO: 22 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-07-27 10:28:48] INFO: Creating concatenated alignment for 80,794 bacterial GTDB and user genomes.
[2023-07-27 10:29:19] INFO: Creating concatenated alignment for 5 bacterial user genomes.
[2023-07-27 10:29:20] INFO: Done.
[2023-07-27 10:29:20] TASK: Placing 5 bacterial genomes into backbone reference tree with pplacer using 32 CPUs (be patient).
[2023-07-27 10:29:20] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
==> Running pplacer v1.1.alpha19-0-g807f6f3 analysis on /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify/classify/intermediate_results/gtdbtk.bac1
==> Step 2 of 9: Pre-masking sequences.
[2023-07-27 10:31:19] INFO: Calculating RED values based on reference tree.
[2023-07-27 10:31:20] INFO: 5 out of 5 have an class assignments. Those genomes will be reclassified.
[2023-07-27 10:31:20] TASK: Placing 3 bacterial genomes into class-level reference tree 1 (1/3) with pplacer using 32 CPUs (be patient).
==> Running pplacer v1.1.alpha19-0-g807f6f3 analysis on /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify/classify/intermediate_results/pplacer/tre
==> Step 2 of 9: Pre-masking sequences.
[2023-07-27 10:37:13] INFO: Calculating RED values based on reference tree.
[2023-07-27 10:37:23] TASK: Traversing tree to determine classification method.
[2023-07-27 10:37:23] INFO: Completed 3 genomes in 0.00 seconds (4,503.55 genomes/second).
[2023-07-27 10:37:23] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2023-07-27 10:37:26] INFO: Completed 120 comparisons in 2.99 seconds (40.14 comparisons/second).
[2023-07-27 10:37:27] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2023-07-27 10:37:27] TASK: Placing 1 bacterial genomes into class-level reference tree 3 (2/3) with pplacer using 32 CPUs (be patient).
==> Running pplacer v1.1.alpha19-0-g807f6f3 analysis on /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify/classify/intermediate_results/pplacer/tre
==> Step 2 of 9: Pre-masking sequences.
[2023-07-27 10:38:54] INFO: Calculating RED values based on reference tree.
[2023-07-27 10:38:58] TASK: Traversing tree to determine classification method.
[2023-07-27 10:39:04] INFO: Completed 1 genome in 0.00 seconds (7,667.83 genomes/second).
[2023-07-27 10:39:05] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2023-07-27 10:39:05] INFO: Completed 2 comparisons in 0.63 seconds (3.16 comparisons/second).
[2023-07-27 10:39:05] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2023-07-27 10:39:06] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (3/3) with pplacer using 32 CPUs (be patient).
==> Running pplacer v1.1.alpha19-0-g807f6f3 analysis on /media/majorram/Analysis_Data/singhrn/medaka_final2/s2mags_genomes/gtdb-tk-S2mags_classify/classify/intermediate_results/pplacer/tre
==> Step 2 of 9: Pre-masking sequences.
[2023-07-27 10:40:07] INFO: Calculating RED values based on reference tree.
[2023-07-27 10:40:09] TASK: Traversing tree to determine classification method.
[2023-07-27 10:40:09] INFO: Completed 1 genome in 0.00 seconds (3,782.06 genomes/second).
[2023-07-27 10:40:09] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2023-07-27 10:40:10] INFO: Completed 20 comparisons in 0.79 seconds (25.39 comparisons/second).
[2023-07-27 10:40:10] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2023-07-27 10:40:17] WARNING: 43 of 27 genomes have a warning (see summary file).
[2023-07-27 10:40:17] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2023-07-27 10:40:17] INFO: Done.
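A log like the one above is easier to scan once it is split into individual entries. The following is a minimal sketch that splits log text on the `[timestamp] LEVEL:` prefix and prints only the warnings. The function name `split_log` is made up, and the `ERROR` level is an assumption, since only `INFO`, `TASK`, and `WARNING` appear in this log.

```python
import re

# Prefix of each entry, e.g. "[2023-07-27 10:40:17] WARNING: ".
# ERROR is assumed here; it does not occur in the log from this thread.
LOG_ENTRY = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (INFO|TASK|WARNING|ERROR): "
)

def split_log(text):
    """Split log text into (timestamp, level, message) tuples."""
    matches = list(LOG_ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry prefix or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), text[m.end():end].strip()))
    return entries

sample = (
    "[2023-07-27 10:28:48] INFO: 22 bacterial user genomes have amino acids "
    "in <10.0% of columns in filtered MSA. "
    "[2023-07-27 10:40:10] INFO: 0 genome(s) have been classified using "
    "FastANI and pplacer. "
    "[2023-07-27 10:40:17] WARNING: 43 of 27 genomes have a warning "
    "(see summary file)."
)

entries = split_log(sample)
for timestamp, level, message in entries:
    if level in ("WARNING", "ERROR"):
        print(timestamp, level, message)
```

The same function works on a log pasted as one long line or on a file read with one entry per line.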