GenomicsAotearoa / metagenomics_summer_school

Course materials for the Genomics Aotearoa Metagenomics Summer School, to be hosted at the University of Auckland in December
https://genomicsaotearoa.github.io/metagenomics_summer_school/
GNU General Public License v3.0

Upgrade DRAM #28

Closed DininduSenanayake closed 1 year ago

DininduSenanayake commented 2 years ago

Upgrade DRAM: see whether we can make it available as a module and store the database on /opt/nesi/db.

DininduSenanayake commented 1 year ago

@mlhoggard @JSBoey

I have run the 2021 dataset with the module and databases:

module purge
module load DRAM/1.3.5-Miniconda3

DRAM.py annotate -i '/nesi/nobackup/nesi99999/Dinindu/ZDissues/DRAM/10.gene_annotation/predictions/*.filtered.fna' \
--checkm_quality ./DRAM_input_files/checkm.txt \
--gtdb_taxonomy ./DRAM_input_files/gtdbtk.bac120.classification_pplacer.tsv \
-o annotation_dram
10 fastas found
2022-09-27 21:08:07.425787: Annotation started
0:00:00.010126: Retrieved database locations and descriptions
0:00:00.010159: Annotating bin_2.filtered
0:00:11.791169: Turning genes from prodigal to mmseqs2 db
0:00:15.168229: Getting hits from kofam
0:20:00.963188: Getting forward best hits from peptidase
0:20:48.065156: Getting reverse best hits from peptidase
0:20:50.957036: Getting descriptions of hits from peptidase
0:20:52.987563: Getting hits from pfam
0:22:25.164596: Getting hits from dbCAN
0:22:39.870252: Merging ORF annotations
0:23:10.327467: Annotating bin_3.filtered
0:23:19.561439: Turning genes from prodigal to mmseqs2 db
0:23:22.726461: Getting hits from kofam
0:51:41.803827: Getting forward best hits from peptidase
0:54:20.144962: Getting reverse best hits from peptidase
0:54:22.091614: Getting descriptions of hits from peptidase
0:54:22.311586: Getting hits from pfam
0:54:52.395768: Getting hits from dbCAN
0:55:06.018509: Merging ORF annotations
0:55:19.145917: Annotating bin_6.filtered
0:56:15.289438: Turning genes from prodigal to mmseqs2 db
0:56:18.469674: Getting hits from kofam
1:16:01.543374: Getting forward best hits from peptidase
1:16:41.862112: Getting reverse best hits from peptidase
1:16:44.242180: Getting descriptions of hits from peptidase
1:16:44.343257: Getting hits from pfam
1:17:12.361333: Getting hits from dbCAN
1:17:23.749997: Merging ORF annotations
1:17:36.258055: Annotating bin_5.filtered
1:18:19.555170: Turning genes from prodigal to mmseqs2 db
1:18:23.042739: Getting hits from kofam
1:58:59.237550: Getting forward best hits from peptidase
2:00:24.073617: Getting reverse best hits from peptidase
2:00:26.639439: Getting descriptions of hits from peptidase
2:00:26.654719: Getting hits from pfam
2:01:17.012038: Getting hits from dbCAN
2:01:41.626553: Merging ORF annotations
2:02:11.866737: Annotating bin_4.filtered
2:02:44.305314: Turning genes from prodigal to mmseqs2 db
2:02:47.498150: Getting hits from kofam
2:20:09.827544: Getting forward best hits from peptidase
2:20:49.761983: Getting reverse best hits from peptidase
2:20:51.751197: Getting descriptions of hits from peptidase
2:20:51.758468: Getting hits from pfam
2:21:19.469995: Getting hits from dbCAN
2:21:31.340825: Merging ORF annotations
2:21:43.847936: Annotating bin_1.filtered
2:21:50.559360: Turning genes from prodigal to mmseqs2 db
2:21:53.693719: Getting hits from kofam
2:35:00.530102: Getting forward best hits from peptidase
2:35:25.014360: Getting reverse best hits from peptidase
2:35:26.851579: Getting descriptions of hits from peptidase
2:35:26.857354: Getting hits from pfam
2:35:50.375718: Getting hits from dbCAN
2:35:58.636778: Merging ORF annotations
2:36:07.613436: Annotating bin_8.filtered
2:36:40.557629: Turning genes from prodigal to mmseqs2 db
2:36:43.733567: Getting hits from kofam
2:52:17.062601: Getting forward best hits from peptidase
2:52:52.539920: Getting reverse best hits from peptidase
2:52:54.499947: Getting descriptions of hits from peptidase
2:52:54.516884: Getting hits from pfam
2:53:22.130619: Getting hits from dbCAN
2:53:33.435468: Merging ORF annotations
2:53:43.496087: Annotating bin_9.filtered
2:54:28.957477: Turning genes from prodigal to mmseqs2 db
2:54:32.242714: Getting hits from kofam
3:21:16.575826: Getting forward best hits from peptidase
3:22:09.926156: Getting reverse best hits from peptidase
3:22:12.038791: Getting descriptions of hits from peptidase
3:22:12.093994: Getting hits from pfam
3:22:46.216410: Getting hits from dbCAN
3:23:02.893923: Merging ORF annotations
3:23:20.891607: Annotating bin_7.filtered
3:23:57.393779: Turning genes from prodigal to mmseqs2 db
3:24:00.581757: Getting hits from kofam
3:42:16.390080: Getting forward best hits from peptidase
3:43:00.994574: Getting reverse best hits from peptidase
3:43:02.982096: Getting descriptions of hits from peptidase
3:43:03.018580: Getting hits from pfam
3:43:33.362493: Getting hits from dbCAN
3:43:47.126957: Merging ORF annotations
3:44:09.227178: Annotating bin_0.filtered
3:44:16.725190: Turning genes from prodigal to mmseqs2 db
3:44:19.923438: Getting hits from kofam
4:08:15.934225: Getting forward best hits from peptidase
4:08:58.152196: Getting reverse best hits from peptidase
4:09:00.064520: Getting descriptions of hits from peptidase
4:09:00.072982: Getting hits from pfam
4:09:32.307946: Getting hits from dbCAN
4:09:46.452562: Merging ORF annotations
4:10:00.655341: Annotations complete, processing annotations

However, it did trigger the error below. I have a feeling a threshold of some sort is not compatible with the latest DRAM. It should be an easy fix in an upstream step.

/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/mag_annotator/annotate_bins.py:603: UserWarning: No rRNAs were detected, no rrnas.tsv file will be created.
  warnings.warn('No rRNAs were detected, no rrnas.tsv file will be created.')
Traceback (most recent call last):
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'classification'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/bin/DRAM.py", line 189, in <module>
    args.func(**args_dict)
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1040, in annotate_bins_cmd
    annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1092, in annotate_bins
    taxonomy.append(gtdb_taxonomy.loc[i, 'classification'])
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/pandas/core/indexing.py", line 960, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 3615, in _get_value
    series = self._get_item_cache(col)
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 3931, in _get_item_cache
    loc = self.columns.get_loc(item)
  File "/opt/nesi/CS400_centos7_bdw/DRAM/1.3.5-Miniconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'classification'
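
The last DRAM frame in the traceback is `gtdb_taxonomy.loc[i, 'classification']`, so the lookup fails when the table passed to `--gtdb_taxonomy` has no column named `classification`. A minimal sketch of a pre-flight check for that header follows. The function names and the script name in the usage note are hypothetical and not part of DRAM; the only thing taken from the traceback is the `classification` key.

```python
import sys

# Column name taken from the traceback above; any other columns DRAM
# may expect are not checked here.
REQUIRED = ("classification",)


def missing_columns(header_line, required=REQUIRED):
    """Return the required column names absent from a tab-separated header line."""
    fields = header_line.rstrip("\r\n").split("\t")
    return [name for name in required if name not in fields]


def check_file(path, required=REQUIRED):
    """Read only the first line of the file and report missing columns."""
    with open(path) as handle:
        return missing_columns(handle.readline(), required)


if __name__ == "__main__":
    # Paths are given on the command line, one taxonomy table per argument.
    for path in sys.argv[1:]:
        missing = check_file(path)
        if missing:
            print(f"{path}: missing column(s): {', '.join(missing)}")
        else:
            print(f"{path}: header looks fine")
```

Saved as, say, `check_gtdb_header.py`, it could be run against the taxonomy file before starting the multi-hour annotation: `python check_gtdb_header.py ./DRAM_input_files/gtdbtk.bac120.classification_pplacer.tsv`.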

mlhoggard commented 1 year ago

Thanks @DininduSenanayake !

I'll give it a test run with another data set as well, but I don't think that rRNA error is anything to worry about. rRNA often doesn't assemble that well from short reads anyway, so it's not uncommon for DRAM not to detect any. As long as the rest of the annotation process looks like it worked as normal, then I suspect that's all that warning/KeyError is about.

DininduSenanayake commented 1 year ago

DRAM/1.3.5-Miniconda3 is definitely working (confirmed by the Otago microbio group as well). Therefore, I will mark this as solved for the moment. We can re-open it for any related issues.