WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

ValueError("No objects to concatenate") in kofam annotation #27

Closed almutwerner closed 4 years ago

almutwerner commented 4 years ago

Hi! I use DRAM in a conda environment. This is my command line:

DRAM.py annotate -i '/home/awerner/OMZ/7494_fa/*.fa' -o /home/awerner/OMZ/DRAM/annotation_out --threads 10

For one of the sequences, I got the error copied below and the script aborted. The sequences before it worked just fine. Might it have something to do with the sequence not having a hit against the kofam db?

2:13:30.415099: Annotating S13_NODE_1831_length_3495_cov_8.033140
2:13:30.504309: Turning genes from prodigal to mmseqs2 db
2:13:40.648357: Getting hits from kofam
Traceback (most recent call last):
  File "/home/awerner/.conda/envs/DRAM/bin/DRAM.py", line 145, in <module>
    args.func(**args_dict)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 963, in annotate_bins_cmd
    annotate_bins(fasta_locs, output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 1000, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_locs, db_handler, min_contig_size, prodigal_mode,
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 919, in annotate_fastas
    annotations_list.append(annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_locs, db_handler, min_contig_size,
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 820, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_locs, tmp_dir, start_time, db_handler, custom_db_locs, bit_score_threshold,
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 731, in annotate_orfs
    annotation_list.append(run_hmmscan_kofam(gene_faa, db_locs['kofam'], tmp_dir,
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 253, in run_hmmscan_kofam
    ko_hits_sig = pd.concat(is_sig)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 274, in concat
    op = _Concatenator(
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

I also got some notifications that no tRNAs or rRNAs were detected. I don't know if that is related.

filtered_fasta.fa.gz kofam_profile.b6.gz

shafferm commented 4 years ago

Hello,

The missing tRNAs and rRNAs aren't related. Until now I hadn't seen a case with no hits to KOfam at all! I have implemented a fix and pushed it to the master branch. Are these contigs/bins from very uncharacterized communities?

If you want to test this fix now, you can save your config file, update your DRAM installation from the master branch of this repo, and then reimport your config file. Instructions on how to do this are here: https://github.com/shafferm/DRAM/wiki/5.-Managing,-updating-or-moving-a-DRAM-installation-and-databases#updating-your-dram-installation. Otherwise, I will update this issue when a new version with the fix has been released. To get around this problem without updating, I would recommend dropping this contig, but throwing away data is never nice.

Mike
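For readers hitting the same traceback: pandas raises this ValueError whenever pd.concat receives an empty list, which is what happens when no gene yields a significant KOfam hit. The sketch below shows one generic way to guard against that. It is not the actual change made in DRAM; the helper name concat_or_empty and the column names are hypothetical and chosen only for illustration.

```python
import pandas as pd


def concat_or_empty(frames, columns):
    """Concatenate DataFrames, returning an empty frame if there are none.

    Hypothetical helper for illustration; not part of DRAM.
    """
    frames = list(frames)
    if not frames:
        # pd.concat([]) raises ValueError("No objects to concatenate"),
        # so return an empty table with the expected columns instead.
        return pd.DataFrame(columns=columns)
    return pd.concat(frames)


# Case from this issue: no gene passed the significance filter,
# so the list of per-gene hit tables is empty.
no_hits = concat_or_empty([], columns=["gene", "ko_id"])  # assumed column names
print(len(no_hits))  # prints 0
```

Returning an empty table with the expected columns lets downstream code that selects or merges on those columns keep working when a contig has no hits.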

almutwerner commented 4 years ago

Hey Mike,

Thank you so much for your quick fix, it worked like a charm! The contigs are indeed rather uncharacterized. They are marine phage contigs from an OMZ.

While testing if the fix worked, I also found another issue with the distill function. It throws an error and aborts, because there are no tRNAs or rRNAs in my file. It's not a big deal, I just wanted to bring this to your attention.

DRAM.py distill -i /home/awerner/OMZ/DRAM/annotation_out/annotations.tsv -o /home/awerner/OMZ/DRAM/genome_summaries --trna_path /home/awerner/OMZ/DRAM/genome_summaries/trnas.tsv --rrna_path /home/awerner/OMZ/DRAM/genome_summaries/rrnas.tsv

Error:
Traceback (most recent call last):
  File "/home/awerner/.conda/envs/DRAM/bin/DRAM.py", line 145, in <module>
    args.func(**args_dict)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/mag_annotator/summarize_genomes.py", line 551, in summarize_genomes
    trna_frame = pd.read_csv(trna_path, sep='\t')
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/io/parsers.py", line 936, in __init__
    self._make_engine(self.engine)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/io/parsers.py", line 1168, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/awerner/.conda/envs/DRAM/lib/python3.8/site-packages/pandas/io/parsers.py", line 1998, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/home/awerner/OMZ/DRAM/genome_summaries/trnas.tsv'

Have a lovely day! Almut

shafferm commented 4 years ago

Glad the fix worked! Makes sense that the error happened with less well understood genes.

The --trna_path and --rrna_path arguments are looking for the tRNAs.tsv and rRNAs.tsv files that were generated during annotation. So you need to set those to /home/awerner/OMZ/DRAM/annotation_out/tRNAs.tsv for --trna_path and /home/awerner/OMZ/DRAM/annotation_out/rRNAs.tsv for --rrna_path. If you don't have tRNAs.tsv and/or rRNAs.tsv files in your annotation_out folder, that means none were found during annotation, and you don't need to pass those files to the distill command. They aren't required inputs, so there is no problem if they aren't given to distill. And we don't expect viruses to have any/many rRNAs or tRNAs, so no output there would make sense.

Mike
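The rule described above (pass --trna_path and --rrna_path only when the corresponding file exists in the annotation output folder) can be sketched as a small helper. This is an illustration only, not part of DRAM: the function name build_distill_command is hypothetical, and it assumes the annotation folder holds annotations.tsv, tRNAs.tsv and rRNAs.tsv under the names used in this thread.

```python
from pathlib import Path
from typing import List


def build_distill_command(annotation_dir: str, output_dir: str) -> List[str]:
    """Assemble a DRAM.py distill call, adding RNA tables only when present.

    Hypothetical helper for illustration; not part of DRAM.
    """
    ann = Path(annotation_dir)
    cmd = ["DRAM.py", "distill",
           "-i", str(ann / "annotations.tsv"),
           "-o", output_dir]
    # tRNAs.tsv / rRNAs.tsv are only written when annotation found any,
    # so each optional flag is added only if its file exists.
    for flag, name in (("--trna_path", "tRNAs.tsv"),
                       ("--rrna_path", "rRNAs.tsv")):
        table = ann / name
        if table.is_file():
            cmd += [flag, str(table)]
    return cmd


if __name__ == "__main__":
    # Paths taken from the thread; adjust to your own layout.
    print(" ".join(build_distill_command(
        "/home/awerner/OMZ/DRAM/annotation_out",
        "/home/awerner/OMZ/DRAM/genome_summaries")))
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting problems with unusual paths.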

almutwerner commented 4 years ago

Ah, that makes total sense for the RNAs. I will adjust the paths once my dataset is done being annotated. Thank you so much for your help! Almut