WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Error in DRAM.py distill #186

Closed (sixvable closed 2 years ago)

sixvable commented 2 years ago

I got an error while running DRAM.py distill:

Traceback (most recent call last):
  File "/home/djshen/miniconda3/envs/dram/bin/DRAM.py", line 189, in <module>
    args.func(**args_dict)
  File "/home/djshen/miniconda3/envs/dram/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 582, in summarize_genomes
    genome_summary_form = pd.read_csv(database_handler.dram_sheet_locs['genome_summary_form'], sep='\t')
  File "/home/djshen/miniconda3/envs/dram/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/djshen/miniconda3/envs/dram/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/djshen/miniconda3/envs/dram/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/home/djshen/miniconda3/envs/dram/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1255, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/djshen/miniconda3/envs/dram/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 225, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 1315, saw 7

Platform: Ubuntu 20.04 LTS
DRAM version: 1.3.5

DRAM configure:

Processed search databases
KEGG db: /ssd/database_fast/dram_20220605/kegg.20220611.mmsdb
KOfam db: /ssd/database_fast/dram_20220605/kofam_profiles.hmm
KOfam KO list: /ssd/database_fast/dram_20220605/kofam_ko_list.tsv
UniRef db: /ssd/database_fast/dram_20220605/uniref90.20220621.mmsdb
Pfam db: /ssd/database_fast/dram_20220605/pfam.mmspro
dbCAN db: /ssd/database_fast/dram_20220605/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /ssd/database_fast/dram_20220605/refseq_viral.20220621.mmsdb
MEROPS peptidase db: /ssd/database_fast/dram_20220605/peptidases.20220611.mmsdb
VOGDB db: /ssd/database_fast/dram_20220605/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /ssd/database_fast/dram_20220605/Pfam-A.hmm.dat.gz
dbCAN family activities: /ssd/database_fast/dram_20220605/CAZyDB.07292021.fam-activities.txt
VOG annotations: /ssd/database_fast/dram_20220605/vog.annotations.tsv.gz

Description db: /ssd/database_fast/dram_20220605/description_db.sqlite

DRAM distillation sheets
Genome summary form: /ssd/database_fast/dram_20220605/genome_summary_form.tsv
Module step form: /ssd/database_fast/dram_20220605/module_step_form.tsv
ETC module database: /ssd/database_fast/dram_20220605/etc_module_database.tsv
Function heatmap form: /ssd/database_fast/dram_20220605/function_heatmap_form.tsv
AMG database: /ssd/database_fast/dram_20220605/amg_database.tsv

Command:

DRAM.py annotate \
-i genome/coxi_genome.fasta \
-o genome_annotation/dram/ \
--prodigal_mode single \
--gtdb_taxonomy gtdbtk/coxi.bac120.summary.tsv \
--checkm_quality genome/checkm/genome_checkm_results.tsv \
--use_vogdb --use_uniref \
--verbose --threads 16

DRAM.py distill -i genome_annotation/dram/annotations.tsv --rrna_path genome_annotation/dram/rrnas.tsv --trna_path genome_annotation/dram/trnas.tsv -o genome_annotation/dram/distill

Log file: dram.tar.gz
Original genome: coxi_genome.txt

Hope this issue can be solved! 😄

rmFlynn commented 2 years ago

Hi, sorry for the delay. I don't know how, but your distillate forms seem to have been corrupted. Try DRAM-setup.py update_dram_forms --output_dir /ssd/database_fast/dram_20220605/

sixvable commented 2 years ago

Try DRAM-setup.py update_dram_forms --output_dir /ssd/database_fast/dram_20220605/

I updated the distillate forms and reinstalled DRAM through conda, but I still get the same error! 😭

rmFlynn commented 2 years ago

The only thing I can think of is that your pandas is crazy old or crazy new. What version is it? I can't seem to reproduce the error with your annotations, so you may as well also upload your genome summary form. It looks like it is here: "/ssd/database_fast/dram_20220605/genome_summary_form.tsv". Thanks!

Here is the distillate I made. Does it look OK? distill.zip
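
For anyone running the two checks suggested above, here is a minimal Python sketch. It reports the installed pandas version and lists the lines of a tab-separated file whose field count differs from the first line, which is the kind of mismatch behind "Expected 1 fields in line 1315, saw 7". The function names find_ragged_lines and installed_version are hypothetical helpers, not part of DRAM or pandas. The field count uses a plain split on tabs, so it ignores quoting and may not match the pandas parser exactly.

```python
from importlib import metadata


def find_ragged_lines(path, sep="\t"):
    """Return (line_number, n_fields) for lines whose field count differs from line 1."""
    ragged = []
    expected = None
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Plain split: an approximation that ignores quoted fields.
            n_fields = len(line.rstrip("\n").split(sep))
            if expected is None:
                expected = n_fields  # field count of the first line
            elif n_fields != expected:
                ragged.append((line_number, n_fields))
    return ragged


def installed_version(package="pandas"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version("pandas") and find_ragged_lines("/ssd/database_fast/dram_20220605/genome_summary_form.tsv") inside the dram conda environment would answer both questions. If the first line of the form has only one field, the file itself is probably not the expected TSV.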

sixvable commented 2 years ago

I have solved the problem. I had used root to update the distillate forms, but my own account's DRAM configuration was not updated, so I was still using the old, broken distillate forms. I really appreciate the patient responses.
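
Since the cause here was two accounts resolving different DRAM setups, a quick way to spot this kind of mismatch is to run the same small check under each account and compare the output. The sketch below is only an illustration. describe_install is a hypothetical helper, and the package name mag_annotator is taken from the traceback paths above. It reports where the package is imported from and does not read DRAM's configuration itself.

```python
import getpass
import importlib.util
import sys


def describe_install(package="mag_annotator"):
    """Report which user, interpreter, and package location are in use."""
    try:
        user = getpass.getuser()
    except Exception:
        user = None  # user name could not be determined in this environment
    spec = importlib.util.find_spec(package)
    return {
        "user": user,
        "python": sys.executable,
        # None means the package is not importable from this interpreter.
        "package_location": spec.origin if spec is not None else None,
    }
```

If the "python" or "package_location" values differ between root and the regular account, the two accounts are using separate installations, and an update made under one will not reach the other.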