WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

DRAM-v.py annotate Error #15

Closed: Thexiyang closed this issue 4 years ago

Thexiyang commented 4 years ago

Thanks for the very nice tool! I ran the following command and came across the issue below:

    DRAM-v.py annotate -i viral.fasta -v VIRSorter_affi-contigs.tab -o test-cat1-annotation --min_contig_size 1000 --use_uniref --skip_trnascan --verbose --threads 20

Could you help with this? Thanks!

    1:53:35.108726: Annotations complete, processing annotations
    1:53:35.234632: Annotations complete, processing annotations
    /DRAM/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py:128: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      dram_genes['start_position'] = dram_genes['start_position'].astype(int)
    /DRAM/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py:129: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      dram_genes['end_position'] = dram_genes['end_position'].astype(int)
    /DRAM/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py:133: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      virsorter_genes['start_position'] = virsorter_genes['start_position'].astype(int)
    /DRAM/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py:134: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      virsorter_genes['end_position'] = virsorter_genes['end_position'].astype(int)
    Traceback (most recent call last):
      File "/DRAM/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
        return self._engine.get_loc(key)
      File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'KO'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/DRAM/bin/DRAM-v.py", line 111, in <module>
        args.func(**args_dict)
      File "/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py", line 382, in annotate_vgfs
        amgs = get_amg_ids(amg_database_frame)
      File "/DRAM/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py", line 293, in get_amg_ids
        ko_amgs = {j.strip() for i in amg_frame['KO'].dropna() for j in i.strip().split(';')}
      File "/DRAM/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
        indexer = self.columns.get_loc(key)
      File "/DRAM/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'KO'
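For readers hitting the same error: the last DRAM frame shows get_amg_ids building a set of KO IDs from a 'KO' column whose cells hold ';'-separated lists, so any AMG file that lacks that header fails with KeyError: 'KO'. Below is a minimal sketch reproducing that behaviour with pandas. The function name get_ko_ids and the toy data are hypothetical stand-ins, not DRAM's own code.

```python
import pandas as pd


def get_ko_ids(amg_frame: pd.DataFrame) -> set:
    """Collect KO IDs from a 'KO' column holding ';'-separated lists.

    Hypothetical helper mirroring the set comprehension in the traceback;
    it raises KeyError if the frame has no 'KO' column.
    """
    return {j.strip() for i in amg_frame['KO'].dropna() for j in i.strip().split(';')}


# Toy frame standing in for a correctly parsed AMG TSV (made-up values).
good = pd.DataFrame({'gene': ['a', 'b', 'c'], 'KO': ['K00001; K00002', None, 'K00003']})
print(sorted(get_ko_ids(good)))  # ['K00001', 'K00002', 'K00003']

# Toy frame standing in for an HTML page read as a TSV: there is no 'KO' header.
bad = pd.DataFrame({'<!DOCTYPE html>': ['<html lang="en">']})
try:
    get_ko_ids(bad)
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'KO'
```

Empty cells are dropped by dropna() before splitting, so only missing headers, not missing values, trigger the error.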

shafferm commented 4 years ago

Hello,

Can you send me the output of running DRAM-setup.py print_config? It looks to me like something is wrong with the amg_database.tsv file listed as your AMG database. If the AMG database location is set, can you open that file and check whether it has a header called KO?
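One way to run that check is sketched below using only the Python standard library. has_column is a hypothetical helper (not part of DRAM), the toy rows are made up, and the path in the usage comment is a placeholder for the AMG database location that print_config reports.

```python
import csv
from typing import Iterable


def has_column(lines: Iterable[str], column: str = 'KO') -> bool:
    """Return True if the header (first row) of tab-separated lines contains `column`."""
    reader = csv.reader(lines, delimiter='\t')
    header = next(reader, [])  # an empty file yields an empty header
    return column in (name.strip() for name in header)


# Hypothetical usage; replace the path with the one print_config reports:
# with open('/path/to/amg_database.tsv', newline='') as fh:
#     print(has_column(fh))
print(has_column(['gene\tKO\tEC\n', 'x\tK00001\t1.1.1.1\n']))  # True (toy TSV)
print(has_column(['<!DOCTYPE html>\n', '<html lang="en">\n']))  # False (saved web page)
```

Only the first row is read, so the check stays cheap even on a large file.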

Thexiyang commented 4 years ago

Indeed, it is an issue with the AMG database: the file is not right. I cannot use wget to download from 'https://raw.githubusercontent.com' due to a connection issue, so I switched to the link 'https://github.com/shafferm/DRAM/blob/master/data/amg_database.tsv'. However, the resulting AMG file is totally different. Strange. Could you also add a flag for the AMG database location to DRAM-setup.py prepare_databases, like the others (e.g. --etc_module_database_loc for ETC_MODULE_DATABASE_LOC)? Thanks!

shafferm commented 4 years ago

Sorry for the late reply. I was on vacation last week.

I think that running wget on the page URL fetches the HTML for that page, not the raw file. Did you use this command to get the raw file?

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
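The difference between the two URLs can be sketched in Python: a github.com/.../blob/... address serves an HTML page that displays the file, while the file itself lives at the matching raw.githubusercontent.com address. blob_to_raw and looks_like_html are hypothetical helpers. They assume the usual owner/repo/blob/ref/path layout (a branch name containing '/' would break it) and treat a leading doctype or <html> tag as a sign that a web page, not the TSV, was saved. They do not work around a blocked raw.githubusercontent.com.

```python
from urllib.parse import urlparse


def blob_to_raw(url: str) -> str:
    """Rewrite https://github.com/{owner}/{repo}/blob/{ref}/{path} to its raw form."""
    parts = urlparse(url)
    segments = parts.path.strip('/').split('/')
    if parts.netloc != 'github.com' or len(segments) < 5 or segments[2] != 'blob':
        raise ValueError(f'not a GitHub blob URL: {url}')
    owner, repo, _blob, ref, *path = segments
    return 'https://raw.githubusercontent.com/' + '/'.join([owner, repo, ref, *path])


def looks_like_html(first_bytes: bytes) -> bool:
    """Heuristic: a saved web page starts with a doctype or <html> tag; a TSV does not."""
    head = first_bytes.lstrip().lower()
    return head.startswith(b'<!doctype html') or head.startswith(b'<html')


print(blob_to_raw('https://github.com/shafferm/DRAM/blob/master/data/amg_database.tsv'))
# https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv

# Hypothetical check of an already-downloaded file:
# with open('amg_database.tsv', 'rb') as fh:
#     print(looks_like_html(fh.read(64)))
```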

--amg_database has been added and will be included in a new release. Thanks for catching that! Also, if you want to update the DRAM TSV files automatically and separately from the databases, you can use the command DRAM-setup.py update_dram_forms --output_dir {your_DRAM_db_folder}.