Dennis-xyHuang / PhyloPlus

MIT License

Problem with Download and Summarize NCBI Dump Files #2

Open Da-Ryung opened 1 year ago

Da-Ryung commented 1 year ago

Hello

I ran into a problem when downloading and summarizing the NCBI dump files, as shown below:

(base) amugae1210@pc101:~/tools/PhyloPlus$ ./phyloplus.sh -m download
cd /home1/amugae1210/tools/PhyloPlus
mkdir -p ./NCBI_dmp_file
Downloading taxonomy dump files from NCBI ftp server...
Downloading genome assembly summary files from NCBI ftp server...
python ./scripts/summarize_dmp_files.py /home1/amugae1210/tools/PhyloPlus/NCBI_dmp_file
Processing genome assembly summary reports from both GenBank and RefSeq sources...
Traceback (most recent call last):
  File "/home1/amugae1210/tools/PhyloPlus/scripts/summarize_dmp_files.py", line 20, in <module>
    temp_assembly_df = temp_assembly_df[["# assembly_accession", "species_taxid"]]
  File "/home1/amugae1210/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3811, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home1/amugae1210/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home1/amugae1210/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6176, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['# assembly_accession'] not in index"
Removes intermediate files...

I hope someone can help me with this error. Thank you.

Dennis-xyHuang commented 1 year ago

Hi Da-Ryung,

Thank you for reporting this issue. NCBI appears to have made a small change to the header for their summary files. I pushed a fix for this issue to the repository and the code should be working properly now.
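The fix itself is not shown in this thread. For anyone hitting a similar KeyError in their own scripts, here is a minimal sketch of one way to make the column lookup tolerant of this kind of header change. It assumes the change only affects the leading `#` and the whitespace around the first column name (for example `# assembly_accession` versus `#assembly_accession`); that is an assumption, not something confirmed above. The helper name `normalize_header` is hypothetical and is not part of PhyloPlus.

```python
def normalize_header(names):
    """Return column names with any leading '#' and surrounding whitespace removed.

    Hypothetical helper, not taken from the PhyloPlus code base. It assumes the
    header variants differ only in the '#' prefix and in whitespace.
    """
    return [str(name).lstrip("#").strip() for name in names]


# Two header variants (the second one is an assumed example of a changed header)
old_style = ["# assembly_accession", "species_taxid"]
new_style = ["#assembly_accession", "species_taxid"]

# Both variants map to the same normalized names
print(normalize_header(old_style))
print(normalize_header(new_style))
```

With pandas, the same idea would be to assign `df.columns = normalize_header(df.columns)` right after reading the summary file, and then select `["assembly_accession", "species_taxid"]` instead of the name with the `#` prefix.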

We will also apply this change to the web server at https://phylo.jifsan.org so that it is up to date. If the issue persists on your local version (hopefully not), please let me know. In the meantime, you can use the updated web version as an alternative : )

Best,

Dennis