czbiohub-sf / MIDAS

Metagenomic Intra-Species Diversity Analysis (MIDAS)
MIT License

Error in building a customised database #129

Closed (sahilrishav2 closed this issue 6 months ago)

sahilrishav2 commented 6 months ago

Hello,

I am trying to make a customized database for the analysis of my data, but the tool is showing this error:

    midas2 annotate_genome --species all --midasdb_name newdb --midasdb_dir . --debug --force

    1708365906.7: Executing midas2 subcommand annotate_genome.
    1708365906.7: Annotating genome GCA_000006965.1 from species 123456.
    1708365906.7: Annotating genome GCA_000218265.1 from species 123456.
    1708365906.7: Annotating genome GCA_000287375.1 from species 123456.
    1708365906.7: Annotating genome GCA_000287415.1 from species 123456.
    1708365906.7: Annotating genome GCA_000287435.1 from species 123456.
    1708365906.7: Annotating genome GCA_000287475.1 from species 123456.
    1708365906.7: Annotating genome GCA_000287455.1 from species 123456.
    1708365906.7: Annotating genome GCA_000287495.1 from species 123456.
    1708365922.7: Annotating genome GCA_000287535.1 from species 123456.
    1708365923.2: Annotating genome GCA_000287555.1 from species 123456.
    1708365924.9: Annotating genome GCA_000287575.1 from species 123456.
    1708365927.5: Annotating genome GCA_000304415.1 from species 123456.
    1708367056.4: Annotating genome GCA_900108935.1 from species 123456.

    Traceback (most recent call last):
      File "/home/rishabh/anaconda3/envs/midas4.0.0/bin/midas2", line 8, in <module>
        sys.exit(main())
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/__main__.py", line 25, in main
        return subcommand_main(subcommand_args)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/subcommands/annotate_genome.py", line 199, in main
        annotate_genome(args)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/subcommands/annotate_genome.py", line 43, in annotate_genome
        annotate_genome_master(args)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/subcommands/annotate_genome.py", line 115, in annotate_genome_master
        multithreading_map(genome_work, genome_id_list, num_threads=CONCURRENT_PROKKA_RUNS)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/common/utils.py", line 540, in multithreading_map
        return _multi_map(func, items, num_threads, ThreadPool)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/common/utils.py", line 520, in _multi_map
        return p.map(func, items, chunksize=1)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/multiprocessing/pool.py", line 364, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/multiprocessing/pool.py", line 771, in get
        raise self._value
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
        return list(map(*args))
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/subcommands/annotate_genome.py", line 95, in genome_work
        command(worker_cmd)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages/midas2/common/utils.py", line 246, in command
        return subprocess.run(cmd, shell=shell, **subproc_args)
      File "/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/subprocess.py", line 528, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command 'cd /home/rishabh/midas_database/gene_annotations/123456/GCA_000287415.1; PYTHONPATH=/home/rishabh/anaconda3/envs/midas4.0.0/lib/python3.9/site-packages /home/rishabh/anaconda3/envs/midas4.0.0/bin/python3.9 -m midas2 annotate_genome --genome GCA_000287415.1 --zzz_worker_mode --midasdb_name newdb --midasdb_dir /home/rishabh/midas_database --debug &>> /home/rishabh/midas_database/gene_annotations/123456/GCA_000287415.1/annotate_genome.log' returned non-zero exit status 1.

I am unable to sort out this problem, please help me out. It's urgent. Thank you in advance.

zhaoc1 commented 6 months ago

Have you tried the unit test for building a custom database, something like bash tests/test_db.sh 8?

What does the log file say?
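For anyone hitting the same failure: the worker command in the traceback appends its output to a per-genome annotate_genome.log under gene_annotations/<species>/<genome>/ inside the database directory. A minimal sketch for collecting the last lines of every such log is below. The directory layout is inferred only from the path shown in the traceback, and the function name tail_annotation_logs is hypothetical (it is not part of MIDAS2).

```python
from pathlib import Path


def tail_annotation_logs(midasdb_dir, n_lines=20):
    """Return {log_path: last n_lines lines} for each per-genome log.

    Assumes the layout seen in the traceback above:
    <midasdb_dir>/gene_annotations/<species_id>/<genome_id>/annotate_genome.log
    """
    tails = {}
    pattern = "gene_annotations/*/*/annotate_genome.log"
    for log_path in sorted(Path(midasdb_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        lines = log_path.read_text(errors="replace").splitlines()
        tails[str(log_path)] = lines[-n_lines:]
    return tails
```

Calling tail_annotation_logs("/path/to/midasdb_dir") and printing each entry shows which genomes failed and why, without opening every log by hand.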

sahilrishav2 commented 6 months ago

Yes, I ran "bash tests/test_db.sh 8" and all of the tests passed.

The log file reports an error in downloading and decompressing the FASTA files. I then looked for those FASTA files on NCBI and found that they have been removed by RefSeq staff.

sahilrishav2 commented 6 months ago

I have now decided to download the reference gtdb database that you provide, but I am unable to download that as well. Here is the command I used to download the files:

    midas2 database --download --midasdb_name gtdb --midasdb_dir my_midasdb_gtdb --species all

and this is the error:

    Traceback (most recent call last):
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/__main__.py", line 29, in <module>
        main()
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/__main__.py", line 25, in main
        return subcommand_main(subcommand_args)
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/subcommands/database.py", line 148, in main
        download_midasdb(args)
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/subcommands/database.py", line 37, in download_midasdb
        download_midasdb_worker(args)
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/subcommands/database.py", line 83, in download_midasdb_worker
        midasdb = MIDAS_DB(os.path.abspath(args.midasdb_dir), args.midasdb_name, 4) #<--- 4 way concurrency
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/models/midasdb.py", line 130, in __init__
        self.md5sum = load_json(self.fetch_files("md5sum")) if self.has_md5sum else None
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/models/midasdb.py", line 182, in fetch_files
        return self.fetch_tarball(filename, list_of_species)
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages/midas2/models/midasdb.py", line 256, in fetch_tarball
        assert md5_fetched == md5_lookup, f"Error for downloadding {_fetched_file} from {filename}. Please delete the folder and redownload."
    AssertionError: Error for downloadding /home/rishabh/my_midasdb_gtdb/md5sum.json from md5sum. Please delete the folder and redownload.

    Traceback (most recent call last):
      File "/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
    subprocess.CalledProcessError: Command 'PYTHONPATH=/home/rishabh/anaconda3/envs/midas2.0/lib/python3.9/site-packages /home/rishabh/anaconda3/envs/midas2.0/bin/python3.9 -m midas2 database --download -s 42:64 --midasdb_name gtdb --midasdb_dir my_midasdb_gtdb --zzz_worker_mode ' returned non-zero exit status 1
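The assertion in the traceback (md5_fetched == md5_lookup) fails when the checksum of the downloaded file differs from the expected one, which usually points to an incomplete or stale download. How MIDAS2 computes md5_fetched is not shown in this thread, so the following is only a sketch of the same kind of check using Python's standard hashlib; file_md5 and matches_expected are hypothetical helper names, not MIDAS2 functions.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with an expected hex digest (case-insensitive)."""
    return file_md5(path) == expected_md5.strip().lower()
```

If matches_expected returns False for a downloaded file, the error message's own advice applies: delete the folder and download again.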

sahilrishav2 commented 6 months ago

I can now download the gtdb database. Thank you.