CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0

Problems running python metagenomesimulation.py #118

Closed Apurbhasaha closed 2 years ago

Apurbhasaha commented 3 years ago

I was able to download CAMISIM and successfully run the script with the default files. I also tested it three times with small data sets (4, 5, 10), and it worked. However, when I try my full data set, I run into problems. I have attached my genome_to_id and metadata files in CSV format (because the TSV format is not supported here), but I used them as TSV files. My config.ini is as follows:

[Main]
seed=632741178
phase=0
max_processors=8
dataset_id=RL
output_directory=/gpfs1/data/msb/PEOPLE/saha/cami/genome_test/out_ssdna2
temp_directory=/tmp
gsa=True
pooled_gsa=True
anonymous=True
compress=1

[ReadSimulator]
readsim=/gpfs1/data/msb/tools/camisim/CAMISIM-python3/tools/art_illumina-2.3.6/art_illumina
error_profiles=/gpfs1/data/msb/tools/camisim/CAMISIM-python3/tools/art_illumina-2.3.6/profiles
samtools=/gpfs1/data/msb/tools/camisim/CAMISIM-python3/tools/samtools-1.3/samtools
profile=mbarc
size=0.1
type=art
fragments_size_mean=270
fragment_size_standard_deviation=27

[CommunityDesign]

distribution_file_paths=

ncbi_taxdump=/gpfs1/data/msb/tools/camisim/CAMISIM-python3/tools/taxdump2021.tar.gz
strain_simulation_template=/gpfs1/data/msb/tools/camisim/CAMISIM-python3/scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
number_of_samples=111

[community0]
metadata=/gpfs1/data/msb/PEOPLE/saha/cami/genome_test/metadata_ssdna2.tsv
id_to_genome_file=/gpfs1/data/msb/PEOPLE/saha/cami/genome_test/genome_to_id_ssdna2.tsv

id_to_gff_file=

genomes_total=111
genomes_real=111
max_strains_per_otu=1
ratio=1
mode=differential
log_mu=1
log_sigma=2
gauss_mu=1
gauss_sigma=1
view=False

My problem is the following:

(venv-camisim) saha@frontend1 /gpfs1/data/msb/tools/camisim/CAMISIM-python3 $ python3 /gpfs1/data/msb/tools/camisim/CAMISIM-python3/metagenomesimulation.py /gpfs1/data/msb/tools/camisim/CAMISIM-python3/defaults/mini_config_ssdna2.ini
2021-09-02 22:46:41 WARNING: [MetagenomeSimulationPipeline] The output will require approximately 144.00000000000003 GigaByte.
2021-09-02 22:46:41 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
2021-09-02 22:46:41 INFO: [MetagenomeSimulationPipeline] Validating Genomes
2021-09-02 22:46:41 INFO: [MetadataReader] Reading file: '/gpfs1/data/msb/PEOPLE/saha/cami/genome_test/dsDNA/genome_to_id_ssdna2.tsv'
2021-09-02 22:47:24 INFO: [MetagenomeSimulationPipeline] Design Communities
2021-09-02 22:47:24 INFO: [CommunityDesign] Drawing strains.
2021-09-02 22:47:24 INFO: [MetadataReader 31395689975] Reading file: '/gpfs1/data/msb/PEOPLE/saha/cami/genome_test/ssDNA2/metadata_ssdna2.tsv'
2021-09-02 22:47:24 INFO: [MetadataReader 35519997828] Reading file: '/gpfs1/data/msb/PEOPLE/saha/cami/genome_test/ssDNA2/genome_to_id_ssdna2.tsv'
2021-09-02 22:47:24 INFO: [CommunityDesign] Validating raw sequence files!
2021-09-02 22:47:44 INFO: [NcbiTaxonomy] Building taxonomy tree...
2021-09-02 22:47:44 INFO: [NcbiTaxonomy] Reading 'nodes' file: '/tmp/tmp_3wehd9p/NCBI/nodes.dmp'
2021-09-02 22:48:04 INFO: [NcbiTaxonomy] Reading 'names' file: '/tmp/tmp_3wehd9p/NCBI/names.dmp'
2021-09-02 22:48:08 INFO: [NcbiTaxonomy] Reading 'merged' file: '/tmp/tmp_3wehd9p/NCBI/merged.dmp'
2021-09-02 22:48:08 INFO: [NcbiTaxonomy] Done (24s)
**2021-09-02 22:48:08 ERROR: [NcbiTaxonomy] Invalid taxid: ''**
**2021-09-02 22:48:08 ERROR: [MetagenomeSimulationPipeline] Invalid taxid in line 83**
2021-09-02 22:48:08 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted

I also downloaded a new NCBI taxdump and repackaged it so that the files sit in an NCBI subfolder within the taxonomy archive, as follows:

taxdump.tar.gz
+-- NCBI
    +-- nodes.dmp
    +-- merged.dmp
    +-- names.dmp
    +-- delnodes.dmp

But it is still not working. Any tips? I look forward to your feedback and suggestions.

genome_to_id_ssdna2.csv metadata_ssdna2.csv
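For reference, the archive layout described above could be produced with a short script like the following. This is only a sketch: the function name `repack_taxdump` and the file names in the usage note are hypothetical, and it assumes the downloaded NCBI archive holds the `.dmp` files as regular files that can simply be moved under an `NCBI/` prefix.

```python
import tarfile


def repack_taxdump(src_path, dst_path, subfolder="NCBI"):
    """Copy every regular file from src_path into dst_path under subfolder/.

    Hypothetical helper (not part of CAMISIM); returns the member names written.
    """
    written = []
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            # Keep only the file name and place it inside the subfolder.
            base_name = member.name.rsplit("/", 1)[-1]
            info = tarfile.TarInfo(name=subfolder + "/" + base_name)
            info.size = member.size
            info.mtime = member.mtime
            info.mode = member.mode
            dst.addfile(info, src.extractfile(member))
            written.append(info.name)
    return written
```

Calling `repack_taxdump("taxdump.tar.gz", "taxdump_ncbi.tar.gz")` would write a second archive whose members are `NCBI/nodes.dmp`, `NCBI/names.dmp`, and so on; the `ncbi_taxdump` entry in the config would then point at that new file.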

AlphaSquad commented 3 years ago

I can reproduce this. The culprit is not the tax ID in line 83 of your metadata file (the 83 refers to a line in the CAMISIM script), but the ID 2293279. A quick check on NCBI reveals that this ID exists and belongs to "False black widow spider associated circular virus 1". Downloading the latest NCBI taxdump from today (and adding the NCBI subfolder) solved the problem for me, though. Did you get the same error message when using your latest taxdump?
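A check along these lines can be sketched in Python to find which IDs in a metadata file are unknown to a given taxdump before starting a run. It is an illustration, not part of CAMISIM: the helper names are hypothetical, the metadata file is assumed to be tab-separated with a header row and an `NCBI_ID` column, and the `.dmp` files are assumed to use the usual NCBI tab-pipe field separator with the tax ID in the first field.

```python
import csv


def load_taxids(dmp_path):
    """Return the set of first-column IDs from an NCBI .dmp file (e.g. nodes.dmp, merged.dmp)."""
    taxids = set()
    with open(dmp_path, encoding="utf-8") as handle:
        for line in handle:
            # Fields are separated by "\t|\t"; the first field is the tax ID.
            first_field = line.split("\t|", 1)[0].strip()
            if first_field:
                taxids.add(first_field)
    return taxids


def find_unknown_taxids(metadata_path, known_taxids, column="NCBI_ID"):
    """Return (line_number, taxid) pairs whose taxid is empty or not in known_taxids."""
    unknown = []
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            taxid = (row.get(column) or "").strip()
            if not taxid or taxid not in known_taxids:
                # line_num counts lines of the file, header included
                unknown.append((reader.line_num, taxid))
    return unknown
```

Passing the union of `load_taxids` on the extracted `nodes.dmp` and `merged.dmp` as `known_taxids` treats merged (renumbered) IDs as known too. An ID that is only reported as unknown with the older taxdump, as 2293279 was here, points to an outdated archive rather than a mistake in the metadata file.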

AlphaSquad commented 2 years ago

Did this resolve your problem?

AlphaSquad commented 2 years ago

Closing this due to inactivity