CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0

phase=2 error #73

Closed xiaole99 closed 4 years ago

xiaole99 commented 4 years ago

Hi,

I tried to use phase=2 just to generate metagenomic data from an abundance profile. However, the run did not finish.

It seems to abort quite near the end. The log shows:

....

2019-12-25 08:03:27 WARNING: [Validator 70674395627] List contains only one file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/2019.12.25_07.59.29_sample_0/bam/Genome90.bam'

2019-12-25 08:03:27 WARNING: [Validator 70674395627] List contains only one file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/2019.12.25_07.59.29_sample_0/bam/Genome92.bam'

2019-12-25 08:03:27 WARNING: [Validator 70674395627] List contains only one file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/2019.12.25_07.59.29_sample_0/bam/Genome93.bam'

2019-12-25 08:03:27 WARNING: [Validator 70674395627] List contains only one file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/2019.12.25_07.59.29_sample_0/bam/Genome94.bam'

2019-12-25 08:03:32 INFO: [MetagenomeSimulationPipeline] Anonymize Data

2019-12-25 08:03:32 INFO: [FastaAnonymizer] Interweave shuffle and anonymize

2019-12-25 08:05:31 INFO: [MetadataReader 15844424924] Reading file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/internal/genome_locations.tsv'

2019-12-25 08:05:32 INFO: [MetadataReader 9950427376] Reading file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/internal/meta_data.tsv'

2019-12-25 08:05:32 INFO: [MetadataReader 76616783066] Reading file: '/media/YXLmonthly3/tmp0fHyWs/tmp3yons0'

2019-12-25 08:05:39 INFO: [FastaAnonymizer 35635819767] Shuffle and anonymize '/media/YXLmonthly3/tmp0fHyWs/tmpgrRhqx'

2019-12-25 08:05:44 INFO: [MetadataReader 84224111778] Reading file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/internal/genome_locations.tsv'

2019-12-25 08:05:45 INFO: [MetadataReader 19158433521] Reading file: '/media/YXLmonthly3/test.taxa.tool/CAMISIM/DS/internal/meta_data.tsv'

2019-12-25 08:05:45 INFO: [MetadataReader 31396678674] Reading file: '/media/YXLmonthly3/tmp0fHyWs/read_start_positionsFuH0Cn'

2019-12-25 08:05:50 INFO: [MetadataReader 83729810015] Reading file: '/media/YXLmonthly3/tmp0fHyWs/tmpKlqAQq'

2019-12-25 08:05:50 ERROR: [MetagenomeSimulationPipeline] Column '0' not found! in line 117

2019-12-25 08:05:50 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted

Which file does "line 117" refer to? The pipeline did not generate a log file showing all the commands it actually ran.
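The error message does not name the file. One plausible reading (an assumption, not confirmed anywhere in this thread) is that line 117 of one of the tab-separated files listed in the log has fewer columns than the reader expects, or is blank. A minimal sketch for checking this on any of those files follows; `find_short_rows` is a hypothetical helper, not part of CAMISIM.

```python
def find_short_rows(path, min_columns, delimiter="\t"):
    """Return (line_number, text) pairs for blank lines or lines that
    have fewer than min_columns delimiter-separated fields.

    Hypothetical helper for illustration; it is not part of CAMISIM.
    Line numbers are 1-based so they can be compared with the number
    reported in the error message.
    """
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            fields = text.split(delimiter)
            is_blank = not any(field.strip() for field in fields)
            if is_blank or len(fields) < min_columns:
                problems.append((line_number, text))
    return problems


# Example call (the path and column count are placeholders):
# for number, text in find_short_rows("DS/internal/meta_data.tsv", 2):
#     print(number, repr(text))
```

Running it over the files that appear in the log just before the error would show whether any of them has a malformed line at or near 117.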

My command is:

python /home/yxl/CAMISIM/metagenomesimulation.py DS_mini_config.ini

My DS_mini_config.ini is:

[Main]
seed=632741178
phase=2
max_processors=15
dataset_id=DS
output_directory=DS
temp_directory=/media/YXLmonthly3
gsa=True
pooled_gsa=True
anonymous=True
compress=1

[ReadSimulator]
readsim=/home/yxl/CAMISIM/tools/art_illumina-2.3.6/art_illumina
error_profiles=/home/yxl/CAMISIM/tools/art_illumina-2.3.6/profiles
samtools=/home/yxl/CAMISIM/tools/samtools-1.3/samtools
profile=mbarc
size=0.1
type=art
fragments_size_mean=270
fragment_size_standard_deviation=27

[CommunityDesign]
distribution_file_paths=DS.distribution.txt
ncbi_taxdump=/home/yxl/CAMISIM/tools/ncbi-taxonomy_20170222.tar.gz
strain_simulation_template=/home/yxl/CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
number_of_samples=1

[community0]
metadata=DS.metadata.tsv
id_to_genome_file=DS.genome_to_id.tsv
id_to_gff_file=
genomes_total=29
genomes_real=29
max_strains_per_otu=1
ratio=1
mode=differential
log_mu=1
log_sigma=2
gauss_mu=1
gauss_sigma=1
view=False

DS.metadata.tsv

genome_ID OTU NCBI_ID novelty_category
Genome1 1 262316 known_strain
Genome3 2 435591 known_strain
Genome6 3 335541 known_strain
Genome10 4 525897 known_strain
Genome12 5 526226 known_strain
Genome17 6 660470 known_strain
Genome20 7 565655 known_strain
Genome25 8 577650 known_strain
Genome29 9 330214 known_strain
Genome30 10 760011 known_strain
Genome32 11 580340 known_strain
Genome43 12 1122623 known_strain
Genome45 13 1123377 known_strain
Genome49 14 1367847 known_strain
Genome50 15 862971 known_strain
Genome54 16 360412 known_strain
Genome55 17 1193182 known_strain
Genome56 18 1678841 known_strain
Genome62 19 1300347 known_strain
Genome67 20 96773 known_strain
Genome68 21 1915078 known_strain
Genome70 22 36805 known_strain
Genome71 23 225992 known_strain
Genome72 24 655015 known_strain
Genome77 25 92485 known_strain
Genome90 26 311182 known_strain
Genome92 27 1121325 known_strain
Genome93 28 642780 known_strain
Genome94 29 157076 known_strain

DS.genome_to_id.tsv

Genome1 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000007865.1_ASM786v1_genomic.fna
Genome3 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000012845.1_ASM1284v1_genomic.fna
Genome6 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000014725.1_ASM1472v1_genomic.fna
Genome10 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000023225.1_ASM2322v1_genomic.fna
Genome12 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000024785.1_ASM2478v1_genomic.fna
Genome17 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000147715.3_ASM14771v3_genomic.fna
Genome20 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000157355.2_ASM15735v2_genomic.fna
Genome25 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000186885.1_ASM18688v1_genomic.fna
Genome29 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000196815.1_ASM19681v1_genomic.fna
Genome30 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000208385.1_ASM20838v1_genomic.fna
Genome32 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000233775.1_ASM23377v1_genomic.fna
Genome43 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000422885.1_ASM42288v1_genomic.fna
Genome45 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000423885.1_ASM42388v1_genomic.fna
Genome49 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000444995.1_ASM44499v1_genomic.fna
Genome50 /media/YXLmonthly3/test.taxa.tool/fna/GCA_000463505.1_ASM46350v1_genomic.fna
Genome54 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001050235.2_ASM105023v2_genomic.fna
Genome55 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001050535.1_ASM105053v1_genomic.fna
Genome56 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001192835.1_ASM119283v1_genomic.fna
Genome62 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001653335.1_ASM165333v1_genomic.fna
Genome67 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001922305.1_ASM192230v1_genomic.fna
Genome68 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001940525.2_ASM194052v2_genomic.fna
Genome70 /media/YXLmonthly3/test.taxa.tool/fna/GCA_001974985.1_ASM197498v1_genomic.fna
Genome71 /media/YXLmonthly3/test.taxa.tool/fna/GCA_002056725.1_ASM205672v1_genomic.fna
Genome72 /media/YXLmonthly3/test.taxa.tool/fna/GCA_002117405.1_ASM211740v1_genomic.fna
Genome77 /media/YXLmonthly3/test.taxa.tool/fna/GCA_002382135.1_ASM238213v1_genomic.fna
Genome90 /media/YXLmonthly3/test.taxa.tool/fna/GCA_007833355.1_ASM783335v1_genomic.fna
Genome92 /media/YXLmonthly3/test.taxa.tool/fna/GCA_900103615.1_IMG-taxon_2634166902_annotated_assembly_genomic.fna
Genome93 /media/YXLmonthly3/test.taxa.tool/fna/GCA_900104965.1_IMG-taxon_2634166904_annotated_assembly_genomic.fna
Genome94 /media/YXLmonthly3/test.taxa.tool/fna/GCA_900107505.1_IMG-taxon_2619618821_annotated_assembly_genomic.fna

DS.distribution.txt

Genome1 0.0441517957942387
Genome3 0.0109404262641697
Genome6 0.0221477311976966
Genome10 0.00827947197715554
Genome12 0.010452737647533
Genome17 0.538125982158083
Genome20 0.00799091578966445
Genome25 0.00728501141586711
Genome29 0.0107678540653408
Genome30 0.0382748154489283
Genome32 0.0413609410769992
Genome43 0.0223265097846553
Genome45 0.0131695664151367
Genome49 0.010558601263373
Genome50 0.013945773062763
Genome54 0.0164623775515922
Genome55 0.0202347864521667
Genome56 0.0219210571330763
Genome62 0.00660542474689718
Genome67 0.0130020065622389
Genome68 0.0128849258706586
Genome70 0.0066734080825383
Genome71 0.00603885169787873
Genome72 0.00864757046997278
Genome77 0.0286173192049772
Genome90 0.012784926564074
Genome92 0.0287333970247225
Genome93 0.00851804568076288
Genome94 0.0090977695968392

Sorry this is a bit verbose. I hope it is understandable.

Thanks.
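Since the three input files in this post (DS.metadata.tsv, DS.genome_to_id.tsv, DS.distribution.txt) must refer to the same genome IDs, a quick consistency check can rule out a mismatch before a long run. The sketch below is illustrative only: `check_inputs` and `read_first_column` are hypothetical helpers, not CAMISIM functions, and it assumes the metadata and genome-to-ID files are tab-separated with the metadata file carrying a header line. It reports the abundance sum without judging it, because the thread does not say whether the values need to sum to 1.

```python
def read_first_column(path, delimiter="\t", skip_header=False):
    """Return the first field of every non-blank line in a delimited file."""
    ids = []
    with open(path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if skip_header and index == 0:
                continue
            line = line.strip()
            if line:
                ids.append(line.split(delimiter)[0])
    return ids


def check_inputs(metadata_path, genome_to_id_path, distribution_path):
    """Cross-check genome IDs across the three input files.

    Hypothetical helper for illustration; not part of CAMISIM.
    """
    metadata_ids = set(read_first_column(metadata_path, skip_header=True))
    genome_ids = set(read_first_column(genome_to_id_path))

    abundances = {}
    with open(distribution_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()  # any whitespace: ID, then abundance
            if not fields:
                continue
            if len(fields) < 2:
                raise ValueError(
                    f"{distribution_path}: line {line_number} has no abundance"
                )
            abundances[fields[0]] = float(fields[1])

    return {
        "genome_count": len(metadata_ids),
        "missing_in_genome_to_id": sorted(metadata_ids - genome_ids),
        "missing_in_metadata": sorted(genome_ids - metadata_ids),
        "missing_in_distribution": sorted(metadata_ids - set(abundances)),
        "abundance_sum": sum(abundances.values()),
    }


# Example call with the file names from this post:
# print(check_inputs("DS.metadata.tsv", "DS.genome_to_id.tsv", "DS.distribution.txt"))
```

For the files shown here, `genome_count` can be compared against `genomes_total=29` in the config, and the three "missing" lists would be expected to be empty.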

AlphaSquad commented 4 years ago

Hi,

Thank you for your interest in CAMISIM and for the detailed description. Unfortunately, the problem here lies more in our documentation than anything else: due to changes in the code, the "phase" option is not really functional anymore. That does not mean you cannot do what you want to do, however. I was able to run your simulation using the files you provided (after generating some dummy genomes), and using phase=2 or phase=0 makes no difference if you provide a distribution file, which you did.

Unfortunately, this also means that I do not yet know why this error occurs. Could you try running CAMISIM with the -debug option and a new/different output directory? Also, since the error occurs in the anonymization step, you might want to try running CAMISIM without anonymization to check whether it finishes then.
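One way to set up that test run without touching the original config is to write a modified copy. The sketch below uses Python's standard `configparser`. It assumes that anonymization is switched off by setting `anonymous=False` in `[Main]` (inferred from the `anonymous=True` entry in the posted config, not confirmed in this thread), and `write_debug_config` is a hypothetical helper name.

```python
import configparser


def write_debug_config(source_path, target_path, output_directory):
    """Copy an INI config, disabling anonymization and changing the output directory.

    Hypothetical helper for illustration; not part of CAMISIM.
    Assumption: 'anonymous=False' in [Main] turns anonymization off.
    """
    # interpolation=None keeps any '%' characters in paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    with open(source_path, encoding="utf-8") as handle:
        parser.read_file(handle)

    parser.set("Main", "anonymous", "False")
    parser.set("Main", "output_directory", output_directory)

    with open(target_path, "w", encoding="utf-8") as handle:
        parser.write(handle)
    return target_path


# Example call (the target file and directory names are placeholders):
# write_debug_config("DS_mini_config.ini", "DS_mini_config_debug.ini", "DS_debug")
```

The resulting file can then be passed to metagenomesimulation.py together with the -debug option mentioned above. Note that `configparser` rewrites entries as `key = value` and drops comments from the copy.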

AlphaSquad commented 4 years ago

Did that work for you?

AlphaSquad commented 4 years ago

Closed due to inactivity