CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0

metagenomesimulation.py raising "ERROR: [MetagenomeSimulationPipeline] 'gb|*********.1|:1-****' in line 117" Error #192

Closed cberta11 closed 4 months ago

cberta11 commented 4 months ago

Good afternoon,

After successfully running metagenomesimulation.py on the provided sample dataset in defaults/mini_config.ini, I am having issues running the script on my own genomes. Running metagenomesimulation.py with defaults/mini_config.ini containing paths to my own genomes returns: [image]

Running metagenomesimulation.py with defaults/default_config.ini, with similar parameters, likewise returns: [image]

I am unsure what is causing the error. As I said, the script ran without issue on mini_config with the example dataset you provided. Any help would be greatly appreciated.

Joe

For your convenience, I have provided the following below:
1.) metagenomesimulation.py defaults/mini_config.ini --debug print out
2.) defaults/mini_config.ini used
3.) metagenomesimulation.py defaults/default_config.ini --debug print out
4.) defaults/default_config.ini used
5.) mamba environment packages
6.) metadata.tsv used
7.) genome_to_id.tsv used

1.) metagenomesimulation.py defaults/mini_config.ini --debug print out: [image]

2.) defaults/mini_config.ini used: [image]

3.) metagenomesimulation.py defaults/default_config.ini --debug print out: [image]

4.) defaults/default_config.ini used: [image]

5.) packages list for the mamba environment used: [image]

6.) metadata.tsv used: [image]

7.) genome_to_id.tsv used: [image]

cberta11 commented 4 months ago

I also tried setting anonymous reads to false, since the issue seems to come from that portion of the metagenomesimulation.py script: [image]

However, this just threw the following: [image]

So it seems to throw the error with whatever comes after the assembly stage?

AlphaSquad commented 4 months ago

Unfortunately, CAMISIM has some problems with special characters in sequence names. I assume the obscure error message you receive (gb|OL88421.1|:1-1451_2) is a sequence name in the Acidovorax caeni genome. I thought I had fixed that error in the latest version, but it still seems to pop up from time to time. It should work if you remove all "-" characters (and maybe also "_", just to be sure) from the sequence names within the FASTA file(s).
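For anyone hitting the same error, the renaming suggested above can be scripted rather than done by hand. The following is only a minimal sketch and not part of CAMISIM: the function names (clean_header, clean_fasta_lines, clean_fasta_file) are made up for illustration, and it assumes that the sequence name is the part of each header line up to the first space and that stripping "-" and "_" from it is enough. It raises an error if two names become identical after cleaning, since duplicate sequence names would likely cause new problems.

```python
def clean_header(line, chars="-_"):
    """Remove `chars` from the sequence name of one FASTA header line."""
    body = line[1:].rstrip("\r\n")
    # Assumption: the sequence name is the text up to the first space;
    # any description after it is kept unchanged.
    name, sep, description = body.partition(" ")
    name = name.translate(str.maketrans("", "", chars))
    return ">" + name + sep + description + "\n"


def clean_fasta_lines(lines, chars="-_"):
    """Yield FASTA lines with header names cleaned; sequence lines pass through."""
    seen = set()
    for line in lines:
        if line.startswith(">"):
            line = clean_header(line, chars)
            name = line[1:].split(" ", 1)[0].strip()
            # Removing characters can make two different names collide.
            if name in seen:
                raise ValueError(f"duplicate sequence name after cleaning: {name}")
            seen.add(name)
        yield line


def clean_fasta_file(src_path, dst_path, chars="-_"):
    """Write a cleaned copy of the FASTA file at src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(clean_fasta_lines(src, chars))
```

Calling clean_fasta_file("my_genome.fasta", "my_genome.clean.fasta") (both file names are placeholders) writes a cleaned copy and leaves the original untouched. The paths in genome_to_id.tsv would then need to point at the cleaned files.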

cberta11 commented 4 months ago

Thank you so much for your help! Removing the dashes and underscores has fixed the issue.