CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0
167 stars · 37 forks

Unable to run metagenomesimulation.py on docker #83

Closed Adoni5 closed 3 years ago

Adoni5 commented 4 years ago

Hi,

I pulled the latest docker image, but am struggling with the Documentation. What I am trying to do is create a Metagenomics sample of multiple genomes, with a mixed abundance. The genomes themselves aren't important so much as the varied abundance, and it being simulated data from real genomes, and that I receive FASTA/FASTQ at the end.

The docker image is working correctly but I'm not sure how to proceed.

Any help would be appreciated,

Rory

AlphaSquad commented 4 years ago

Dear Rory,

thanks for your interest in CAMISIM, and I hope I will be able to help you. I am not entirely sure what you mean by "mixed" or "varied abundance": just that not all genomes have the same abundance, or do you have a specific distribution in mind? Do you have target genomes available? If so, you will need to create the id_to_genome and metadata files for these genomes (explained here: https://github.com/CAMI-challenge/CAMISIM/wiki/File-Formats), reference them in the config file (along with the read simulator you want to use and how much data you want to produce), and then you are ready to go using the metagenomesimulation.py script. The output read files will always be FASTQ files. Please let me know which step is unclear and we will find a solution.
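For concreteness, here is a hypothetical sketch of those two input files. All genome IDs, taxon IDs, and fasta paths below are placeholders; the wiki page linked above is the authoritative description of the format:

```shell
# Hypothetical sketch of the two CAMISIM input files (placeholder values).
mkdir -p camisim_input

# metadata: tab-separated, one row per genome, with a fixed header line.
printf 'genome_ID\tOTU\tNCBI_ID\tnovelty_category\n' >  camisim_input/metadata.tsv
printf 'genome_1\t1\t562\tknown_strain\n'            >> camisim_input/metadata.tsv
printf 'genome_2\t2\t1280\tknown_strain\n'           >> camisim_input/metadata.tsv

# id_to_genome: maps each genome_ID to the fasta file on disk (paths are
# placeholders here).
printf 'genome_1\t/path/to/genome_1.fasta\n' >  camisim_input/id_to_genome.tsv
printf 'genome_2\t/path/to/genome_2.fasta\n' >> camisim_input/id_to_genome.tsv
```

Both files are then referenced from the config file that metagenomesimulation.py reads.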

Adoni5 commented 4 years ago

Hi @AlphaSquad ,

Thanks for your quick response! I do have a specific distribution in mind, since I need to know the abundance of each genome in the produced FASTQ. I also have target genomes available. From the example file formats and the wiki I can see which files I'm supposed to create, which is great.

What I'm unsure about is how to pass the metadata file, genome files and config file into the docker command.

Adoni5 commented 4 years ago

@AlphaSquad Hey, sorry to chase you but it would be pretty handy if I could get this working soon!

AlphaSquad commented 4 years ago

Ah, I see where the problem is now. After you have built the docker image, you just append the command you want to run to the docker run invocation. For example, if your image is tagged camisim, then

docker run camisim metagenome_from_profile.py -p mini.biom

should run a small CAMISIM test.
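To use your own metadata and config files, the usual docker approach is to bind-mount host directories into the container. A sketch, where the camisim tag, the /input and /output container paths, and the config filename are all assumptions to adapt to your setup (the docker run is guarded so the sketch is copy-paste safe on machines without docker):

```shell
# Host-side directories holding your config/metadata and receiving output.
INPUT_DIR="$(pwd)/camisim_input"
OUTPUT_DIR="$(pwd)/camisim_output"
mkdir -p "$INPUT_DIR" "$OUTPUT_DIR"

# Mount them into the container, then point the script at the mounted
# config. Your config file must itself use the in-container paths
# (/input/..., /output/...) when referring to these files.
if command -v docker >/dev/null 2>&1; then
  docker run \
      -v "$INPUT_DIR":/input \
      -v "$OUTPUT_DIR":/output \
      camisim metagenomesimulation.py /input/config.ini
fi
```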

Adoni5 commented 4 years ago
```
Inserting synonyms:      190000
2271000 generating entries...
Uploading to /root/.etetoolkit/taxa.sqlite
Traceback (most recent call last):
  File "metagenome_from_profile.py", line 11, in <module>
    import scripts.get_genomes as GG
  File "/usr/local/bin/scripts/get_genomes.py", line 15, in <module>
    ncbi = NCBITaxa()
  File "/usr/local/lib/python2.7/dist-packages/ete2/ncbi_taxonomy/ncbiquery.py", line 74, in __init__
    self.update_taxonomy_database()
  File "/usr/local/lib/python2.7/dist-packages/ete2/ncbi_taxonomy/ncbiquery.py", line 101, in update_taxonomy_database
    update_db(self.dbfile)
  File "/usr/local/lib/python2.7/dist-packages/ete2/ncbi_taxonomy/ncbiquery.py", line 659, in update_db
    upload_data(dbfile)
  File "/usr/local/lib/python2.7/dist-packages/ete2/ncbi_taxonomy/ncbiquery.py", line 698, in upload_data
    db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid
```

Running the above test gives me this error. Is the SQLite version pinned?

AlphaSquad commented 4 years ago

Uh oh, I just tried it with the docker image and got the same error. Unfortunately I am an expert on neither docker nor SQL. I will try to find out what went wrong and fix this asap. If you already have an idea of what the problem could be, I am happy to hear it. Sorry for the inconvenience!
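In the meantime, one generic thing to try: the traceback shows ete building its taxonomy cache at /root/.etetoolkit/taxa.sqlite inside the container, and a half-built cache can leave that database in a bad state. Deleting the cache forces a rebuild on the next run. This is a hedged workaround sketch, not a confirmed fix for this specific bug, and the .traverse.pkl sidecar filename is an assumption:

```shell
# Remove ete's cached taxonomy database so the next NCBITaxa() call
# rebuilds it from scratch. Inside the docker container, HOME is /root,
# so this targets /root/.etetoolkit/taxa.sqlite there.
rm -f "$HOME/.etetoolkit/taxa.sqlite" \
      "$HOME/.etetoolkit/taxa.sqlite.traverse.pkl"
```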

AlphaSquad commented 4 years ago

I know this is quite old, but the error you report comes from the ete package in Python and is not connected to docker or SQL. I encountered this error while updating some scripts to be compatible with python3 and have added a fix described in the ete repository. This could possibly also solve your problem, so please give it a try on the python3 branch!