ParkvilleData closed this issue 5 years ago
Hi, thank you for your comment. The docker container is not extensively tested, so there might still be some bugs. In this case you are missing a biom file, which you have to put in the correct path; "mini.biom" is just an example and is not provided along with CAMISIM. Additionally, you have to change the path you are mounting.
"/path/to/input/directory:/input:rw"
mounts the path before the : to the path after it within the docker container, and your input is probably not located in /path/to/input/directory locally. I would advise checking the docker manual for details on how to run and mount docker containers.
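As a rough illustration of the mount mapping described above, here is a minimal Python 3 sketch that translates a path inside the container back to the host path it would come from and checks whether that file exists. The helper names parse_mount and to_host_path are made up for this example, and it assumes POSIX-style host paths without extra colons.

```python
import os


def parse_mount(spec):
    """Split a 'host:container[:mode]' volume spec into (host, container)."""
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError("expected 'host:container[:mode]', got %r" % spec)
    return parts[0], parts[1]


def to_host_path(container_path, mounts):
    """Return the host path behind a container path, or None if no mount covers it."""
    for spec in mounts:
        host, container = parse_mount(spec)
        container = container.rstrip("/")
        if container_path == container or container_path.startswith(container + "/"):
            relative = container_path[len(container):].lstrip("/")
            return os.path.join(host, relative) if relative else host
    return None


# The two -v arguments from the generic command; replace the host side with real directories.
mounts = [
    "/path/to/input/directory:/input:rw",
    "/path/to/output/directory:/output:rw",
]
host_file = to_host_path("/input/mini.biom", mounts)
state = "exists" if os.path.exists(host_file) else "is missing on the host"
print(host_file, state)
```

If the printed host path is missing, the container will not see /input/mini.biom either.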
When running CAMISIM directly outside of a docker container, you will also need a biom profile if you want to run ./metagenome_from_profile.py. If you don't have a profile, or you want to run CAMISIM without one (using ./metagenomesimulation.py), you will need a set of genomes and their taxonomic classification and need to set the files in your config file appropriately. The default configuration is not stored under
configuration/metagenome_simulation
but under
defaults/default_config.ini
which you need to change according to your needs. For details on how to do this, please refer to the wiki.
Thanks for the reply!
Yep, thought as much. That's what I initially did, but the pipeline broke down much earlier than when I used your generic command. When I ran with my input locations, the pipeline broke before it even downloaded the taxonomic files from NCBI.
I'll run it again and post what I did to get it to work.
Thanks!
OK, I have double-checked.
I am using the following command.
python metagenomesimulation.py default_config.ini
In the config the following genome classifications are being used.
metadata=documentation/CAMI2015_metadata_final.tsv id_to_genome_file=documentation/CAMI2015_paths.tsv ncbi_taxdump=tools/ncbi-taxonomy_20170222.tar.gz
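A quick way to confirm that these three entries point at real files is sketched below. It is written for Python 3 (so it is separate from the Python 2.7 environment CAMISIM runs in), the function name missing_config_paths is made up, and the section name [Example] is a placeholder because the real section names are not shown in this thread.

```python
import configparser
import os

# Keys taken from the config excerpt above.
KEYS = ("metadata", "id_to_genome_file", "ncbi_taxdump")


def missing_config_paths(config_text, base_dir="."):
    """Return (section, key, value) for every listed key whose file does not exist."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []
    for section in parser.sections():
        for key in KEYS:
            if parser.has_option(section, key):
                value = parser.get(section, key).strip()
                # Relative paths are resolved against base_dir (an assumption of this sketch).
                path = os.path.join(base_dir, value)
                if not value or not os.path.exists(path):
                    problems.append((section, key, value))
    return problems


# "[Example]" is a hypothetical section name used only to make the text parseable.
example = """
[Example]
metadata=documentation/CAMI2015_metadata_final.tsv
id_to_genome_file=documentation/CAMI2015_paths.tsv
ncbi_taxdump=tools/ncbi-taxonomy_20170222.tar.gz
"""
for section, key, value in missing_config_paths(example, base_dir="."):
    print("%s: %s -> %r not found" % (section, key, value))
```

For a real check, the text of default_config.ini would be passed in instead of the example string, with base_dir set to the directory the command is run from.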
A local version of samtools is set. All dependencies are installed.
The mode parameter wasn't set and I set it to replicates.
I get the following error
bshaban@6300d-111439-l:~/camisim$ python metagenomesimulation.py default_config.ini
Traceback (most recent call last):
File "metagenomesimulation.py", line 14, in <module>
from scripts.argumenthandler import ArgumentHandler
File "/home/unimelb.edu.au/bshaban/camisim/scripts/argumenthandler.py", line 9, in <module>
import numpy.random as np_random
File "/home/unimelb.edu.au/bshaban/.local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
from . import core
File "/home/unimelb.edu.au/bshaban/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 59, in <module>
from . import numeric
File "/home/unimelb.edu.au/bshaban/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 3093, in <module>
from . import fromnumeric
File "/home/unimelb.edu.au/bshaban/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 17, in <module>
from . import _methods
File "/home/unimelb.edu.au/bshaban/.local/lib/python2.7/site-packages/numpy/core/_methods.py", line 158, in <module>
_NDARRAY_ARRAY_FUNCTION = mu.ndarray.__array_function__
AttributeError: type object 'numpy.ndarray' has no attribute '__array_function__'
This is the same error I was receiving yesterday. I've double checked the config file and everything seems like it has everything it needs. I've checked the dependencies and they're installed.
I am using Python 2.7.15, and the minimum is Python 2.7.10? It seems to be a Python error (I'm not well versed in Python). Should I use a newer version? Maybe 3?
bshaban@6300d-111439-l:~/camisim$ python
Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
Thank you very much for your help, Bobbie.
The Python version should not be a problem; to me it looks more like it could be your numpy version. The pipeline progresses further in the docker container because all of the dependencies are automatically installed there. Could you please check your numpy version?
$ python
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import numpy
>>> numpy.version.version
'1.13.0'
As you can see, my python version here is 2.7.12 and it is running. If you have numpy < 1.13.0 make sure to install the dependencies, e.g. using
pip install -r requirements.txt
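To compare an installed numpy against the 1.13.0 shown working above, something like the following Python 3 sketch could be used. The helper names version_tuple and compare_to_known_good are made up for this example, and treating 1.13.0 as the reference is only based on the session above, not on an official requirement.

```python
def version_tuple(version):
    """Turn '1.16.0' into (1, 16, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def compare_to_known_good(installed, known_good="1.13.0"):
    """Return 'older', 'same' or 'newer' relative to the reference version."""
    a, b = version_tuple(installed), version_tuple(known_good)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.13' equals '1.13.0'
    b += (0,) * (width - len(b))
    if a == b:
        return "same"
    return "older" if a < b else "newer"


try:
    import numpy
    print(numpy.__version__, compare_to_known_good(numpy.__version__))
except Exception as exc:  # also covers a broken install that fails during import
    print("numpy could not be imported:", exc)
```

Either "older" or "newer" would be a reason to reinstall the pinned dependencies.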
In addition, I assume you have not downloaded the genomes from the CAMI1 challenge? If you have not, the paths in the CAMI2015_paths.tsv file do not point to anything. After you have made sure that you have a correct version of numpy, could you post the output of
./metagenome_from_profile.py -p defaults/mini.biom -o out
here?
Hi, thanks for the reply!
I had numpy 1.16 which resulted in the error that I posted originally. I removed numpy and used the
"pip install -r requirements" to install numpy 1.13.0
This ended up going further through the process.
The error I received this time was
bshaban@6300d-111439-l:~/camisim$ python ./metagenome_from_profile.py -p defaults/mini.biom -o out
NCBI database not present yet (first time used?)
Downloading taxdump.tar.gz from NCBI FTP site...
Done. Parsing...
Loading node names...
2045235 names loaded.
249975 synonyms loaded.
Loading nodes...
2045235 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /dir/.etetoolkit/taxa.sqlite ...
2045000 generating entries...
Uploading to /dir/.etetoolkit/taxa.sqlite
Inserting synonyms: 245000
Inserting taxid merges: 50000
Inserting taxids: 2045000
2019-01-21 13:37:15 WARNING: [root] Max strains per OTU not set, using default (3)
2019-01-21 13:37:15 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
2019-01-21 13:37:16 WARNING: [root] Some OTUs could not be mapped
ERROR: <type 'NoneType'>
Indeed, that was the case. I added the locations of the genomes to the genome_path.csv and received the same "ERROR: <type 'NoneType'>" as in the metagenome_from_profile.py output.
I also tried the same command with a biom I created with qiime and that gave the following error.
bshaban@6300d-111439-l:~/camisim$ python ./metagenome_from_profile.py -p Rcombined.noempty.all.4.40.2013_08_greengenes_97_otus.with_euks_L6.biom -o out
2019-01-21 13:55:05 WARNING: [root] Max strains per OTU not set, using default (3)
2019-01-21 13:55:05 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
Traceback (most recent call last):
File "./metagenome_from_profile.py", line 87, in <module>
config = GG.generate_input(args) # total number of genomes and path to updated config
File "/home/unimelb.edu.au/bshaban/camisim/scripts/get_genomes.py", line 283, in generate_input
tax_profile = read_taxonomic_profile(args.profile, config, args.samples)
File "/home/unimelb.edu.au/bshaban/camisim/scripts/get_genomes.py", line 42, in read_taxonomic_profile
lineage = table.metadata(otu,axis="observation")["taxonomy"]
TypeError: 'NoneType' object has no attribute '__getitem__'
Exception AttributeError: "'NoneType' object has no attribute '_map_logfile_handler'" in <bound method LoggingWrapper.__del__ of <scripts.loggingwrapper.LoggingWrapper object at 0x7f670eb46090>> ignored
Thank you very much for your patience and help, it is much appreciated.
It seems that your own QIIME profile does not have a taxonomy attached to it; without that CAMISIM will unfortunately not be able to infer a metagenome profile.
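A minimal Python 3 sketch of how one might list the OTUs that lack a taxonomy entry is shown below. The function name otus_missing_taxonomy is made up; it only needs a list of OTU ids and a lookup function, so it can be tried with plain dictionaries first. The commented lines show how it could be wired to the biom package, using the same load_table and metadata calls that appear in the tracebacks in this thread.

```python
def otus_missing_taxonomy(otu_ids, get_metadata):
    """Return the OTU ids whose metadata is None or has no 'taxonomy' entry."""
    missing = []
    for otu in otu_ids:
        metadata = get_metadata(otu)
        if metadata is None or not metadata.get("taxonomy"):
            missing.append(otu)
    return missing


# Stand-in data: one OTU with a lineage, one without metadata, one with an empty lineage.
fake_metadata = {
    "OTU1": {"taxonomy": ["k__Bacteria", "p__Proteobacteria"]},
    "OTU2": None,
    "OTU3": {"taxonomy": []},
}
print(otus_missing_taxonomy(sorted(fake_metadata), fake_metadata.get))

# With the biom package installed, the same check could be run on a real profile:
# import biom
# table = biom.load_table("profile.biom")  # placeholder file name
# ids = table.ids(axis="observation")
# print(otus_missing_taxonomy(ids, lambda otu: table.metadata(otu, axis="observation")))
```

An empty result means every OTU carries a lineage; any ids listed would explain the TypeError above.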
The other error is... odd. Could you retry and post the output of the same command with the --debug flag? This should yield the exact code position where the error occurs.
Hi, thanks for that. Here it is.
bshaban@6300d-111439-l:~/camisim$ ./metagenome_from_profile.py -p defaults/mini.biom -o out --debug
2019-01-22 10:44:09 INFO: [root] Using commands:
2019-01-22 10:44:09 INFO: [root] -profile: defaults/mini.biom
2019-01-22 10:44:09 INFO: [root] -tmp: None
2019-01-22 10:44:09 INFO: [root] -ncbi: tools/ncbi-taxonomy_20170222.tar.gz
2019-01-22 10:44:09 INFO: [root] -reference_genomes: tools/assembly_summary_complete_genomes.txt
2019-01-22 10:44:09 INFO: [root] -o: out
2019-01-22 10:44:09 INFO: [root] -no_replace: True
2019-01-22 10:44:09 INFO: [root] -seed: None
2019-01-22 10:44:09 INFO: [root] -additional_references: None
2019-01-22 10:44:09 INFO: [root] -samples: None
2019-01-22 10:44:09 INFO: [root] -debug: True
2019-01-22 10:44:09 INFO: [root] -config: defaults/default_config.ini
2019-01-22 10:44:09 WARNING: [root] Max strains per OTU not set, using default (3)
2019-01-22 10:44:09 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
2019-01-22 10:44:09 WARNING: [root] Some OTUs could not be mapped
2019-01-22 10:44:09 WARNING: [root] Rank order of OTU Genome3 too high, no matching genomes found
2019-01-22 10:44:09 WARNING: [root] Full lineage was [91347, 1236, 1224, 2], mapped from BIOM lineage [u'k__Bacteria', u'p__Proteobacteria', u'c__Gammaproteobacteria', u'o__Enterobacterales']
2019-01-22 10:44:09 INFO: [root] Downloading 4 genomes
ERROR: <type 'NoneType'>
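The "rank too high" warning in this log refers to a BIOM lineage that stops at o__Enterobacterales. The small Python 3 sketch below shows how the deepest rank of such a lineage can be read from its prefixes. It assumes the Greengenes-style prefixes k__, p__, c__, o__, f__, g__ and s__, and the function name deepest_rank is made up; it is not how CAMISIM itself does the mapping.

```python
# Greengenes-style rank prefixes, from highest to lowest rank.
RANK_PREFIXES = [
    ("k", "kingdom"), ("p", "phylum"), ("c", "class"), ("o", "order"),
    ("f", "family"), ("g", "genus"), ("s", "species"),
]


def deepest_rank(lineage):
    """Return the name of the last rank in the lineage that has a non-empty taxon name."""
    names = dict(RANK_PREFIXES)
    deepest = None
    for entry in lineage:
        prefix, separator, taxon = entry.partition("__")
        if separator and taxon.strip() and prefix in names:
            deepest = names[prefix]
    return deepest


lineage = ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacterales"]
print(deepest_rank(lineage))  # order
```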
I seem to have got the Docker example running now. I will see if I can run metagenomesimulation.py through docker as well and post an update.
Thanks!
The docker metagenome from profile command worked and I will run again soon. What I really need is to run against a set of genomes, i.e. de novo.
The manual says there are three files that are needed to run the community design de novo, but I don't see much difference between files one and two. One of the files contains the genome paths, the second contains the metadata; what does the third contain? Is it possible to put links to examples of each of the three in the documentation?
I run the metagenomesimulation.py with my default_config.ini which contains the appropriate paths. When I run the command metagenomesimulation.py default_config.ini
I get the same "ERROR: <type 'NoneType'>" error. I have tried to run this with the debug parameter but it doesn't give any further information.
Thanks for the help,
Peculiar. Could you try one of these two small things?
If the out folder is not present, that might explain why the docker container runs, since it automatically mounts the out/ folder. I've sometimes encountered the "ERROR: <type 'NoneType'>" at the end of the pipeline, and it didn't cause any damage. If you check the File Formats page, it should explain the two required files, metadata and genome_to_id. I am not sure which third file you are referring to; all files other than the two aforementioned ones are optional. You will need to have downloaded a set of genomes to run CAMISIM de novo, though.
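The state of the out folder can be checked before a run with a few lines of Python 3. This is only a sketch; output_dir_status is a made-up helper and "out" is the folder name used in the commands in this thread.

```python
import os


def output_dir_status(path):
    """Return 'missing', 'empty' or 'not empty' for the given output folder."""
    if not os.path.isdir(path):
        return "missing"
    return "empty" if not os.listdir(path) else "not empty"


print("out:", output_dir_status("out"))
```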
Hi,
Re: The three files. In the manual it says the following
The de novo community design needs three files to run:
A file containing, tab separated, a genome identifier and the path to the file of the genome.
A file containing, tab separated, a genome identifier and the path to the gene annotation of the genome. This one is used in case strains are simulated based on a genome.
A meta data file that contains, tab separated and with header, genome identifier, novelty categorization, otu assignment and a taxonomic classification.
Is there a third file that needs to be linked to a gff annotation for the genome?
This is where I am confused about the three files. I had a fasta file with the genomes I wanted to run the de novo analysis on. I split the fasta into separate fasta genomes each containing one genome and updated the genome map, default_config.ini and metadata files accordingly so I'm not really sure what I am doing wrong.
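For reference, the splitting step described above could look roughly like this in Python 3. It is a sketch under two assumptions: every FASTA record is one complete genome, and the first word of each header is usable as both genome identifier and file name. The function name split_fasta and the file names in the usage comment are made up. It writes one FASTA file per record plus a tab-separated identifier-to-path file, which is the first of the three files quoted above.

```python
import os


def split_fasta(fasta_path, out_dir, mapping_path):
    """Write one FASTA file per record and a 'genome_id<TAB>path' mapping file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as source:
            for line in source:
                if line.startswith(">"):
                    if handle:
                        handle.close()
                    fields = line[1:].split()
                    # Fall back to a numbered name when the header is empty.
                    genome_id = fields[0] if fields else "genome%d" % (len(written) + 1)
                    path = os.path.abspath(os.path.join(out_dir, genome_id + ".fa"))
                    handle = open(path, "w")
                    written.append((genome_id, path))
                if handle:
                    handle.write(line)
    finally:
        if handle:
            handle.close()
    with open(mapping_path, "w") as mapping:
        for genome_id, path in written:
            mapping.write("%s\t%s\n" % (genome_id, path))
    return written


# Usage with placeholder names:
# split_fasta("all_genomes.fasta", "genomes", "genome_to_id.tsv")
```

The metadata file still has to be written separately, since the taxonomic classification cannot be derived from the FASTA headers alone.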
With points 1 & 2. The folder was present, I renamed it and then created a new out folder and ran the command again. The output of the out folder is as follows.
-rw-rw-r-- 1 bshaban bshaban 42 Jan 23 10:58 abundance0.tsv
-rw-rw-r-- 1 bshaban bshaban 65 Jan 23 10:58 abundance1.tsv
-rw-rw-r-- 1 bshaban bshaban 888 Jan 23 10:58 config.ini
-rw-rw-r-- 1 bshaban bshaban 157 Jan 23 10:58 genome_to_id.tsv
-rw-rw-r-- 1 bshaban bshaban 135 Jan 23 10:58 metadata.tsv
genomes:
total 15M
drwxrwxr-x 2 bshaban bshaban 4.0K Jan 23 10:58 .
drwxrwxr-x 3 bshaban bshaban 4.0K Jan 23 10:58 ..
-rw-rw-r-- 1 bshaban bshaban 5.1M Jan 23 10:58 GCA_000210475.1_ASM21047v1.fa
-rw-rw-r-- 1 bshaban bshaban 4.4M Jan 23 10:58 GCA_000800765.1_ASM80076v1.fa
-rw-rw-r-- 1 bshaban bshaban 4.8M Jan 23 10:58 GCA_001051135.1_ASM105113v1.fa
Hi, you do not need the second file, which would be the gff annotation of the genome. These files just prevent our genome evolver from evolving sequences within predicted genes of your provided genomes. This file only needs to be provided if you want to simulate your own strains and these strains should not have evolved sequences within genes. For starters you will only need the first and third files you described above. Note that the out folder has to be empty (I should state that somewhere in the manual), but since you re-created it, that shouldn't be a problem. So CAMISIM downloaded the genomes and created all needed files but then crashed without comment? That hasn't occurred on any of the machines I have run CAMISIM on before. Could you please send me again
Hi,
CAMISIM seems to download the genomes; I haven't checked whether they're complete, but they look to be of reasonable file size. The output is below.
bshaban@6300d-111439-l:~/camisim$ python ./metagenome_from_profile.py -p defaults/mini.biom -o out --debug
2019-01-24 09:40:47 INFO: [root] Using commands:
2019-01-24 09:40:47 INFO: [root] -profile: defaults/mini.biom
2019-01-24 09:40:47 INFO: [root] -tmp: None
2019-01-24 09:40:47 INFO: [root] -ncbi: tools/ncbi-taxonomy_20170222.tar.gz
2019-01-24 09:40:47 INFO: [root] -reference_genomes: tools/assembly_summary_complete_genomes.txt
2019-01-24 09:40:47 INFO: [root] -o: out
2019-01-24 09:40:47 INFO: [root] -no_replace: True
2019-01-24 09:40:47 INFO: [root] -seed: None
2019-01-24 09:40:47 INFO: [root] -additional_references: None
2019-01-24 09:40:47 INFO: [root] -samples: None
2019-01-24 09:40:47 INFO: [root] -debug: True
2019-01-24 09:40:47 INFO: [root] -config: defaults/default_config.ini
2019-01-24 09:40:47 WARNING: [root] Max strains per OTU not set, using default (3)
2019-01-24 09:40:47 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
2019-01-24 09:40:48 WARNING: [root] Some OTUs could not be mapped
2019-01-24 09:40:48 WARNING: [root] Rank order of OTU Genome3 too high, no matching genomes found
2019-01-24 09:40:48 WARNING: [root] Full lineage was [91347, 1236, 1224, 2], mapped from BIOM lineage [u'k__Bacteria', u'p__Proteobacteria', u'c__Gammaproteobacteria', u'o__Enterobacterales']
2019-01-24 09:40:48 INFO: [root] Downloading 3 genomes
ERROR: <type 'NoneType'>
I am using Ubuntu
bshaban@6300d-111439-l:~/camisim$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
System specs
description: System memory
physical id: 0
size: 23GiB
*-cpu
product: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
size: 3721MHz
capacity: 3900MHz
Hm. Could you go into the default_config.ini and reduce the (sample) size to maybe 0.1 Gbp, see if that changes anything? Maybe you do not have enough RAM or hard drive space, even though CAMISIM should report this. Because it works fine on my Ubuntu system:
$ ./metagenome_from_profile.py -p defaults/mini.biom -o out --debug
2019-01-24 10:14:24 INFO: [root] Using commands:
2019-01-24 10:14:24 INFO: [root] -profile: defaults/mini.biom
2019-01-24 10:14:24 INFO: [root] -tmp: None
2019-01-24 10:14:24 INFO: [root] -ncbi: tools/ncbi-taxonomy_20170222.tar.gz
2019-01-24 10:14:24 INFO: [root] -reference_genomes: tools/assembly_summary_complete_genomes.txt
2019-01-24 10:14:24 INFO: [root] -o: out
2019-01-24 10:14:24 INFO: [root] -no_replace: True
2019-01-24 10:14:24 INFO: [root] -seed: None
2019-01-24 10:14:24 INFO: [root] -additional_references: None
2019-01-24 10:14:24 INFO: [root] -samples: None
2019-01-24 10:14:24 INFO: [root] -debug: True
2019-01-24 10:14:24 INFO: [root] -config: defaults/default_config.ini
2019-01-24 10:14:24 WARNING: [root] Max strains per OTU not set, using default (3)
2019-01-24 10:14:24 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
2019-01-24 10:14:24 WARNING: [root] Some OTUs could not be mapped
2019-01-24 10:14:24 WARNING: [root] Rank order of OTU Genome3 too high, no matching genomes found
2019-01-24 10:14:24 WARNING: [root] Full lineage was [91347, 1236, 1224, 2], mapped from BIOM lineage [u'k__Bacteria', u'p__Proteobacteria', u'c__Gammaproteobacteria', u'o__Enterobacterales']
2019-01-24 10:14:24 INFO: [root] Downloading 3 genomes
2019-01-24 10:14:33 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
[...]
Were you able to run the docker container by now?
Hi,
Yes, the docker container worked and produced results after completing. Running with the sample size as 0.1 still gives the same error. I have a 24 GB machine and have just got another 32 GB which I can try on. Why would the docker container work if I don't have enough RAM? I don't think it's that anyway; this is the output I get from using time -v
Command being timed: "python ./metagenome_from_profile.py -p defaults/mini.biom -o out --debug"
User time (seconds): 1.54
System time (seconds): 0.87
Percent of CPU this job got: 24%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.96
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 137268
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 100007
Voluntary context switches: 824
Involuntary context switches: 82
Swaps: 0
File system inputs: 0
File system outputs: 72600
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I've played around with the permissions and ran with sudo but that doesn't seem to do anything either. I'll keep looking into this today. Thanks for your help.
Yeah, you are right, it would not work in the docker either if it was a problem with RAM. At this point I am a little bit at a loss. I will try to build a docker with your system/software specs to see if I can reproduce this. The output from time -v is also strange, since it reports exit status 0, which to my knowledge means that Python terminated normally, without an exception.
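The combination of exit status 0 and an error line in the output can be reproduced in isolation. The Python 3 sketch below runs a stand-in command that prints the same message and exits normally, so it mimics the symptom only; it is not CAMISIM, and run_and_check is a made-up helper.

```python
import subprocess
import sys


def run_and_check(command):
    """Run a command and return (exit status, output lines that start with 'ERROR')."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    error_lines = [line for line in result.stdout.splitlines() if line.startswith("ERROR")]
    return result.returncode, error_lines


# Stand-in for the real call: prints an error message but exits normally.
demo = [sys.executable, "-c", "print(\"ERROR: <type 'NoneType'>\")"]
code, errors = run_and_check(demo)
print(code, errors)
```

A script that catches an exception, prints it and then returns normally would show exactly this pattern, which would be consistent with the time -v output above.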
Hi, thanks for that, that's much appreciated! Would it be possible to add a bash entry point to the docker container so I can enter and see an environment where everything is in the right place? That would be very helpful.
Thanks again for all your help.
Hi, I have followed the updated documentation but I am having trouble running CAMISIM.
Running the docker command says the mini.biom file is not available. I tried entering the docker container using bash, but that also gives me an error. I also tried running "python metagenomesimulation.py configuration/metagenome_simulation", but it's unclear where that configuration is. I thought it might be the config file, which I have edited to suit, but that doesn't work. Would I be able to get some help getting this to run, please?
bshaban@6300d-111439-l:~/camisim$ sudo docker run -it -v "/path/to/input/directory:/input:rw" -v "/path/to/output/directory:/output:rw" cami/camisim:latest metagenome_from_profile.py -p /input/mini.biom -o /output
NCBI database not present yet (first time used?)
Downloading taxdump.tar.gz from NCBI FTP site...
Done. Parsing...
Loading node names...
2044492 names loaded.
249789 synonyms loaded.
Loading nodes...
2044492 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /root/.etetoolkit/taxa.sqlite ...
2044000 generating entries...
Uploading to /root/.etetoolkit/taxa.sqlite
Inserting synonyms: 245000
Inserting taxid merges: 50000
Inserting taxids: 2040000
2019-01-16 23:37:51 WARNING: [root] Max strains per OTU not set, using default (3)
2019-01-16 23:37:51 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
Traceback (most recent call last):
File "metagenome_from_profile.py", line 87, in <module>
config = GG.generate_input(args) # total number of genomes and path to updated config
File "/usr/local/bin/scripts/get_genomes.py", line 283, in generate_input
tax_profile = read_taxonomic_profile(args.profile, config, args.samples)
File "/usr/local/bin/scripts/get_genomes.py", line 26, in read_taxonomic_profile
table = biom.load_table(biom_profile)
File "/usr/local/lib/python2.7/dist-packages/biom/parse.py", line 652, in load_table
with biom_open(f) as fp:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/biom/util.py", line 443, in biom_open
if os.path.getsize(fp) == 0:
File "/usr/lib/python2.7/genericpath.py", line 57, in getsize
return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '/input/mini.biom'