"Population must be a sequence. For dicts or sets, use sorted(d). in line 83" [Python 3.11 compatibility?]

prototaxites commented 1 year ago

I'm trying to run CAMISIM, following the example in the usage guide, to generate a very small test metagenome data set (a mixture of eukaryotic and prokaryotic genomes) for testing a pipeline. I am getting the following error:

2023-02-27 10:28:52 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
2023-02-27 10:28:52 INFO: [MetagenomeSimulationPipeline] Validating Genomes
2023-02-27 10:28:52 INFO: [MetadataReader] Reading file: '/nfshome/store04/users/b.jmd20jns/camisim/genome_to_id.tsv'
2023-02-27 10:28:53 INFO: [MetagenomeSimulationPipeline] Design Communities
2023-02-27 10:28:53 INFO: [CommunityDesign] Drawing strains.
2023-02-27 10:28:53 INFO: [MetadataReader 31395689975] Reading file: '/nfshome/store04/users/b.jmd20jns/camisim/metadata.tsv'
2023-02-27 10:28:53 ERROR: [MetagenomeSimulationPipeline] Population must be a sequence.  For dicts or sets, use sorted(d). in line 83
2023-02-27 10:28:53 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted

Any idea what's going on and how to fix it?

metadata.tsv:

genome_ID                        OTU  NCBI_ID  novelty_category
Pseudomicrostroma_glucosiphilum  1    1684307  known_strain
Aureobasidium_pullulans          2    5580     known_strain
Anaeromicropila_populeti         3    37658    known_strain
Bacillus_subtilis                4    1423     known_strain
Erwinia_billingiae               5    182337   known_strain
Frondihabitans_PhB188            6    2485200  known_strain
Pseudarthrobacter_scleromae      7    158897   known_strain
Pseudomonas_fluorescens          8    294      known_strain
Variovorax_boronicumulans        9    436515   known_strain

genome_to_id.tsv

Pseudomicrostroma_glucosiphilum genomes/GCA_003144135.1_Rhodsp1_genomic.fna
Aureobasidium_pullulans genomes/GCA_000721785.1_Aureobasidium_pullulans_var._pullulans_EXF-150_assembly_version_1.0_genomic.fna
Anaeromicropila_populeti    genomes/GCA_900112775.1_IMG-taxon_2599185221_annotated_assembly_genomic.fna
Bacillus_subtilis   genomes/GCA_000009045.1_ASM904v1_genomic.fna
Erwinia_billingiae  genomes/GCA_000196615.1_ASM19661v1_genomic.fna
Frondihabitans_PhB188   genomes/GCA_003752365.1_ASM375236v1_genomic.fna
Pseudarthrobacter_scleromae genomes/GCA_014644515.1_ASM1464451v1_genomic.fna
Pseudomonas_fluorescens genomes/GCA_900215245.1_IMG-taxon_2617270901_annotated_assembly_genomic.fna
Variovorax_boronicumulans   genomes/GCA_009811375.1_ASM981137v1_genomic.fna

AlphaSquad commented 1 year ago

Hey, thanks for bringing this to my attention. Are you by any chance using Python >= 3.11? Python 3.11 removed the automatic conversion of sets to lists for the population argument of random.sample, and there is one place where CAMISIM samples from the keys of a dict. For compatibility with Python 3.11, two changes are needed for CAMISIM to run:

In scripts/configparserwrapper.py, line 5: change from collections import Iterable to from collections.abc import Iterable (CAMISIM does not run without that change, so I assume you already did this?)
In scripts/StrainSelector/strainselector.py, line 253: change for otu_id in random.sample(self._otu_list.keys(), len(self._otu_list)): to for otu_id in random.sample(list(self._otu_list.keys()), len(self._otu_list)): which makes the conversion explicit.
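
For anyone who wants to see the two changes in isolation, here is a minimal sketch. It is not CAMISIM's actual code: the otu_list dict and both helper functions are hypothetical stand-ins for self._otu_list and the loop in strainselector.py. The patched form also works on older Python versions, since collections.abc has existed since Python 3.3 and list(...) over dict keys is valid everywhere.

```python
import random
from collections.abc import Iterable  # replaces `from collections import Iterable`

# Hypothetical stand-in for CAMISIM's self._otu_list (a dict keyed by OTU id).
otu_list = {
    "1": ["Pseudomicrostroma_glucosiphilum"],
    "2": ["Aureobasidium_pullulans"],
    "3": ["Anaeromicropila_populeti"],
}


def shuffled_otu_ids(otus, rng=random):
    """Return all keys of `otus` in random order (the patched pattern)."""
    # list(...) turns the keys view into an explicit sequence, which
    # random.sample requires on Python 3.11 and later.
    return rng.sample(list(otus.keys()), len(otus))


def sample_keys_directly(otus):
    """The unpatched pattern: pass the keys view straight to random.sample.

    Returns the sampled list on interpreters that still accept it, or the
    TypeError instance on interpreters that reject it.
    """
    try:
        return random.sample(otus.keys(), len(otus))
    except TypeError as exc:
        return exc


if __name__ == "__main__":
    print("patched:  ", shuffled_otu_ids(otu_list))
    print("unpatched:", sample_keys_directly(otu_list))
    print("keys view is Iterable:", isinstance(otu_list.keys(), Iterable))
```

On Python 3.11 the unpatched call should yield the same "Population must be a sequence" TypeError that appears in the log above, while the patched call returns the keys in random order.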

After this, CAMISIM runs on my end. I have not pushed these changes yet because I want to check that everything else stays intact and that backward compatibility is preserved, but they should let you run CAMISIM. If you are not using Python 3.11, then I am sorry and will have to check things again. In the meantime, I changed the title so that other people running into this can find the solution in this issue.

prototaxites commented 1 year ago

Hey, thanks for the very quick reply! Yes, I was using Python 3.11 (though I'm currently spinning up a 3.9 conda environment). I did figure out the first change but not the second - I'll see how I get on with the 3.9 environment in the first instance, but if that fails I'll give the above a go.

prototaxites commented 1 year ago

Hi, Python 3.9 did the trick! For anyone else stumbling across this, the following conda environment runs CAMISIM quite happily:

conda create -n camisim python=3.9 perl matplotlib-base numpy biopython biom-format scikit-learn configparser ete3 perl-xml-simple
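
If you are unsure which interpreter an environment actually resolves to, a quick check like the one below can help. This is only a sketch: the function name camisim_py_issues is made up for illustration, and it covers just the two incompatibilities mentioned in this thread (the collections aliases were removed in Python 3.10, the set conversion in random.sample in Python 3.11). It makes no claim about other version problems.

```python
import sys


def camisim_py_issues(version=sys.version_info[:2]):
    """List the incompatibilities from this thread that apply to `version`.

    `version` is a (major, minor) tuple; it defaults to the running interpreter.
    This helper is hypothetical and not part of CAMISIM.
    """
    issues = []
    if version >= (3, 10):
        # The ABC aliases in `collections` were removed in Python 3.10.
        issues.append("from collections import Iterable fails; use collections.abc")
    if version >= (3, 11):
        # random.sample no longer converts sets or dict key views to a list.
        issues.append("random.sample rejects dict keys and sets; wrap them in list(...)")
    return issues


if __name__ == "__main__":
    found = camisim_py_issues()
    print("Python", sys.version.split()[0])
    for issue in found:
        print(" -", issue)
    if not found:
        print(" no issues from this thread apply")
```

Run it inside the activated environment; with the python=3.9 environment above, neither issue should be reported.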

AlphaSquad commented 1 year ago

Glad that it works. We tested CAMISIM mainly on Python 3.7. I hope that most of these environment and version problems will be solved once we move to CAMISIM2.0 (coming soon™).

CAMI-challenge / CAMISIM

"Population must be a sequence. For dicts or sets, use sorted(d). in line 83" [Python 3.11 compatibility?] #154