Closed novitch closed 4 years ago
If you have a suggestion, I would like to test it. I am working on a Linux server with 1 TB of RAM and 120 CPUs. I tried between 10 and 120 CPUs and got the same error message. Thanks, Alban.
Hi!
Thanks for reporting this. Could you share with me: GCF_000001405.39_GRCh38.p13_genomic.fna?
The error message indicates that some data object is too large to be passed around by the multiprocessing library. I haven't tested InSilicoSeq on large(ish) eukaryotes, so it is possible the human genome is too big for the default data type used by multiprocessing.
I'll test on my side, but in the meantime you can try again after removing the human genome from your input dataset.
Best, Hadrien
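For readers hitting the same kind of error: when a single object is too large to hand to a worker process, one common workaround is to send smaller pieces instead. The sketch below is only an illustration of that general idea, not InSilicoSeq's actual fix; `chunk_spans` and `gc_count` are hypothetical helper names, and the toy sequence stands in for a real genome record.

```python
from multiprocessing import Pool


def chunk_spans(length, chunk_size):
    """Yield half-open (start, end) spans that together cover range(length)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)


def gc_count(chunk):
    """Toy per-chunk task: count G and C bases in one piece of sequence."""
    return chunk.count("G") + chunk.count("C")


if __name__ == "__main__":
    # Hypothetical stand-in for one very long record.
    sequence = "ACGT" * 250_000

    # Each worker receives a bounded slice rather than the whole record,
    # so no single pickled payload grows with the size of the genome.
    spans = list(chunk_spans(len(sequence), 100_000))
    with Pool(2) as pool:
        counts = pool.map(gc_count, [sequence[s:e] for s, e in spans])

    print(sum(counts))
```

The trade-off is extra slicing and bookkeeping in the parent process, which may cost some speed compared with passing the record whole.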
Hi Hadrien, the file is 3.1 GB. I'll try without it and will let you know.
The complete command is
iss generate --seed 110803 --abundance zero_inflated_lognormal --cpus 10 --genomes ../genomes_db/genomes.fna --seed 110803 --abundance zero_inflated_lognormal --model hiseq --output simulation_1million_1
genomes.fna contains 114 genomes (human is the largest, and the only one in the GB range).
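To see which records in a multi-FASTA file are unusually large, a small helper along these lines can tally the sequence length per record. This is a minimal sketch that assumes standard FASTA layout (header lines starting with `>`); `fasta_record_lengths` is a hypothetical name, not part of InSilicoSeq.

```python
def fasta_record_lengths(lines):
    """Return {record_id: sequence_length} for an iterable of FASTA lines.

    `lines` can be an open file handle or any iterable of strings.
    The record id is the first whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def largest_record(lengths):
    """Return the (record_id, length) pair with the greatest length."""
    return max(lengths.items(), key=lambda item: item[1])
```

With a real file this could be called as `fasta_record_lengths(open("genomes.fna"))`, reading line by line so the whole file never has to sit in memory.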
OK, it seems to work if I do not use the human genome and do not use the full 120 threads. 60 threads seems to work instead.
Hi, a little update: I thought I could work around the issue by generating the random reads for my community on one side and the reads for the human genome on the other. But with the human genome only, the problem still persists. I can't work without human reads; do you think it will be achievable?
I will not have time to fix this issue before mid-October unfortunately.
OK, so I'll try to mix ART for human and your software for the microbes.
Thanks, Alban.
Hi,
I started working on this. I could reproduce the bug when generating reads from a fasta file containing all human chromosomes concatenated together as one record.
Any reason you are concatenating instead of using --draft to generate an accurate number of reads from each record in the reference genome?
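As a rough illustration of what "an accurate number of reads from each record" could mean, the sketch below splits a read budget across records in proportion to their length. Whether --draft does exactly this is an assumption on my part; `allocate_reads` is a hypothetical helper and the lengths are toy values, not real chromosome sizes.

```python
def allocate_reads(record_lengths, n_reads):
    """Split n_reads across records proportionally to sequence length.

    Uses integer arithmetic plus the largest-remainder rule so the
    per-record counts always sum to exactly n_reads.
    """
    total = sum(record_lengths.values())
    if total <= 0:
        raise ValueError("total record length must be positive")

    # Floor of each record's proportional share.
    counts = {name: n_reads * length // total
              for name, length in record_lengths.items()}
    # Fractional part of each share, kept as an integer remainder.
    remainders = {name: n_reads * length % total
                  for name, length in record_lengths.items()}

    # Hand out the reads lost to flooring, largest remainder first.
    leftover = n_reads - sum(counts.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    toy_lengths = {"chr1": 300, "chr2": 100}
    print(allocate_reads(toy_lengths, 8))
```

Keeping each chromosome as its own record also means no single sequence has to hold the whole genome.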
EDIT:
I have a fix on the mem branch. You can install from there with
pip install git+https://github.com/HadrienG/InSilicoSeq.git@mem
The fix is currently about 2 times slower than 1.4.x in preliminary tests. It will need to be optimised before I can merge and release an official bugfix.
Hi, I was looking to generate reads with abundance values. So if I understood correctly, I can't use both the draft option and an abundance file.
Thanks for your speed, I'll try with the mem branch.
I can't use both draft options and abundance file.
Correct. This should be addressed within the month for release 1.5.0 (see #83).
I'll try with the mem branch
Thanks. Don't hesitate to report any bug you might find 😄
The fix is implemented in 1.4.4
Thanks Hadrien, great job on your software and the quick releases :)
Hi, I would like to use the software with multi-threading, but when I tried I ran into an issue. Here is the complete stdout: