azk001 closed this issue 5 months ago.
Can you try reinstalling speedict from within the conda environment?
pip install speedict --force-reinstall
Then run test-SuperPang.py to see if the error persists.
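Before rerunning test-SuperPang.py, it can help to confirm that speedict actually loads in the active environment. The following is a minimal sketch using only the standard library. The helper name `check_import` is hypothetical and is not part of SuperPang. A compiled extension built against a newer system library typically fails at this import step with an ImportError.

```python
import importlib


def check_import(module_name):
    """Try to import module_name and return (ok, message).

    ModuleNotFoundError is a subclass of ImportError, so both a missing
    package and a broken compiled extension are reported here.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"{module_name}: {type(exc).__name__}: {exc}"
    # Built-in modules have no __file__ attribute.
    location = getattr(module, "__file__", None) or "built-in"
    return True, f"{module_name} imported from {location}"


if __name__ == "__main__":
    ok, message = check_import("speedict")
    print(("OK   " if ok else "FAIL ") + message)
```

Run it with the conda environment's `python` so that it inspects the same site-packages SuperPang uses.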
Thank you! It worked for test-SuperPang.py. I am going to try it with my genomes.
I am sorry, I had to reopen this issue because I am still getting this error with the newer version:
Traceback (most recent call last):
File "/mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/superpang/scripts/homogenize.py", line 360, in
This is SuperPang version 1.1.0.post3
If using in publications or products please cite:
Puente-Sánchez F, Hoetzinger M, Buck M and Bertilsson S. Exploring intra-species diversity through non-redundant pangenome assemblies. Molecular Ecology Resources (2023) DOI: 10.1111/1755-0998.13826
There was an error running homogenize.py. Please open an issue
How large are your input files? And how much available RAM do you have?
My input files are around 12 GB in total, and I am running the script with 100 GB of RAM.
How large is each genome? I would expect ~500 prokaryotic genomes to take up less space than that. Still, I would expect 12 GB of genomes to fit in 100 GB of RAM (at least at this stage of the pipeline...).
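To answer size questions like these precisely, the following sketch counts the genome files in an input directory and reports their total and average size. The directory layout and the `*.fasta` pattern are assumptions; adjust the pattern to whatever extension your genomes use (for example `*.fna`). `summarize_inputs` is a hypothetical helper.

```python
from pathlib import Path


def summarize_inputs(directory, pattern="*.fasta"):
    """Return (file_count, total_bytes, mean_bytes) for files matching pattern.

    Sizes are on-disk file sizes, so compressed inputs will look smaller
    than the sequence data they contain.
    """
    sizes = [p.stat().st_size for p in Path(directory).glob(pattern) if p.is_file()]
    total = sum(sizes)
    mean = total / len(sizes) if sizes else 0.0
    return len(sizes), total, mean
```

For example, `summarize_inputs("genomes/")` returns a tuple such as `(count, total_bytes, mean_bytes)`, which can be divided by `1e9` or `1e6` to get GB or MB.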
Can you monitor the RAM consumption of homogenize.py to see if that is the issue?
Sorry, that was a typo: the files are actually around 2 GB in total, and each genome is around 5 Mb.
I could not figure out how to monitor the RAM usage of the script.
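One way to measure peak memory is to launch the command from Python and read the child-process resource usage afterwards. The following is a minimal sketch using the standard `resource` and `subprocess` modules (Unix only). `run_and_report_peak_rss` is a hypothetical helper, not part of SuperPang. On Linux, `ru_maxrss` for `RUSAGE_CHILDREN` is the peak resident set size of the largest terminated child (including descendants that child waited for), not the sum over the whole process tree.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd, wait for it, and return (returncode, peak RSS in MiB).

    The value is the largest peak RSS among all children this process
    has waited for so far, so call it once per measurement.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor
```

Pass the exact SuperPang.py command line you normally run, split into a list of arguments. On Linux, GNU time (`/usr/bin/time -v <command>`) reports the same figure as "Maximum resident set size" without any Python.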
This is weird, then. If you are OK with sharing your genomes with me (MEGA/Dropbox/WeTransfer...), I can try to run them and see if I can reproduce the error.
Here is the NCBI link for all the genomes that I used: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=442694
Ok, yes, I think there's a memory leak in one of the C extensions that has become apparent with your dataset. I'm testing a fix now, will let you know how it goes.
I just published a pre-release version with this and other fixes on PyPI.
To get it, go to your SuperPang conda environment and run:
python -m pip install superpang==1.3.0a0
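After installing the pre-release, a quick way to confirm which versions the environment's Python actually sees is `importlib.metadata` (available since Python 3.8, the version in this environment). The sketch assumes the PyPI distribution names `superpang` and `speedict`. `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run with the conda environment's python to inspect that environment.
    for name in ("superpang", "speedict"):
        print(name, installed_version(name) or "not installed")
```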
I will do some more testing and add some extra things before publishing an official release, but this should be enough to get you past the bug.
What timing! I finally got to this step with my new dataset (440 samples), and the fix was released 17 hours ago =).
I've done some more testing and the results seem to be OK, so I've published a new version with these fixes. It also runs faster (though not dramatically).
I am working with around 450 genomes. I also get the same error when using version 0.9.4.beta1.
But the latest version gives me this error:
ImportError: /usr/lib64/libc.so.6: version `GLIBC_2.25' not found (required by /mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/speedict/speedict.cpython-38-x86_64-linux-gnu.so)
Could you please help me solve this issue?
Thank you!
Originally posted by @azk001 in https://github.com/fpusan/SuperPang/issues/10#issuecomment-1979600699
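The ImportError above means that the compiled speedict extension requires the symbol version GLIBC_2.25, which the system's /usr/lib64/libc.so.6 does not provide. In other words, the system C library is older than 2.25. The following sketch confirms the glibc version from the same Python interpreter, using `os.confstr("CS_GNU_LIBC_VERSION")` and falling back to `platform.libc_ver()`. The helper names are hypothetical, and the required version `(2, 25)` is taken from the error message.

```python
import os
import platform


def parse_glibc(text):
    """Parse strings such as 'glibc 2.17' into (2, 17); return None otherwise."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "glibc":
        return None
    numbers = parts[1].split(".")
    if len(numbers) < 2 or not (numbers[0].isdigit() and numbers[1].isdigit()):
        return None
    return int(numbers[0]), int(numbers[1])


def system_glibc():
    """Return the running glibc version as (major, minor), or None if unknown."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        text = None  # not Unix, or the name is unknown on this platform
    if text:
        return parse_glibc(text)
    lib, ver = platform.libc_ver()
    return parse_glibc(f"{lib} {ver}") if lib and ver else None


if __name__ == "__main__":
    required = (2, 25)  # from "GLIBC_2.25 not found" in the ImportError
    found = system_glibc()
    if found is None:
        print("Could not determine the glibc version")
    else:
        status = "OK" if found >= required else "too old"
        print(f"glibc {found[0]}.{found[1]} found, "
              f"{required[0]}.{required[1]} required: {status}")
```

Tuples compare element by element, so `(2, 17) >= (2, 25)` is False while `(2, 28) >= (2, 25)` is True.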