fpusan / SuperPang

Non-redundant pangenome assemblies from multiple genomes or bins
BSD 3-Clause "New" or "Revised" License

There was an error running homogenize.py #12

Closed azk001 closed 5 months ago

azk001 commented 8 months ago
Hi!

I am working with around 450 genomes. I get the same error when using version 0.9.4.beta1.

The latest version, however, gives me this error:

ImportError: /usr/lib64/libc.so.6: version `GLIBC_2.25' not found (required by /mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/speedict/speedict.cpython-38-x86_64-linux-gnu.so)
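For context, this error means the system's glibc is older than the version the prebuilt speedict extension was compiled against (it requires at least GLIBC_2.25). A generic way to check which glibc a Linux system provides (standard commands, nothing SuperPang-specific; the strings approach needs binutils installed):

```bash
# The first line of the output reports the glibc version
ldd --version

# Or list the version symbols exported by libc itself
strings /usr/lib64/libc.so.6 | grep '^GLIBC_2' | sort -V | tail -n 1
```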

Can you please help me solve this issue?

Thank you!

Originally posted by @azk001 in https://github.com/fpusan/SuperPang/issues/10#issuecomment-1979600699

fpusan commented 8 months ago

Can you try reinstalling speedict from within the conda environment? pip install speedict --force-reinstall. Then run test-SuperPang.py to see if the error persists.
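Spelled out, that would be something like the following (the environment name SuperPang is taken from the paths in the error message above):

```bash
conda activate SuperPang                 # env name as seen in the traceback paths
pip install speedict --force-reinstall   # reinstall the package inside the env
test-SuperPang.py                        # bundled self-test; should finish without errors
```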

azk001 commented 8 months ago

Thank you! It worked for test-SuperPang.py. I am going to try it on my genomes now.

azk001 commented 8 months ago

I am sorry, I had to reopen this issue because I am still getting this error on the newer version:

Traceback (most recent call last):
  File "/mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/superpang/scripts/homogenize.py", line 360, in <module>
    cli()
  File "/mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/superpang/scripts/homogenize.py", line 356, in cli
    main(parse_args())
  File "/mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/superpang/scripts/homogenize.py", line 27, in main
    fasta2fastq(args.fasta, current1)
  File "/mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/superpang/lib/utils.py", line 77, in fasta2fastq
    write_fastq(read_fasta(fasta), fastq)
  File "/mmfs1/home/azk0151/miniconda3/envs/SuperPang/lib/python3.8/site-packages/superpang/lib/utils.py", line 9, in read_fasta
    for seq in open(fasta).read().strip().lstrip('>').split('>'):
OSError: [Errno 14] Bad address

This is SuperPang version 1.1.0.post3

If using in publications or products please cite:

Puente-Sánchez F, Hoetzinger M, Buck M and Bertilsson S. Exploring intra-species diversity through non-redundant pangenome assemblies. Molecular Ecology Resources (2023) DOI: 10.1111/1755-0998.13826

There was an error running homogenize.py. Please open an issue

fpusan commented 8 months ago

How large are your input files? And how much available RAM do you have?

azk001 commented 8 months ago

The total size of my input files is around 12 GB, and I am using 100 GB of RAM to run the script.

fpusan commented 8 months ago

How large is each genome? I would expect ~500 prokaryotic genomes to be smaller. Still, I would expect 12 GB of genomes to fit in 100 GB of RAM (at least at this stage of the pipeline...). Can you monitor the RAM consumption of homogenize.py to see if that is the issue?

azk001 commented 8 months ago

I am sorry, that was a typo: the total is actually around 2 GB, and each genome is around 5 Mb.

I could not find out how to monitor the RAM usage of the script.
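A generic option, in case it helps: GNU time (the /usr/bin/time binary, not the shell built-in) prints a peak-memory figure when the process finishes. Something along these lines should work; the arguments are just whatever you normally pass:

```bash
# After the run ends, check the "Maximum resident set size (kbytes)" line
/usr/bin/time -v SuperPang.py <your usual arguments>

# Or watch the homogenize.py step live while it runs (RES column)
top -p "$(pgrep -d, -f homogenize.py)"
```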

fpusan commented 8 months ago

This is weird then. If you are OK with sharing your genomes with me (Mega/Dropbox/WeTransfer...), I can try to run it and see if I can reproduce the error.

azk001 commented 8 months ago

Here is the NCBI link for all the genomes that I used: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=442694

fpusan commented 8 months ago

Ok, yes, I think there's a memory leak in one of the C extensions that has become apparent with your dataset. I'm testing a fix now and will let you know how it goes.

fpusan commented 8 months ago

I just published a pre-release version with this and other fixes on PyPI. To get it, activate your SuperPang conda environment and run python -m pip install superpang==1.3.0a0. I will do some more testing and add some extra things before publishing an official release, but this should be enough to get you past the bug.
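In full, that would be (again using the environment name from the paths above):

```bash
conda activate SuperPang
python -m pip install superpang==1.3.0a0   # explicit pre-release pin, so pip picks the alpha
test-SuperPang.py                          # quick sanity check of the install
```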

timyerg commented 8 months ago

What timing! I just finally got to this step with my new dataset (440 samples), and the fix was released 17 hours ago =).

fpusan commented 8 months ago

I've done some more testing and the results seem to be OK, so I've published a new version with these fixes. It also runs faster (though not dramatically).