bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Out-of-memory killer while running poppunk_visualise #234

Closed acpaulo closed 1 year ago

acpaulo commented 1 year ago

Hello Prof John Lees, I was trying to run poppunk_visualise with the full DB v6 (42,163 genomes) from https://www.pneumogen.net/gps/training_command_line.html. The problem is that the process was killed when it reached 230 GB of virtual memory, although it was only using 26 GB.

Just to make sure I was using the command properly I later used the reference DB (933 genomes) and I was able to run it.

The command line I used was poppunk_visualise --ref-db GPS_v6 --output lixo.viz --microreact --threads 8

Is the available RAM really the issue, or am I doing something wrong? Thank you.

johnlees commented 1 year ago

Maybe try adding the rapidnj option? It would also help if you could share the output you are getting.

acpaulo commented 1 year ago

Hi,

I'm sorry it took me some time to give you feedback. I tried to use rapidnj and the result was the same. I'm sending the output (as a PDF) and the server capacity below. Thank you.

bash-4.3$ free
          total      used       free    shared  buff/cache  available
Mem:   32844436    346160   31962020     16788      536256   32164536
Swap: 136372204    424400  135947804

Terminal_poppunk.pdf
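
For scale, the raw size of the pairwise distances for a database of 42,163 genomes can be estimated with a few lines of Python and compared with the totals that free reports in KiB. This is only a sketch under an assumption that is not confirmed in this thread: that distances are held in long form, one row per genome pair, with a core and an accessory column stored as 4-byte floats. The function names are hypothetical and are not part of PopPUNK.

```python
# Rough estimate only, not PopPUNK code.
# Assumption: distances are stored in long form, one row per genome pair,
# with two columns (core, accessory) of 4-byte floats.

KIB_PER_GIB = 1024 ** 2
BYTES_PER_GIB = 1024 ** 3


def n_pairs(n_samples: int) -> int:
    """Number of unordered pairs among n_samples genomes."""
    return n_samples * (n_samples - 1) // 2


def dense_distance_bytes(n_samples: int, columns: int = 2,
                         bytes_per_value: int = 4) -> int:
    """Bytes needed to hold every pairwise distance under the assumption above."""
    return n_pairs(n_samples) * columns * bytes_per_value


def kib_to_gib(kib: int) -> float:
    """Convert the KiB figures printed by `free` to GiB."""
    return kib / KIB_PER_GIB


if __name__ == "__main__":
    n = 42_163  # genomes in the full database, as reported in this issue
    estimate_gib = dense_distance_bytes(n) / BYTES_PER_GIB
    print(f"{n_pairs(n):,} pairs, about {estimate_gib:.1f} GiB of raw distances")
    print(f"RAM total:  {kib_to_gib(32_844_436):.1f} GiB")
    print(f"Swap total: {kib_to_gib(136_372_204):.1f} GiB")
```

Under these assumptions the raw distances alone come to a few GiB, so the estimate bounds only one component of the memory use and does not by itself explain why the process was killed.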

johnlees commented 1 year ago

This looks like it's the mandrake step which is giving you problems. We were able to run much larger datasets than this with less memory, so I'm not sure exactly why this is happening. Can you:

acpaulo commented 1 year ago

Hi,

Thank you for your answer. Unexpectedly, the microreact output contained only the csv file, so I'm missing the tree and the embedding. I will try to run with mandrake, but I just want to remind you that if I use GPS4_references I can run microreact without any problem.

I will give you feedback soon. Best, Cristina

acpaulo commented 1 year ago

Hello,

I have installed miniconda and poppunk so many times that I now get an error at the mandrake step when I run poppunk_visualise.

I'm sending the error and the command I used. I don't understand it, because it was running well previously, except for large datasets. I would be grateful for any help. By the way, when I installed miniconda3 the system asked me to update to miniconda3 v22.9, while I am installing miniconda3 v4.12 (running on Mac OSX M1 with 8 GB of RAM).

The output lixoGPSC4.viz contains only a csv file.

(pp_env) cristinapaulo@acpaulo-novo-adapt Experiment % poppunk_visualise --ref-db GPS_v4_references --output lixoGPSC4.viz --microreact --threads 8

Graph-tools OpenMP parallelisation enabled: with 8 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Building phylogeny
Writing microreact output
Parsed data, now writing to CSV
Running mandrake
Running mandrake
Traceback (most recent call last):
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/bin/poppunk_visualise", line 11, in <module>
    sys.exit(main())
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/visualise.py", line 624, in main
    generate_visualisations(args.query_db,
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/visualise.py", line 548, in generate_visualisations
    microreact_files = outputsForMicroreact(combined_seq,
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/plot.py", line 742, in outputsForMicroreact
    embedding_file = generate_embedding(seqLabels, accMat, perplexity, outPrefix, overwrite,
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/mandrake.py", line 66, in generate_embedding
    I, J, dists = pp_sketchlib.sparsifyDists(distMat=accMat, distCutoff=0, kNN=kNN)
AttributeError: module 'pp_sketchlib' has no attribute 'sparsifyDists'

johnlees commented 1 year ago

Can you please provide the versions of PopPUNK, pp_sketchlib and mandrake?

johnlees commented 1 year ago

I think this is likely due to an API-breaking change in sketchlib. Can you try with version 2.0.0 rather than 2.0.1? This will be fixed in poppunk 2.6.0, which is due imminently.
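
One quick way to confirm whether the installed sketchlib still exposes the function named in the traceback is to check for the attribute directly from Python. This is a minimal sketch: the helper name is hypothetical, and only pp_sketchlib and sparsifyDists are taken from the error message above.

```python
import importlib
from typing import Optional


def module_has_attribute(module_name: str, attribute: str) -> Optional[bool]:
    """Return True or False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


if __name__ == "__main__":
    # Names taken from the AttributeError in the traceback.
    result = module_has_attribute("pp_sketchlib", "sparsifyDists")
    if result is None:
        print("pp_sketchlib could not be imported in this environment")
    elif result:
        print("pp_sketchlib.sparsifyDists is available")
    else:
        print("pp_sketchlib.sparsifyDists is missing; the installed version may be incompatible")
```

Running this inside the same conda environment as poppunk_visualise (pp_env in this thread) shows whether the mismatch is still present after changing the sketchlib version.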

acpaulo commented 1 year ago

Versions:
(pp_env) cristinapaulo@acpaulo-novo-adapt Experiment % poppunk --version
poppunk 2.5.0
(pp_env) cristinapaulo@acpaulo-novo-adapt Experiment % mandrake --version
mandrake 1.2.2
(pp_env) cristinapaulo@acpaulo-novo-adapt Experiment % sketchlib --version
pp-sketchlib v2.0.1

I downgraded pp_sketchlib to 2.0.0 and the output was the following (but I do have the nwk, dot and csv files, which work on microreact online):

(pp_env) cristinapaulo@acpaulo-novo-adapt Experiment % poppunk_visualise --ref-db GPS_v4_references --output lixoGPSC4.viz --microreact --threads 8

Graph-tools OpenMP parallelisation enabled: with 8 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Building phylogeny
Writing microreact output
Parsed data, now writing to CSV
Running mandrake
Running on CPU
Preprocessing 2562 samples with perplexity = 20 took 180ms
Optimizing
Progress: 99,9%, eta=0,0010, Eq=0,5601824770, clashes=1,2%
Optimizing done in 2s
Traceback (most recent call last):
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/bin/poppunk_visualise", line 11, in <module>
    sys.exit(main())
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/visualise.py", line 624, in main
    generate_visualisations(args.query_db,
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/visualise.py", line 561, in generate_visualisations
    url = createMicroreact(output, microreact_files, api_key)
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/plot.py", line 779, in createMicroreact
    with pkg_resources.resource_stream(name, 'data/microreact_example.pkl') as example_pickle:
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1160, in resource_stream
    return get_provider(package_or_requirement).get_resource_stream(
  File "/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1632, in get_resource_stream
    return open(self._fn(self.module_path, resource_name), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/cristinapaulo/opt/miniconda3/envs/pp_env/lib/python3.10/site-packages/PopPUNK/data/microreact_example.pkl'

johnlees commented 1 year ago

I'm not sure why the microreact pickle file isn't available. It's likely an install problem that would be fixed in a clean environment. But as you note, this is just for the final step, and you can get a microreact output without it.
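
To see whether a given environment has this install problem, one can test whether the data file named in the traceback is present inside the installed package. This is a sketch only: the helper name is hypothetical, and the package name and relative path are copied from the FileNotFoundError above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_file_exists(package: str, relative_path: str) -> Optional[bool]:
    """Return True or False if the package is found, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    # A package can have more than one search location; any hit counts.
    return any((Path(location) / relative_path).is_file()
               for location in spec.submodule_search_locations)


if __name__ == "__main__":
    # Package and path taken from the traceback.
    found = package_file_exists("PopPUNK", "data/microreact_example.pkl")
    if found is None:
        print("PopPUNK is not installed in this environment")
    elif found:
        print("microreact_example.pkl is present")
    else:
        print("microreact_example.pkl is missing; a clean reinstall may be needed")
```

The check locates the package without running poppunk_visualise, so it is a cheap way to compare an old environment with a freshly created one.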

acpaulo commented 1 year ago

Hi John,

Thank you very much.

johnlees commented 1 year ago

I'm going to close this issue, but please feel free to reopen it if you want to look at the memory issue again using what I suggested above.