SwiftSeal closed this issue 1 year ago
Taking a closer look at the processes, it turns out there are a lot more of them than I thought, but they all seem to have stalled or died. From htop:
PID USER PRI NI VIRT RES SHR S CPU%▽MEM% TIME+ Command
2820 username 25 5 7739M 1026M 4292 R 99.5 0.5 83h09:48 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 /mnt/shared/scratch/username/apps/conda/env
3746 username 25 5 13.0G 3749M 133M S 0.6 2.0 28h25:59 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
2991 username 25 5 27668 8136 1592 S 0.0 0.0 0:00.13 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.resource_tracker
3722 username 25 5 8079M 1398M 10992 S 0.0 0.7 34h27:11 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3723 username 25 5 7567M 890M 10920 S 0.0 0.5 34h38:54 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3724 username 25 5 7578M 898M 11032 S 0.6 0.5 34h40:16 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3725 username 25 5 7440M 761M 10944 S 0.0 0.4 34h30:01 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3726 username 25 5 7329M 661M 10984 S 0.0 0.3 34h32:01 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3728 username 25 5 7735M 1059M 11052 S 0.0 0.6 34h30:41 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3729 username 25 5 8121M 1445M 11188 S 0.0 0.8 34h30:50 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3730 username 25 5 7631M 953M 11108 S 0.0 0.5 34h23:07 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3731 username 25 5 7664M 992M 11108 S 0.0 0.5 34h28:03 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3732 username 25 5 7541M 866M 11000 S 0.0 0.5 34h16:46 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3733 username 25 5 7601M 930M 11136 S 0.0 0.5 34h32:43 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3734 username 25 5 7385M 712M 11148 S 0.0 0.4 34h19:38 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3735 username 25 5 8585M 1892M 10932 S 0.0 1.0 34h36:07 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3736 username 25 5 7693M 1024M 11068 S 0.0 0.5 34h29:37 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3737 username 25 5 8487M 1815M 11100 S 0.0 0.9 34h48:06 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3738 username 25 5 7971M 1295M 11152 S 0.0 0.7 34h24:11 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3739 username 25 5 7377M 697M 11048 S 0.0 0.4 34h30:45 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3740 username 25 5 8202M 1524M 11116 S 0.0 0.8 34h30:49 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3741 username 25 5 7377M 706M 11144 S 0.0 0.4 34h17:47 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3742 username 25 5 0 0 0 Z 0.0 0.0 21h59:12 python3.9
3743 username 25 5 7696M 1014M 11004 S 0.0 0.5 34h31:07 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3744 username 25 5 8308M 1641M 11164 S 0.0 0.9 34h27:05 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3745 username 25 5 8057M 1378M 11148 S 0.0 0.7 34h18:14 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3746 username 25 5 13.0G 3749M 133M S 0.0 2.0 28h25:59 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3747 username 25 5 0 0 0 Z 0.0 0.0 27h33:19 python3.9
3748 username 25 5 0 0 0 Z 0.0 0.0 27h37:08 python3.9
3749 username 25 5 13.2G 3856M 133M S 0.0 2.0 28h14:12 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
3750 username 25 5 0 0 0 Z 0.0 0.0 16h00:46 python3.9
3751 username 25 5 0 0 0 Z 0.0 0.0 6h36:29 python3.9
3753 username 25 5 6771M 176M 2776 S 0.0 0.1 16:43.84 /mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spaw
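For anyone who wants to tally a listing like the one above instead of eyeballing it, here is a minimal sketch that counts process states and sums the RES column. It assumes the htop column order shown in the header (PID USER PRI NI VIRT RES SHR S ...), and that sizes without a suffix are in KiB. The names `parse_size`, `summarize` and `sample_rows` are made up for this example, and the command column in the sample rows is shortened.

```python
from collections import Counter


def parse_size(text):
    """Convert an htop size field (e.g. '761M', '13.0G', '8136') to MiB.

    Assumption: a bare number with no unit suffix is in KiB.
    """
    units = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text) / 1024


def summarize(rows):
    """Return (state counts, total RES in MiB) for htop rows, one string per process."""
    states = Counter()
    res_mib = 0.0
    seen_pids = set()
    for row in rows:
        fields = row.split()
        pid, res, state = fields[0], fields[5], fields[7]
        if pid in seen_pids:
            # The same PID can appear twice in a pasted listing; count it once.
            continue
        seen_pids.add(pid)
        states[state] += 1
        res_mib += parse_size(res)
    return states, res_mib


# A few rows in the same layout as the listing, with the command shortened.
sample_rows = [
    "3725 username 25 5 7440M 761M 10944 S 0.0 0.4 34h30:01 python3.9",
    "3742 username 25 5 0 0 0 Z 0.0 0.0 21h59:12 python3.9",
    "3746 username 25 5 13.0G 3749M 133M S 0.0 2.0 28h25:59 python3.9",
    "3746 username 25 5 13.0G 3749M 133M S 0.6 2.0 28h25:59 python3.9",
]

states, res_mib = summarize(sample_rows)
print(dict(states), f"{res_mib:.0f} MiB resident")
```

Rows in state Z are zombies: they have exited but have not been reaped by the parent, which is why their memory columns read 0.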
The full command line of the processes that keep relaunching:
/mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=14, pipe_handle=68) --multiprocessing-fork
/mnt/shared/scratch/username/apps/conda/envs/deepsignalpenv/bin/python3.9 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=14, pipe_handle=74) --multiprocessing-fork
I suspect this is due to the memory caps on our SLURM system. Going by issue https://github.com/PengNi/deepsignal-plant/issues/23, I was running 32 nprocs on only 60G of memory, so I have likely exceeded that limit.
I've capped it at 16 procs and relaunched it to see whether it freezes at the same point. I'll close this if it doesn't :)
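As a rough way to pick a process count for a given memory cap, here is a back-of-the-envelope sketch. The per-process figure and the overhead are hypothetical placeholders, not measured deepsignal-plant values; peak usage per worker would need to be measured on the actual data. `max_procs` is a name made up for this example.

```python
def max_procs(mem_limit_gib, per_proc_gib, overhead_gib=0.0):
    """Largest worker count whose combined peak memory fits under the limit.

    mem_limit_gib: memory cap of the job (e.g. the SLURM allocation).
    per_proc_gib:  assumed peak memory of one worker process (hypothetical).
    overhead_gib:  assumed memory used by the parent process and anything else.
    """
    if per_proc_gib <= 0:
        raise ValueError("per_proc_gib must be positive")
    usable = mem_limit_gib - overhead_gib
    return max(0, int(usable // per_proc_gib))


# Hypothetical numbers: a 60 GiB cap and workers that peak at 2 GiB each.
print(max_procs(60, 2.0))                    # no overhead reserved
print(max_procs(60, 2.0, overhead_gib=4.0))  # 4 GiB kept back for the parent
```

If the count that fits is lower than the number of processes requested, either the process count or the per-process memory has to come down.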
@SwiftSeal, thank you very much for using deepsignal-plant! You can also try setting a smaller --f5_batch_size and a smaller --batch_size to reduce the memory used by each process.
Best, Peng
It was a memory issue; it ran fine once I gave it enough room :) I'll close this now. Great piece of software!
Hello,
I'm running deepsignal-plant on a 750 Mb plant genome with approximately 40x coverage of ONT reads. I ran:
a few days ago. It began successfully; so far the fast5s.C.call_mods.tsv file is 138G. It has now stopped and is no longer writing to the mods file. There is a single process still running, which is consuming 100% CPU according to htop. It also periodically launches a process, but this dies before I can see what it is.
This is the current output of deepsignal:
I ran the example data successfully, so I'm not sure why this has happened! The only other possible reason I can see is that tombo resquiggle reported errors while running:
Could the tombo errors affect the deepsignal run? Happy to share any other information needed.
Thanks in advance!