First, please look at the Binaries. The docs you're looking at are probably out-of-date.
Second, we will release a new binary today to drop the --tmpDir=/scratch for pbalign.
Third, we'll soon look into ways to set the number of procs/threads for pbalign, and the memory for samtools. Not sure when that change will be ready.
The issue is that samtools sort uses the -m flag (-m 4G), which allocates memory per sorting thread rather than in total (see the arithmetic at the end of this comment).
Fourth, thank you so much for reporting this! It hasn't harmed us because we have so much memory on our machines, but we'll look for a bug-fix soon.
If we come up with a short-term fix, I'll let you know.
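To make the memory arithmetic concrete, here is an illustration only; the thread count below is hypothetical, not an actual FALCON setting. samtools sort allocates its -m budget once per sorting thread, so the total scales with the number of threads.

threads=24            # hypothetical: one sort thread per core on a 24-core node
mem_per_thread_gb=4   # the hard-coded "-m 4G"
echo "samtools sort may request up to ~$(( threads * mem_per_thread_gb )) GiB of sort buffers"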
Thanks Chris.
I got the latest falcon from here https://pb-falcon.readthedocs.io/en/latest/quick_start.html and wasn't sure where to raise the issue. Thanks for pointing me in the right direction.
I have now installed everything successfully. I would just recommend advising folks to run the install script from within a qlogin session. I installed it directly on the head node and ran into trouble with memory and core usage; my admins weren't too happy. I have fixed everything since then, thanks.
Hmmm. Could you give us more info? The system is designed to be pretty efficient on the driver node. We fork Python for each qsub call (depending on which pwatcher you use), but that's only njobs*PythonSize. Runtime should be quite low because we have a geometric back-off for sleep-times between checks. Even with fs_based, we check only 1 file (run.sh.done) per running qsub job per sleep-cycle.
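The polling idea is roughly the following; this is only an illustrative shell sketch of the back-off, not the actual pwatcher code (which is Python), and the 600-second cap is just an example value:

delay=1
while [ ! -e run.sh.done ]; do
    sleep "$delay"                               # wait before checking again
    delay=$(( delay * 2 ))                       # geometric back-off between checks
    if [ "$delay" -gt 600 ]; then delay=600; fi  # cap the sleep interval
done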
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9539 cdunn 20 0 2055100 36796 8424 S 0.3 0.0 0:05.32 python
That's typical. Or
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10768 cdunn 20 0 433660 28080 5852 S 0.3 0.0 0:00.66 python
10699 cdunn 20 0 160356 2900 1588 R 0.0 0.0 0:00.42 top
10808 cdunn 20 0 62712 7300 1940 S 0.0 0.0 0:00.01 qsub
10810 cdunn 20 0 62712 7296 1940 S 0.0 0.0 0:00.01 qsub
It spikes briefly when submitting a job -- around 10% CPU per job, for less than a second.
Were you using job_type=local?
The Falcon dummy example works fine. The issue is with the unzip part. Right now the quiver step is hard-coded for memory and thread requirements. This caused issues on my head node (24 cores and quite a bit of memory). Once this is flexible and the corresponding cfg file is fixed, I don't see an issue.
Yes, quiver is currently hard-coded for threads/memory. However, it's supposed to run on a distributed node. You should not run the tasks locally if you are sharing a node with others.
Maybe we mean something different by "head node"? Yes, we need to respect the limits of the remote node that the job is distributed to.
I think the confusion comes from me not being specific that I installed falcon with the script provided here: https://pb-falcon.readthedocs.io/en/latest/quick_start.html. The 'install_unzip.sh' script runs a toy sample at the end without distribution. This works fine for the falcon part, but the unzip part asks for too much memory without distribution. Hope this makes more sense now.
Yes. We need to update those docs with memory requirements.
Hi all, thank you very much for the latest release, install_unzip_180725.sh.
I have now installed it but got stuck at the quiver step of the toy sample; everything else looks fine. My issue is the following.
The following command crashes.
The issue is that samtools sort uses the -m flag (-m 4G), which allocates memory per thread rather than in total. This crashes my set-up right now. If I change the fc_unzip.cfg file as follows, pbalign still uses 24 processes.
Is there a way to change the settings so that samtools sort or pbalign use fewer threads and less memory? samtools works just fine when using fewer threads or without the -m 4G flag.
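For example, something along these lines works when I run the sort by hand; the thread count, buffer size, and file names here are placeholders, not the pipeline's actual values:

# fewer sorting threads and a smaller per-thread buffer keep total memory bounded
samtools sort -@ 4 -m 768M -o aligned.sorted.bam aligned.bam   # total sort memory roughly 4 x 768M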
Otherwise all looks good and I will go ahead with the falcon part of my assembly.