frazer-lab / cluster

Repo for cluster issues.

java memory error #249

Closed billgreenwald closed 6 years ago

billgreenwald commented 6 years ago

Hey Paul,

I am trying to run some nextflow pipelines via SGE. Nextflow is pipeline management software, similar to snakemake or Luigi, that can launch jobs onto the cluster, manage file dependencies, etc.

It requires very little memory, but when I launched it without specifying the -Xmx or -Xms options for java, it was requesting 32 GB of virtual memory for a process using ~300 MB of actual RAM.

To fix this, I launch it with -Xmx500m and -Xms50m. This let it run the other day on the nodes it was submitted to, within my requested 4 GB of RAM for the job, but the odd thing was that it still went up to 3.5 GB of total virtual memory even though I am telling it to only use 500 MB of heap.
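For reference, a minimal sketch of how that heap cap gets passed; Nextflow reads JVM options from the NXF_OPTS environment variable, and the pipeline script name below is just a placeholder:

```bash
# Cap the heap of the JVM that Nextflow itself runs in.
# main.nf is a placeholder for the actual pipeline script.
export NXF_OPTS='-Xms50m -Xmx500m'
nextflow run main.nf
```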

Today the cluster is fuller, and I am now seeing new hs_err_pid###.log files spawning, which tell me there is not enough memory on the system for Java to malloc 52 kB. The node is almost completely full of jobs, but I still got my requested 4 GB of virtual RAM when the job launched, so I am a little confused.

Any thoughts?

tatarsky commented 6 years ago

A sample job id would be cool so I can look at what you are asking for and where it's assigned.

billgreenwald commented 6 years ago

The pids were 25701, 28309, and 32564, for some examples. Logs are at /frazer01/home/bill/CARDIPS-caQTL-manuscript/Scripts/ASE_pipeline/hs_err_pid####.log

The job ids were 5709269.X and 5709268.X

tatarsky commented 6 years ago

OK. Let me take a look. The memory ways of Java are always exciting and unique.

tatarsky commented 6 years ago

I am trying to glue the pid error logs to the actual job id/task, as they don't seem to contain the node name.

Do you also have SGE output/error files for these runs somewhere?

billgreenwald commented 6 years ago

/frazer01/home/bill/CARDIPS-caQTL-manuscript/Scripts/ASE_pipeline/Logs

tatarsky commented 6 years ago

This may be a case where the h_vmem over-subscription (which we do to allow more job throughput because people ask for more than they actually use in jobs) isn't working because the jobs out there are actually using what they asked for. Still looking though.

tatarsky commented 6 years ago

One probably unrelated thing I note: we might want to rev our Java 8 module. The one you are using is two years old; it's version -111 and the current version is -181.

But again, it's likely unrelated, as you were fine before on a less loaded cluster. Just mentioning.

tatarsky commented 6 years ago

Do you mind, while I go to lunch to see if I can spot over-subscription being a factor, issuing the same jobs with a requested h_vmem=25G? That's roughly the "excess" we add to the real memory. Really, any number large enough to see whether Java is running into allocation battles with the matteo jobs.
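In case it helps, a sketch of what that resubmission might look like; the job script name and log paths are placeholders, not your actual pipeline:

```bash
# Same job, but reserve 25G of h_vmem from SGE instead of 4G.
# run_ase.sh and the Logs/ paths are illustrative only.
qsub -l h_vmem=25G -cwd -o Logs/ -e Logs/ run_ase.sh
```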

I'll be back in a bit.

billgreenwald commented 6 years ago

Yeah, I'll give it a try.

billgreenwald commented 6 years ago

Yeah, they work now.

Would you mind explaining how you calculated the 25G that was needed (or whether you picked it arbitrarily), as well as how our h_vmem accounting works, so I can figure this out in the future?

tatarsky commented 6 years ago

Somewhat arbitrary, but 25GB is the level of over-subscription we use, on the assumption that people don't really use what they request in h_vmem. We pretend the nodes can take up to 150GB of h_vmem requests when the actual available physical RAM is 125GB.
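If you want to check this yourself, something along these lines should show a node's physical memory next to the h_vmem consumable the scheduler is advertising; the host name is a placeholder:

```bash
# Physical memory (MEMTOT) and current load for all execution hosts.
qhost

# The h_vmem consumable SGE is scheduling against on one host;
# node-01 is a placeholder host name.
qhost -F h_vmem -h node-01
```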

It's a bit of a voodoo suggestion, but basically I was curious whether what you are seeing is because your jobs are being scheduled on systems that are already over their real memory (and starting to swap), which IIRC sometimes makes Java mallocs fail. I show memory was fairly well allocated this morning, but not quite as high as I wanted to see. But that's also very "point in time", and Ganglia smooths things over 5 minute intervals.

I've had to reduce "over-subscription" at a few places that have more Java based pipelines.

It might also be interesting to try less dramatic values like 8G or 12G.

I'm still looking at gathering some metrics that might explain it better. Cgroups would probably also handle this better than rlimits, which is how h_vmem is enforced right now. But cgroups and open source SGE are a bit of a kludge.
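As a side note on the rlimit mechanics: because h_vmem is applied as an address-space rlimit on the job, you can see the cap from inside a running job with a throwaway script like the sketch below (the script name is illustrative):

```bash
#!/bin/bash
# check_limits.sh - print the limits SGE imposed on this job.
# With h_vmem enforced as an rlimit, the virtual-memory value here is
# the ceiling that Java's mallocs run into.
ulimit -v   # virtual address space limit, in kilobytes
ulimit -a   # full set of rlimits for comparison
```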

billgreenwald commented 6 years ago

Cool, thanks. Close when you want.

tatarsky commented 6 years ago

I never got back to this, but I'm adding it to an SGE review of cgroups status, or consideration of Univa's version, which I believe handles this better.