Closed aaschwanden closed 6 years ago
Hi @aaschwanden. A few quick questions before digging into this further.
Thanks!
Hi,
I’m also cc’ing our HPC experts, as the idea has come up that a glitch in the nodes’ configuration might allow too many post-processing jobs to run on a single node. We don’t yet know whether this is related or a red herring, though.
On Jan 2, 2018, at 6:56 AM, Ben Koziol notifications@github.com wrote:
Hi @aaschwanden. A few quick questions before digging into this further.
• Does this happen immediately when ocgis is first imported? Or does this happen after some other processes are run?
It happens immediately when ocgis is first imported.
• Is this job being run in parallel or on a single process?
Serial. I currently don’t know how to run ocgis in parallel for my problem. I will open a separate issue to explore how to use ocgis most efficiently for extracting data from large (>0.5TB) files.
Thanks!
Thanks @aaschwanden. Let me know what you find out! Reading through the SO post you linked to, it sounds like popen/fork in Python can use a non-negligible amount of memory. This could help explain the out-of-memory error if node usage is too high.
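As a rough illustration of why node usage matters: fork() can fail with ENOMEM when the kernel cannot commit memory for a copy of the parent's address space. Pages are shared copy-on-write, but the commit is still accounted, most strictly under overcommit mode 2. A minimal diagnostic sketch, assuming the post-processing nodes run Linux; `fork_memory_diagnostics` is a hypothetical helper, not part of ocgis:

```python
import os
import resource


def fork_memory_diagnostics():
    """Collect numbers that bear on whether fork() might fail with ENOMEM.

    Assumes a Linux node: ru_maxrss is reported in kilobytes on Linux
    (bytes on macOS), and the overcommit policy is read from /proc.
    """
    # Peak resident set size of this (parent) process so far.
    usage = resource.getrusage(resource.RUSAGE_SELF)

    overcommit_path = "/proc/sys/vm/overcommit_memory"
    overcommit_mode = None  # stays None on systems without this file
    if os.path.exists(overcommit_path):
        with open(overcommit_path) as fh:
            # 0 = heuristic, 1 = always overcommit, 2 = strict accounting
            overcommit_mode = int(fh.read().strip())

    return {
        "pid": os.getpid(),
        "peak_rss_kb": usage.ru_maxrss,
        "overcommit_mode": overcommit_mode,
    }
```

Calling `print(fork_memory_diagnostics())` right after `import ocgis` on both the login node and a post-processing node would show whether the two differ in process size or overcommit policy.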
> Serial. I currently don’t know how to run ocgis in parallel for my problem. I will open a separate issue to explore how to use ocgis most efficiently for extracting data from large (>0.5TB) files.

Sounds good!
Hi @aaschwanden. I wanted to check in and see if you've made any progress on this issue. How's it going?
I’ve been traveling for the past four weeks and haven’t had a chance to look into it. Back in the office next week to revisit the issue.
On Jan 31, 2018, at 10:07, Ben Koziol notifications@github.com wrote:
Hi @aaschwanden. I wanted to check in and see if you've made any progress on this issue. How's it going?
Ahhh, I hope you are having a good trip. No rush of course.
@aaschwanden I'm going to close this for now. Let me know if there is still an issue on your end.
Hi,
I'm trying to run OCGIS on an HPC cluster that uses SLURM and has dedicated post-processing nodes (http://www.gi.alaska.edu/research-computing-systems/hpc).
This sometimes works, but more often it doesn't. Running my script on the login node, I get the following warning:
but when I run the same script on the post-processing node, it fails most of the time with:
At first I considered this a glitch in our HPC system that we have not been able to track down, but I've since come across some posts that relate this error to the use of "fork" in Python:
https://stackoverflow.com/questions/20111242/how-to-avoid-errno-12-cannot-allocate-memory-errors-caused-by-using-subprocess
so I've decided to post it here because it happens during the initialization of OCGIS.
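We don't know which subprocess call OCGIS makes during initialization, so the following is only a sketch for confirming that a failure is the fork-time `[Errno 12] Cannot allocate memory` described in the linked Stack Overflow post, rather than some other error. `run_checked` is a hypothetical helper, not an ocgis function; it relies only on the standard-library `subprocess.run`:

```python
import errno
import subprocess
import sys


def run_checked(cmd):
    """Run ``cmd`` and return (returncode, stdout).

    If the OS refuses to create the child process with ENOMEM
    ([Errno 12] Cannot allocate memory), raise a MemoryError that says so
    explicitly instead of a bare OSError.
    """
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False
        )
    except OSError as exc:
        # Process creation itself failed (fork/exec), not the child program.
        if exc.errno == errno.ENOMEM:
            raise MemoryError(
                "could not spawn %r: fork/exec failed with ENOMEM" % (cmd,)
            ) from exc
        raise
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Spawn a trivial child using the current interpreter.
    rc, out = run_checked([sys.executable, "-c", "print('ok')"])
    print(rc, out.strip())
```

Separating "the child could not be created" from "the child ran and failed" makes it easier to tell whether the post-processing node is running out of memory at fork time.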
I understand that the information provided here may be insufficient to diagnose the problem, and I'd be happy to share additional debugging info.
Thanks.