I see #93 set the limit at 10GB. Does nobody else have to run jobs on data sets larger than 10GB?
The GPU queue has had a max memory cap of 10GB, as you note, raised from 4GB a while back:
set queue gpu resources_max.mem = 10gb
I don't really know why the queue was limited in this manner. I have no objection to raising it, so to facilitate your research while others try to recall why this was the case, I have made it 100GB.
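For the record, the change itself is a one-liner in `qmgr`; roughly like this (illustrative, not a transcript of exactly what I ran):

```
# Inspect the current limits on the gpu queue
qmgr -c "print queue gpu"

# Raise the per-job memory cap on the gpu queue to 100gb
qmgr -c "set queue gpu resources_max.mem = 100gb"
```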
Please retry as quickly as you can, as it's late here.
I think you can still request a GPU in the batch queue, which is what I had done to get a lot of RAM + a GPU.
I believe we added a submit filter (per some other GitHub request) to block that. But feel free to try ;)
Basically, please advise if anyone knows the reason and we can review in the morning. But @lzamparo, given your deadline, please confirm you can at least get something running.
Also, just so you know, #226 was a GPFS token memory exhaustion problem and is not related here. I am going offline. If you do not confirm you can run now, I will assist in the morning.
Oh, and on reviewing the submit filter: it was added to prevent running in the gpu queue without asking for a GPU, so the method @pgrinaway mentions would have worked as well. Offline now. You have two options to proceed. I will reduce the gpu queue's max memory limit in the morning if asked.
@pgrinaway: so you just submit to `-q batch`, but also request a GPU with `#PBS -l nodes=1:ppn=1:gpus=1:docker:gtxtitan` (or something similar)? Trying that now...
That should work!
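For clarity, the batch-queue route @pgrinaway describes would look roughly like this; a sketch only, where the memory, walltime, and script name are illustrative and should be adjusted to your job:

```bash
#!/bin/bash
#PBS -q batch                                    # submit to the batch queue...
#PBS -l nodes=1:ppn=1:gpus=1:docker:gtxtitan     # ...but still request a GPU node
#PBS -l mem=72gb                                 # large-memory request (illustrative)
#PBS -l walltime=24:00:00                        # illustrative walltime
#PBS -j oe                                       # merge stdout/stderr

cd "$PBS_O_WORKDIR"
# placeholder for the actual workload
python process_batches.py
```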
Please note, per my comments above, that the gpu queue max memory also remains at 100GB until folks ask me to drop it back down to 10GB.
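So the gpu-queue route is equally viable right now; something along these lines, again purely illustrative (the submit filter just requires that a gpu resource be requested):

```bash
#!/bin/bash
#PBS -q gpu                      # the gpu queue, currently capped at 100GB
#PBS -l nodes=1:ppn=1:gpus=1     # the submit filter requires a gpu request here
#PBS -l mem=72gb                 # fits under the raised 100GB cap (illustrative)
#PBS -l walltime=24:00:00        # illustrative walltime

cd "$PBS_O_WORKDIR"
# placeholder for the actual workload
python process_batches.py
```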
Thanks @tatarsky. Cancelled the batch job and enqueued a gpu job.
I believe this to be resolved within reason. If somebody feels 100GB is too high a max for the GPU queue, feel free to re-open.
Sorry for joining in so late. Only one question: I think the motivation for the memory limit was to prevent people from using the gpu queue to circumvent a full or stuffed batch queue. However, I don't know how the current system integrates the priorities of these two queues.
Well, `gpu` inherits `batch` nodes. So if the batch queue is tapped out in slots and RAM, so also is `gpu`. In other words, if you are waiting on resources in `batch`, you would be waiting in `gpu` as well, I believe. I believe `gpu` has a slight preference in priority, but it also requires you to ask for a `gpu` resource.
So again, I will lower the limit if desired, but my goal was to allow @lzamparo to meet his deadline.
Keeping the current limit is fine with me. I just wanted to provide context on what I remember to be one of the reasons the limit was put into place initially.
Which is great, and I appreciate it @akahles. If I don't hear anything to the contrary, we will reference your sage memory of the situation when we discover abuse of the gpu queue ;)
Thanks again for the quick response @tatarsky; I should hopefully have some results by this weekend.
You are very welcome. Have a great weekend.
Is there a limit to the amount of memory that you can request when submitting to the GPU queue? I did a cursory search of the user guide and found nothing advertised.
I've got a large data set (72GB) that I'd like to hold in memory and then process in smaller batches on a GPU. Here's the relevant snippet of my submission script:
So, is there a specific flag I should use in my submission script that allows for large-memory gpu queue jobs? Or is there a memory limit that cannot be circumvented? Or is this a recurrence of #226?
I really appreciate any help with this; it's for preliminary results for a grant due this weekend.