billgreenwald closed this 5 years ago
Ah, good to know about the vacation; I was trying to arrange a drive replacement, so I'll postpone that.
When you measure "RAM use", are you looking at "cached"? That is basically the Linux page cache, which is designed to speed up I/O. It counts as "used", but it gets pushed out as soon as programs need the memory.
You can flush it, but it will come back, and flushing it has a performance cost. Are you actually being denied memory?
Also, I'm happy to issue the "drop cache" command to confirm that's what you are seeing, but be aware that Linux by design always puts free memory to some use, and that caching is vital to performance (and it will refill). Your requests should NOT be denied because of cache memory usage, so just confirm that's not happening.
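A minimal sketch (not from this thread) of telling page cache apart from memory that is genuinely unavailable, by reading /proc/meminfo. It assumes a kernel new enough to report MemAvailable (3.14 or later), which is the kernel's own estimate of how much memory new allocations can get without swapping. The helper names `parse_meminfo` and `cache_vs_pressure` are hypothetical.

```python
"""Separate page cache from memory the kernel says is not available."""
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])  # kB for most fields
    return info


def cache_vs_pressure(info: dict[str, int]) -> dict[str, int]:
    """Summarize memory in kB (hypothetical helper, not a standard tool)."""
    return {
        "total": info["MemTotal"],
        # Page cache plus buffers: shown as "used", but evictable on demand.
        "page_cache": info.get("Cached", 0) + info.get("Buffers", 0),
        # Kernel estimate of what new allocations can get without swapping.
        "available": info["MemAvailable"],
        # Roughly what is really held: process memory, unreclaimable kernel memory, etc.
        "unavailable": info["MemTotal"] - info["MemAvailable"],
    }


if __name__ == "__main__":
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    for name, kb in cache_vs_pressure(meminfo).items():
        print(f"{name:>12}: {kb / 1024**2:8.1f} GiB")
```

If "unavailable" is small while "page_cache" is large, the memory shown as used is mostly cache and should not by itself cause allocations to be denied.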
David had issues running something that needed a lot of RAM, and it was only able to use 50% of the RAM on the system before it started using swap space. I think this is what you mean by "requests getting denied".
As for the type of RAM usage, I'm going off htop's bar colors, which is not an exact science, since I can't get the numbers to add up from the Mem% columns in top/htop.
Most of the RAM in use is green, which htop shows as used memory; there is a similar amount of cached memory, but that should be OK to leave (see the first comment in this reply, though).
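Since htop's colors are hard to read precisely, a similar breakdown can be computed straight from /proc/meminfo. This sketch assumes a split similar in spirit to htop's (cached = Cached + SReclaimable - Shmem; used = MemTotal - MemFree - Buffers - cached); htop's exact formula differs between versions, so treat the numbers as approximate. The function names are hypothetical.

```python
"""Approximate htop-style used/buffers/cache split from /proc/meminfo."""
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def htop_like_split(info: dict[str, int]) -> dict[str, int]:
    """Assumed formula, similar to htop's; not htop's exact code."""
    buffers = info.get("Buffers", 0)
    # Shmem (tmpfs, shared memory) is reported inside Cached but cannot be
    # dropped like ordinary page cache, so it is moved back into "used".
    cached = info.get("Cached", 0) + info.get("SReclaimable", 0) - info.get("Shmem", 0)
    used = info["MemTotal"] - info["MemFree"] - buffers - cached
    return {"used": used, "buffers": buffers, "cached": cached, "free": info["MemFree"]}


if __name__ == "__main__":
    split = htop_like_split(parse_meminfo(Path("/proc/meminfo").read_text()))
    print({name: f"{kb / 1024**2:.1f} GiB" for name, kb in split.items()})
```

The four parts always sum to MemTotal, which makes it easier to see where the "green" memory sits than reading bar colors.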
Yep, I'm looking. I see a fair chunk of memory that isn't cached, but I can't see which process holds it.
I see one of @djakubosky's processes (30291) with a large number of deleted /tmp files still held open. In some cases I've seen such files hold onto memory over time in odd ways. Is that process still active?
python 30291 30461 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30462 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30463 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30464 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30465 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30466 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30467 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30468 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
python 30291 30469 djakubosky 18u REG 253,0 4096 40370195 /tmp/ffiVV2JE0 (deleted)
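The rows above appear to come from an lsof version that prints a TID column, so they may be a single descriptor (FD 18u, inode 40370195) listed once per thread rather than many separate files. A minimal sketch for finding deleted-but-still-open files without lsof, by reading the /proc/<pid>/fd symlinks (Linux only; inspecting another user's process needs root). The helper name `deleted_open_files` is hypothetical.

```python
"""List files a process still holds open after they were deleted (Linux /proc)."""
import os
import sys
from typing import List, Optional, Tuple


def deleted_open_files(pid: int) -> List[Tuple[int, Optional[int], str]]:
    """Return sorted (fd, size_bytes, target) for fds whose target was unlinked."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if not target.endswith(" (deleted)"):
            continue
        try:
            size = os.stat(link).st_size  # stat follows the link to the open inode
        except OSError:
            size = None
        found.append((int(name), size, target))
    return sorted(found)


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for fd, size, target in deleted_open_files(pid):
        print(f"fd={fd:<5} size={size} {target}")
```

Usage: run it with the PID as the only argument, e.g. `python deleted_fds.py 30291` (the script name is arbitrary).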
I'll flush the cache and see what's left afterward.
That process is still active, but he said it shouldn't have many temp files open; he only read in two files with it in Jupyter.
We aren't sure why. We can't kill it right now, since it takes a few hours to get it up and running and it's currently in use.
That's fine. They look small. When that 4th column shows large numbers, I've seen memory get stuck in what is called the slab cache (another part of the memory Linux uses for caching to improve performance). Leave it running. I'm asking the kernel to drain its caches (it takes a while).
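For keeping an eye on the slab cache without root, /proc/meminfo already splits it into a reclaimable part (mostly dentry and inode caches) and an unreclaimable part; /proc/slabinfo and `slabtop` give per-cache detail but usually require root. A minimal sketch; the helper names are hypothetical.

```python
"""Report slab memory from /proc/meminfo, split into reclaimable and not."""
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def slab_report(info: dict[str, int]) -> dict[str, int]:
    """Slab figures in kB; SReclaimable + SUnreclaim normally equals Slab."""
    return {
        "slab": info.get("Slab", 0),
        "reclaimable": info.get("SReclaimable", 0),  # can be freed under pressure
        "unreclaimable": info.get("SUnreclaim", 0),  # held until its owner frees it
    }


if __name__ == "__main__":
    report = slab_report(parse_meminfo(Path("/proc/meminfo").read_text()))
    for name, kb in report.items():
        print(f"{name:>13}: {kb / 1024**2:6.2f} GiB")
```

A large reclaimable figure is usually harmless; steady growth in the unreclaimable figure is the pattern that would suggest a kernel-side leak. As root, `echo 2 > /proc/sys/vm/drop_caches` asks the kernel to free reclaimable slab objects (dentries and inodes).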
Sounds good.
The system is already down to 12.3G, so it looks like that fixed it.
So there may be some kind of slab or open-file leak in that process or another one. Let's see what happens; leave this open. You should rarely need to flush the cache.
What's the best way for us to check this ourselves in the future before looping you in?
Hard to say at the moment, as I don't know what was involved.
Sounds good. Will keep you posted.
Took a quick look before calling it a day. I see some continued growth in the "used" category, so I'll try some periodic lower-level command loops to see if I can spot what I suspect is leaking.
This has not recurred as far as I can tell. Monitoring a bit longer.
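A sketch of such a periodic loop, assuming that sampling a few /proc/meminfo fields over time is enough to see which category grows. The field choice and the `watch` name are hypothetical: AnonPages is mostly process heap and stack, Shmem covers tmpfs and shared memory, and SUnreclaim is kernel slab that cannot be reclaimed.

```python
"""Periodically sample /proc/meminfo and report which fields grew."""
import time
from typing import Callable, Dict

# Hypothetical selection of fields worth watching for a leak.
FIELDS = ("MemAvailable", "AnonPages", "Shmem", "SUnreclaim", "Cached")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def watch(read_text: Callable[[], str], interval_s: float, samples: int) -> Dict[str, int]:
    """Take `samples` snapshots and return last minus first for each field, in kB."""
    if samples < 1:
        raise ValueError("samples must be >= 1")
    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for i in range(samples):
        info = parse_meminfo(read_text())
        last = {field: info.get(field, 0) for field in FIELDS}
        if i == 0:
            first = last
        print(time.strftime("%H:%M:%S"), " ".join(f"{f}={v // 1024}MiB" for f, v in last.items()))
        if i < samples - 1:
            time.sleep(interval_s)
    return {field: last[field] - first[field] for field in FIELDS}
```

For example, `watch(Path("/proc/meminfo").read_text, interval_s=60, samples=60)` (with `from pathlib import Path`) samples once a minute for an hour. A falling MemAvailable alongside a rising AnonPages, Shmem, or SUnreclaim hints at where the memory is going; the reader function is a parameter so the loop can be tested with canned text.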
Leaving open one more week while I travel.
This appears not to have recurred, at least from Ganglia's point of view. Closing for now, but I'll be evaluating a periodic memory flush.
Hey Paul (& Hiroko, though I know she's on vacation),
We have ~110 GB of RAM on flh2 that is constantly in use but unaccounted for. David and I checked all our notebooks and htop, and only see about 15 GB that should be in use.
Any thoughts?
Thanks!
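One way to put a number on "unaccounted" memory like this is to compare the sum of every process's resident set size (RSS) with what the kernel reports as unavailable. A minimal sketch, assuming Linux /proc (the status files are world-readable, so root is not needed). Shared pages are counted once per process, so the RSS sum can overestimate; a large positive gap instead points at memory that no process maps, such as kernel slab, tmpfs files, or page tables. The helper names are hypothetical.

```python
"""Compare summed process RSS with the kernel's view of unavailable memory."""
from pathlib import Path

PROC = Path("/proc")


def field_kb(text: str, field: str) -> int:
    """Return the kB value of `field` from /proc status or meminfo text, or 0."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return 0  # e.g. kernel threads have no VmRSS line


def total_rss_kb() -> int:
    """Sum VmRSS over all processes visible in /proc."""
    total = 0
    for pid_dir in PROC.iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            total += field_kb((pid_dir / "status").read_text(), "VmRSS")
        except OSError:
            continue  # process exited while we were scanning
    return total


def unavailable_kb() -> int:
    """MemTotal - MemAvailable from /proc/meminfo, in kB."""
    meminfo = (PROC / "meminfo").read_text()
    return field_kb(meminfo, "MemTotal") - field_kb(meminfo, "MemAvailable")


if __name__ == "__main__":
    rss, held = total_rss_kb(), unavailable_kb()
    print(f"sum of process RSS  : {rss / 1024**2:7.1f} GiB")
    print(f"kernel: unavailable : {held / 1024**2:7.1f} GiB")
    print(f"not in any RSS      : {(held - rss) / 1024**2:7.1f} GiB")
```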