frazer-lab / cluster

Repo for cluster issues.

Lustre inode full #259

Closed · hirokomatsui closed this issue 6 years ago

hirokomatsui commented 6 years ago

The df -i command shows /frazer01 is 100% used. I'm looking into how to clear it up. Can you check it as well?
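For reference, a minimal sketch of checking inode usage and finding which directories hold the most files; the mount point is a placeholder for the actual Lustre path:

```bash
# Check inode usage on the Lustre filesystem (mount point is a placeholder).
df -i /frazer01

# Rough file count per top-level directory, to see where the inodes went.
# This can be slow on a nearly full filesystem.
for d in /frazer01/*/; do
    printf '%s\t' "$d"
    find "$d" -xdev 2>/dev/null | wc -l
done | sort -t$'\t' -k2,2nr | head
```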

hirokomatsui commented 6 years ago

Seems like we just have to remove files.

tatarsky commented 6 years ago

I believe so. That's a ton of files added fairly recently, from what I recall of the last time I looked at the inode count. Did something add millions of small files?

hirokomatsui commented 6 years ago

I think so. We're trying to collect them.

billgreenwald commented 6 years ago

Yeah, that would be me. I needed them all for about a month or so while I made single summary files of them. Currently tarring and zipping.
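For anyone doing the same, a minimal sketch of the tar-and-zip approach (the directory name is a placeholder); verifying the archive before deleting the originals is the cautious part:

```bash
# Pack a directory of small files into one compressed tarball,
# verify it can be read back, and only then remove the originals.
tar -czf results_dir.tar.gz results_dir/ \
  && tar -tzf results_dir.tar.gz > /dev/null \
  && rm -rf results_dir/
```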

tatarsky commented 6 years ago

You read my mind ;)

Yes, that's a good way to preserve the data, and it cuts the inode count.

I will however add an alert for inode filling. I didn't have one because frankly I didn't think we'd hit it ;)

Thanks!

tatarsky commented 6 years ago

Another area I often ask folks to check is their SGE output/stderr files. Those can really add up, and sometimes a good tar of them, or deleting them if they're not needed, goes a long way.
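A hedged sketch of cleaning those up, assuming the default SGE naming of <jobname>.o<jobid> / <jobname>.e<jobid> and a placeholder job directory:

```bash
# Count SGE stdout/stderr files under a job directory.
find ~/jobs \( -name '*.o[0-9]*' -o -name '*.e[0-9]*' \) | wc -l

# Tar them into one archive and remove the originals (GNU tar).
find ~/jobs \( -name '*.o[0-9]*' -o -name '*.e[0-9]*' \) -print0 \
  | tar -czf sge_logs.tar.gz --null -T - --remove-files
```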

tatarsky commented 6 years ago

I am cleaning up some of my trees as well.

tatarsky commented 6 years ago

One item we may want to address as a separate git issue is what to do with users who have left. I believe we have a few such users in /home.

billgreenwald commented 6 years ago

One problem is that when people leave, they don't necessarily move all the needed files out of their home directory.

tatarsky commented 6 years ago

Oh I fully understand that reality.

tatarsky commented 6 years ago

But I believe these folks are gone, since they no longer have passwd entries, if you want some space/inodes back:

patrick: 16G, ~114K files
nnariai: 62G, ~275K files

I could always just tar the dirs up and slide them over to the NAS box in case something turns out to be needed.
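A rough sketch of that archive-and-park step; the NAS host, destination path, and staging directory are all placeholders:

```bash
# Tar up a departed user's home directory, copy the archive to the NAS,
# then reclaim the space/inodes on Lustre.
tar -czf /scratch/patrick-home.tar.gz -C /home patrick \
  && rsync -av /scratch/patrick-home.tar.gz nas01:/archive/old-homes/ \
  && rm -rf /home/patrick /scratch/patrick-home.tar.gz
```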

s041629 commented 6 years ago

I removed a couple hundred thousand files. Next week (after the grant is submitted), I can remove more.

billgreenwald commented 6 years ago

So I also recently created ~178 million other small files that I need, but I will figure out a way to condense them and then rewrite my existing scripts to use the condensed formats.

Sorry about that

billgreenwald commented 6 years ago

So I wrote a script that should convert each set of 222k files to a single file. I can go in and delete the sets throughout the day as they get converted to single summary files. The Python time estimate was 25 minutes per set, so PyPy should give a modest speedup. However, it's mainly limited by disk I/O rather than compute speed, so the speedup won't be huge.
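A minimal sketch of that convert-then-delete loop in shell form; condense_set.py, the set layout, and the output paths are hypothetical stand-ins for the actual script:

```bash
# Convert each set of small files to one summary file, and delete the
# set only if the conversion succeeded.
mkdir -p summaries
for set_dir in sets/set_*/; do
    name=$(basename "$set_dir")
    pypy condense_set.py "$set_dir" -o "summaries/${name}.summary" \
      && rm -rf "$set_dir"
done
```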

tatarsky commented 6 years ago

Sounds reasonable to me! My cleanups and others' have gotten us to ~778K inodes free at the moment.

billgreenwald commented 6 years ago

Good news: we are at 5% free inodes now (~21M).
Bad news: at the current run time, it'll take ~1.5 weeks for me to finish removing the files I made.
Good news: when done, we should be at around 55% free inodes (~220M).

I think it'll be OK for that amount of time, though.

tatarsky commented 6 years ago

I like the presentation of the situation ;) Always end with good news on a Friday!

I've been watching and agree with your statements.

One thing that might help, though: are you using just one system to do the deletes? You could pop a qlogin or something onto a few nodes to get some parallel deletes going. It might help if a few ran at once, if that's possible; the filesystem is not loaded.
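A sketch of what that could look like; the path and worker count are placeholders:

```bash
# From a qlogin session on a compute node, delete with several parallel
# rm workers instead of one serial pass.
find /frazer01/scratch/old_sets -type f -print0 \
  | xargs -0 -P 8 -n 1000 rm -f
```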

But I'm fine with that time frame, and I added Nagios alerts for this. Thanks for the update!

tatarsky commented 6 years ago

The inode count is now back within the level I set for the alert (greater than 15% free). During the planned Lustre expansion, however, we'll want to discuss whether we feel this is enough inodes or whether we wish to make a new filesystem when the time comes. Closing for now.
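For reference, a sketch of the kind of Nagios check this refers to, assuming the standard check_disk plugin; the plugin path, mount point, and thresholds are illustrative:

```bash
# Warn when free inodes drop below 15%, go critical below 5%.
/usr/lib64/nagios/plugins/check_disk -W 15% -K 5% -p /frazer01
```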

billgreenwald commented 6 years ago

Just finished with my files; we are at 56% free inodes now. Sorry for all the trouble, but we should be fine for a while.