Closed TomGlanzman closed 8 years ago
This problem has been addressed by making more use of the local /scratch space available on all batch nodes. Each phoSim instance needs ~300 MB of space and ~16,000 inodes. The SLAC batch farm currently has six generations of batch machines, and all of them should be able to handle the space/inode demands, even if all available cores are dedicated to this workflow task. One possible complication is that local /scratch space is voluntarily managed by the user who places data into that space; if that user fails to clean up after a job terminates, the scratch space slowly fills up, much like a memory leak in C++.
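As a rough illustration of the space/inode budget above, here is a minimal Python sketch that checks whether a scratch area can host a given number of concurrent phoSim instances. The function names and the idea of running such a pre-flight check are assumptions for illustration, not part of the actual workflow; only the ~300 MB and ~16,000 inode figures come from the text.

```python
import os

MB = 1024 * 1024
SPACE_PER_INSTANCE = 300 * MB   # ~300 MB per phoSim instance (figure from the text)
INODES_PER_INSTANCE = 16_000    # ~16,000 inodes per phoSim instance (figure from the text)


def scratch_requirements(n_instances):
    """Return (bytes, inodes) needed for n_instances concurrent phoSim jobs."""
    return n_instances * SPACE_PER_INSTANCE, n_instances * INODES_PER_INSTANCE


def scratch_can_host(path, n_instances):
    """Hypothetical pre-flight check: does `path` have enough free space and inodes?

    Uses os.statvfs (Unix only). Some file systems report 0 free inodes even
    when files can still be created, so treat a False result as a hint only.
    """
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize   # space available to unprivileged users
    free_inodes = st.f_favail                # inodes available to unprivileged users
    need_bytes, need_inodes = scratch_requirements(n_instances)
    return free_bytes >= need_bytes and free_inodes >= need_inodes
```

For example, `scratch_can_host("/scratch", os.cpu_count())` would approximate the "all available cores dedicated to this task" case on a given node.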
The implemented solution involves the following new steps in the workflow:
1) Sprinkled SED files are generated in local /scratch, then a .tar.gz of the spectra_files directory is copied to Lustre to be used later. (The instance catalog continues to be written directly to Lustre.)
2) In preparation for running phoSim, the production SEDs are symlinked into a location in local /scratch, and the Lustre copy of spectra_files.tar.gz (sprinkled SEDs) is unpacked into that same location.
3) The phoSim /work directory is placed in local /scratch.
4) Upon completion, the local /scratch is cleaned up (removed), leaving reference copies of the instanceCatalog, sprinkled SEDs, and production SEDs in Lustre. The Lustre space can be cleaned up (manually) once the Twinkles group is satisfied the files are no longer needed. Again, my plan is to retain these data for the first NNNNN visits of Run3, where NNNNN is yet to be determined, but at least 1000.
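The steps above could be sketched roughly as follows in Python. This is not the actual workflow code: the function names, the `SEDs` and `work` directory names, and the choice to symlink each top-level entry of the production SED directory are all assumptions made for illustration. The `try`/`finally` is one way to guard against the scratch "leak" mentioned earlier.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def archive_sprinkled_seds(scratch_dir, lustre_dir):
    """Step 1: tar scratch_dir/spectra_files and copy the single archive to Lustre."""
    scratch_dir, lustre_dir = Path(scratch_dir), Path(lustre_dir)
    tarball = scratch_dir / "spectra_files.tar.gz"
    with tarfile.open(str(tarball), "w:gz") as tar:
        tar.add(str(scratch_dir / "spectra_files"), arcname="spectra_files")
    lustre_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(str(tarball), str(lustre_dir)))


def stage_phosim_inputs(lustre_tarball, production_sed_dir, scratch_dir):
    """Steps 2-3: unpack sprinkled SEDs, symlink production SEDs, create the work dir."""
    sed_root = Path(scratch_dir) / "SEDs"       # hypothetical directory name
    sed_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(lustre_tarball), "r:gz") as tar:
        tar.extractall(str(sed_root))           # archive is assumed to be trusted
    # The symlinks live in local scratch, so they cost no Lustre metadata.
    for entry in Path(production_sed_dir).iterdir():
        link = sed_root / entry.name
        if not link.exists():
            link.symlink_to(entry.resolve())
    work_dir = Path(scratch_dir) / "work"       # hypothetical directory name
    work_dir.mkdir(exist_ok=True)
    return sed_root, work_dir


def run_in_scratch(lustre_tarball, production_sed_dir, run_phosim, scratch_root=None):
    """Step 4: run the job, then always remove the scratch area, even on failure."""
    scratch_dir = tempfile.mkdtemp(prefix="phosim_", dir=scratch_root)
    try:
        sed_root, work_dir = stage_phosim_inputs(
            lustre_tarball, production_sed_dir, scratch_dir
        )
        return run_phosim(sed_root, work_dir)   # caller-supplied callable
    finally:
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

Here `run_phosim` stands in for whatever actually launches phoSim; it receives the staged SED directory and the work directory, and the scratch area is removed as soon as it returns or raises.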
Tests of this refactoring of the workflow appear successful.
The Lustre file system as implemented at SLAC has a 100 GB metadata store, which is used to store, among other things, the full file path of each and every file/directory/symbolic link. Running phoSim for Twinkles Run 3 typically requires ~12,000 production SEDs plus another ~4,000 sprinkled SED files. The file paths are necessarily long (but descriptive), which compounds the problem. During the initial running of this workflow, all instanceCatalogs and SED files are preserved for downstream validation/debugging. Finally, the goal of running 2500 concurrent instances of phoSim means many millions of links/files. Once this workflow began, the metadata usage within Lustre was observed to rise from ~38 GB to >100 GB, at which point no new files could be created.
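To make the "many millions" figure concrete, here is a back-of-the-envelope calculation in Python using only the approximate numbers quoted above. The constant and function names are illustrative, and the count assumes every concurrent instance has its full set of SED links/files present in Lustre at once.

```python
PRODUCTION_SEDS = 12_000        # ~12,000 production SEDs per instance (from the text)
SPRINKLED_SEDS = 4_000          # ~4,000 sprinkled SED files per instance (from the text)
CONCURRENT_INSTANCES = 2_500    # target number of concurrent phoSim instances

METADATA_CAPACITY_GB = 100      # size of the Lustre metadata store at SLAC
METADATA_BEFORE_GB = 38         # approximate usage before the workflow started


def lustre_entries(instances, per_instance=PRODUCTION_SEDS + SPRINKLED_SEDS):
    """Number of links/files created in Lustre if every instance keeps its own set."""
    return instances * per_instance


def metadata_headroom_gb():
    """Metadata space that was free before the workflow began."""
    return METADATA_CAPACITY_GB - METADATA_BEFORE_GB


print(f"{lustre_entries(CONCURRENT_INSTANCES):,} entries")   # 40,000,000 entries
print(f"{metadata_headroom_gb()} GB of headroom")            # 62 GB of headroom
```

At roughly 40 million entries against about 62 GB of free metadata space, it is plausible that long per-entry paths alone could exhaust the store, which matches what was observed.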