LSSTDESC / DESC_DC2_imSim_Workflow

BSD 3-Clause "New" or "Revised" License

Node bundling script for pre-processing #1

Closed villarrealas closed 6 years ago

villarrealas commented 6 years ago

This PR adds a script for pre-processing that runs over a list of instance catalog files and outputs a list in which each entry is a list with the following elements:

  1. Node ID to send jobs to.
  2. Sensors to run on that Node.
  3. Instance Catalog associated with that node.

Because a single instance catalog may have jobs sent to multiple nodes, the output ends up as a list inside a list inside a list. Suggestions for a less convoluted data structure are appreciated.
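For concreteness, here is a sketch of the output structure (the paths and sensor names are made up, not real outputs):

```python
# Each entry is [node_id, sensor_list, instcat_path]; a visit that spills
# over one node shows up in multiple entries with the same instcat_path.
bundles = [
    [0, ["R22_S11", "R22_S12"], "00466749/phosim_cat_466749.txt"],
    [1, ["R22_S20"], "00466749/phosim_cat_466749.txt"],  # same visit, second node
]

# Grouping by instance catalog shows which nodes serve each visit.
by_catalog = {}
for node_id, sensors, instcat in bundles:
    by_catalog.setdefault(instcat, []).append(node_id)
```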

Node bundling is currently done using a modified FFD (First Fit Decreasing) algorithm. The modifications account for the following limits:

  1. You may not pack more than 63 threads to a node, so groups of 63 are pruned off first from each visit.
  2. You may not exceed 192 GB of memory on a node, so a hard limit of 10 different visits per node is enforced (this assumes 10 GB of memory per instance catalog plus 5 GB reserved for arbitrary imsim ramp-up).

Again, a better bundling algorithm would improve things, though how inefficient this one is depends heavily on the distribution of sensor visits. The current big drawback is that remaining groups of fewer than 63 threads are never broken into smaller groups for fitting.
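In pseudocode, the scheme looks roughly like this (names and data layout are illustrative; the actual script differs in details):

```python
# Sketch of the modified first-fit-decreasing bundling described above.
MAX_THREADS = 63   # thread cap per node
MAX_VISITS = 10    # memory-driven cap on distinct visits per node

def bundle(visits):
    """visits: dict mapping instcat path -> list of sensors."""
    nodes = []       # each node: {"threads": int, "jobs": [(instcat, sensors)]}
    remainders = []
    for instcat, sensors in visits.items():
        sensors = list(sensors)
        # 1. Peel off full groups of MAX_THREADS; each gets its own node.
        while len(sensors) >= MAX_THREADS:
            nodes.append({"threads": MAX_THREADS,
                          "jobs": [(instcat, sensors[:MAX_THREADS])]})
            sensors = sensors[MAX_THREADS:]
        if sensors:
            remainders.append((instcat, sensors))
    # 2. First-fit decreasing on the leftover sub-63 groups.
    remainders.sort(key=lambda x: len(x[1]), reverse=True)
    for instcat, sensors in remainders:
        for node in nodes:
            if (node["threads"] + len(sensors) <= MAX_THREADS
                    and len(node["jobs"]) < MAX_VISITS):
                node["threads"] += len(sensors)
                node["jobs"].append((instcat, sensors))
                break
        else:  # no existing node fits; open a new one
            nodes.append({"threads": len(sensors),
                          "jobs": [(instcat, sensors)]})
    return nodes
```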

I will merge this to the master branch once we are happy with it for initial use and have run some basic tests.

villarrealas commented 6 years ago

Note that current code includes an attempt at using Parsl to speed up the calculation for every sensor for a given visit (since the order of the chips does not matter).

...since I am unfamiliar with Parsl, this may break catastrophically. If so, it is fairly straightforward to remove the Parsl decorator and the .result() from the function call and have a code that is independent of Parsl.
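For reference, the two call styles look roughly like this (the trimming function here is a stand-in, not the real one):

```python
# Parsl version (as in the script), shown as comments:
#
#     from parsl import python_app
#
#     @python_app
#     def trim_sensors(instcat):
#         ...
#
#     sensors = [trim_sensors(c).result() for c in catalogs]
#
# Parsl-free version: drop the decorator and the .result() call.
def trim_sensors(instcat):
    # stand-in for the real per-visit sensor determination
    return sorted(instcat["sensors"])

catalogs = [{"sensors": ["R22_S11", "R01_S00"]}]
sensors = [trim_sensors(c) for c in catalogs]
```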

villarrealas commented 6 years ago

Having a new issue with line 149 ( temp.append(visit_job_queue[idx].pop()) ), which now raises "pop from empty list" in some cases. I suspect the issue is an empty list (0 sensors) being passed in, but I am confirming.

villarrealas commented 6 years ago

An empty list is definitely being passed into the main fitting loop, but I am struggling to catch and discard it.
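One way to guard against this would be to filter zero-sensor visits out before building the queue (variable names hypothetical):

```python
# Drop zero-sensor visits before they reach the fitting loop.
visit_sensor_lists = [["R22_S11", "R22_S12"], [], ["R01_S00"]]
visit_job_queue = [v for v in visit_sensor_lists if v]  # skips the empty list
```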

adrianpope commented 6 years ago

Just wanted to add that I had been planning for 63 sensor visits per KNL node because 189/3 = 63, but if we're doing a more flexible mapping between focal plane visits and KNL nodes then we may want to default to 64 total multi-processes per KNL node. In my personal testing so far I haven't seen much evidence that we need to leave one core empty for system and load-balancing reasons, at least for codes that aren't, e.g., tightly-coupled low-latency MPI codes. Eventually I think we may want to be able to specify that number at run time.

Also, I'm new to pull requests - should I wait until the empty list bug gets fixed before merging?

villarrealas commented 6 years ago

We should wait until the empty list bug gets fixed before merging - it stops us from running on the full instance catalog set.

The number of cores is currently just a hard-coded choice, but the algorithm should allow for that to be changed simply. I think we can merge ahead of me adding that option and making certain that nothing goes bonkers, though.
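Making that run-time configurable could be as simple as promoting the hard-coded caps to keyword arguments (signature hypothetical; the real function does the full bundling):

```python
# Sketch: hard-coded caps become keyword arguments with the current defaults.
def determine_bundling(instcat_list, max_threads_node=63, max_fit=10):
    # ... real bundling logic would go here ...
    return {"threads": max_threads_node, "visits": max_fit}

# 64 processes per KNL node, as suggested above:
config = determine_bundling([], max_threads_node=64)
```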

For replication purposes, I use the following process to run this. First, start the Singularity image in shell mode:

    cd /projects/LSSTADSP_DESC/ALCF_1.2i/inputs
    singularity shell -B .:/mnt/cwd:rw,/home/antoniov/ALCF_1.2i/scripts:/mnt/scripts /projects/LSSTADSP_DESC/ALCF_1.2i/imsim.simg
    cd /mnt/cwd

I then start Python and run the following commands:

    import os
    from glob import glob
    instcat_list = [y for x in os.walk('./') for y in glob(os.path.join(x[0], 'phosim_cat*'))]

    import sys
    sys.path.append('/mnt/scripts')
    import determine_job_bundles as djb
    nodelist = djb.determine_bundling(instcat_list[:5])
    print(nodelist)

You can run on a single instance catalog, but it does need to be passed in explicitly as a list, i.e. [instcat_list[0]], due to how some of the code is written. I'll probably look at fixing that in the future.

villarrealas commented 6 years ago

Okay, the loop error is fixed. What was happening was that there was no longer a secondary check in place to stop searching once a fit was found, so the code would go on to try fitting into an empty list (and fail).

That is now addressed and the code should operate as intended on an arbitrarily large instance catalog list. Currently this just returns the list, but I've considered using simplejson to save the resulting list for future use.

Here's an additional code snippet for anyone wanting to use simplejson to store this file (which is probably a good idea both for combining different Theta runs in chunks AND for having the lists available to confirm results!):

    import simplejson
    f = open('output.txt', 'w')
    simplejson.dump(bundle_list, f)
    f.close()

Reading it back is then just a matter of simplejson.load(f)!
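The full round trip, for reference (this sketch falls back to the stdlib json module, which has the same dump/load interface, in case simplejson isn't installed):

```python
try:
    import simplejson as json
except ImportError:
    import json  # same dump/load interface as simplejson

bundle_list = [[0, ["R22_S11"], "phosim_cat_466749.txt"]]

with open("output.txt", "w") as f:
    json.dump(bundle_list, f)

with open("output.txt") as f:
    loaded = json.load(f)  # round-trips this list-of-lists structure unchanged
```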

jchiang87 commented 6 years ago

@villarrealas I'm trying out the code now, but since it takes so long to generate the lists of sensors for each visit using the InstCatTrimmer code, I'm thinking it would be good to save those sensor lists as an intermediate data product. Then if we need to adjust parameters like max_threads_node or max_fit, we can do so and get the reconfigured node bundles fairly quickly. I'm not sure what format this intermediate data product should take, but I'm thinking it should be something that can easily be combined if we want to break the generation of those sensor lists into parallel jobs.
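One combinable option would be a JSON dict per parallel job, mapping instcat path to sensor list, merged with dict.update() (paths and sensor names here are hypothetical):

```python
# Each parallel job writes one dict: instcat path -> sensor list.
chunk_a = {"00466749/phosim_cat_466749.txt": ["R22_S11", "R22_S12"]}
chunk_b = {"00466750/phosim_cat_466750.txt": ["R01_S00"]}

# Merging is trivial because the instcat paths are unique keys.
combined = {}
for chunk in (chunk_a, chunk_b):
    combined.update(chunk)
```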

villarrealas commented 6 years ago

I think I would make it another PR, but I could certainly make an updated version for 2.0i with those features. I just would need to play around with the inputs. This would also make it better for computing new bundles for restarting from checkpoints.

benclifford commented 6 years ago

Are the scripts/outputs/instcat_list_subset*.json files output from this script? If so, they maybe shouldn't be checked into version control.

villarrealas commented 6 years ago

Those files are actually inputs (they were the output of another Python script I wrote to generate them for testing purposes). I'll remove them from version control.

That being said, the actual bundle lists are fairly small, so I should be able to dig them up and pass them to you. At the very least, we still have the bundle lists from the re-runs to pass you, which look identical.