Implements a close copy of the infer code to run on MDC data. Should run in more or less exactly the same way:
```
pinto -p mdc run -e ../.env deploy-mdc --typeo :deploy-mdc
```
MDC data saves all data from a particular dataset (background and foreground) into a single file, where data is structured as `<ifo>/<GPS start timestamp>`. It also pre-injects all its foreground data up front and saves it identically to the background data (these files are typically `background.hdf` and `foreground.hdf`).
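To make the layout concrete, here's a toy sketch of reading a file with that structure. The filenames, IFO names, and GPS keys are illustrative assumptions, not values from the actual MDC:

```python
import h5py
import numpy as np

# build a toy file with the assumed <ifo>/<GPS start timestamp> structure
with h5py.File("background.hdf", "w") as f:
    f.create_dataset("H1/1262304000", data=np.zeros(4096))
    f.create_dataset("L1/1262304000", data=np.zeros(4096))

# iterate over IFO groups, then over GPS-keyed segments within each group
with h5py.File("background.hdf", "r") as f:
    for ifo in f:
        for start in f[ifo]:
            strain = f[ifo][start][:]
```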
As such, I've stripped out injection, and I don't attempt any kind of injection recovery (this is handled by the MDC). Both background and foreground are `EventSet`s.
I've gotten rid of asynchronous loading for simplicity. The background and foreground data are both loaded at the same time (still in chunks, though we can obviously go back to asynchronous loading if we want).
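The chunked loading amounts to something like this sketch (the function and values are illustrative, not the actual loader, which reads its chunks from the HDF5 files):

```python
import numpy as np

def iter_chunks(data: np.ndarray, chunk_size: int):
    """Yield successive chunks of a timeseries; the last chunk may be short."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

# background and foreground get chunked together, not asynchronously
background = np.arange(10)
foreground = np.arange(10)
chunks = list(zip(iter_chunks(background, 4), iter_chunks(foreground, 4)))
```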
Since there are no shifts to parallelize over, I've added `num_clients` as an argument to the `deploy` script. This will distribute the segments (`datasets` in the code) in round-robin fashion to a fixed number of clients, in reverse order of length (i.e. starting with the longest). Each client will then do inference on multiple segments.
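The distribution scheme can be sketched like this (the function name and segment representation are hypothetical, not the deploy script's actual code):

```python
def distribute(segments, num_clients):
    """Round-robin segments to clients, longest first (sketch of the scheme above)."""
    ordered = sorted(segments, key=len, reverse=True)
    assignments = [[] for _ in range(num_clients)]
    for i, segment in enumerate(ordered):
        assignments[i % num_clients].append(segment)
    return assignments

# e.g. four segments of lengths 4, 1, 2, 3 spread across two clients
clients = distribute([range(4), range(1), range(2), range(3)], num_clients=2)
```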
I implemented things using the new inference setup in #468, which also requires ML4GW/hermes#48. If this is a headache, it should be pretty straightforward to just break the `x` array up into its two batch elements and make 2 requests like normal.
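That fallback just splits the batch before sending; roughly (the shapes here are assumptions for illustration, not the actual pipeline values):

```python
import numpy as np

# assumed shape: (batch=2, channels=2, samples=4096)
x = np.random.randn(2, 2, 4096).astype("float32")

# break the two batch elements apart; each would go out as its own
# ordinary single-element inference request
first, second = np.split(x, 2, axis=0)
```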
I have a fork of the ML MDC repo with a working conda environment you can use to run the data generation and evaluation. The execution is pretty simple: they're just Python scripts, and their `--help` strings are pretty self-explanatory. My script will generate files in the exact format you need to run `evaluate.py`. The only bummer is that data generation takes a long time.
Let me know if folks run into issues with any of this and I can dig through my old implementation (which you might find helpful as well).