Online deployment updates

ML4GW / aframev2

Detecting binary black hole mergers in LIGO with neural networks

MIT License

Online deployment updates #151

Closed EthanMarx closed 3 weeks ago

EthanMarx commented 3 months ago

Small refactor on top of #149, mainly building out the CLI with jsonargparse.

Needs testing, which requires training amplfi / aframe models with the new infrastructure

EthanMarx commented 3 months ago

@wbenoit26 Take a look at this and let me know what you think.

There are probably a few features from your version still missing that can be added back in (e.g. writing out data for debugging).

Would be easiest to test this out with aframe / amplfi models trained with their new infra. Going to get that started.

wbenoit26 commented 3 months ago

Erik brought this up at the Aframe meeting yesterday: we should consider writing out all of our output (possibly downsampled), just so we have something to refer back to in the case of non-detections. Storage-wise, that shouldn't be too burdensome. I think it makes sense to add here whenever you add the event file writing.
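To make the storage argument concrete, here is a minimal sketch of what writing out downsampled output could look like. It is not code from this PR: the function names, the JSON layout, and the choice of max-pooling (so that peaks in the detection statistic survive downsampling) are all assumptions made for illustration.

```python
import json
from pathlib import Path


def downsample_max(values, factor):
    """Keep the maximum of each block of `factor` samples.

    Max-pooling is an assumption here: it preserves peaks, which is
    what one would want when revisiting a possible non-detection.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [max(values[i:i + factor]) for i in range(0, len(values), factor)]


def write_outputs(path, t0, sample_rate, values, factor=4):
    """Write a downsampled copy of the network output to a JSON file.

    `path`, `t0`, and the record layout are hypothetical placeholders.
    """
    record = {
        "t0": t0,
        "sample_rate": sample_rate / factor,  # effective rate after pooling
        "values": downsample_max(values, factor),
    }
    Path(path).write_text(json.dumps(record))
    return record
```

With a factor of 4, a stream sampled at 16 Hz is stored at an effective 4 Hz, so the on-disk footprint drops by roughly the same factor.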

wbenoit26 commented 2 months ago

@EthanMarx Just looking over stuff now - we should make it a priority to get this updated online deployment code going at the workshop.

EthanMarx commented 2 months ago

Yeah definitely, although I think I can do a lot of the testing on my own. Let me try this out today and we can iron out any details at the workshop

katyagovorkova commented 2 months ago

@EthanMarx @wbenoit26 I am currently using this PR for gwak to submit to Playground. I'll comment on the code where I get errors, let me know if I shouldn't do it.

EthanMarx commented 2 months ago

@katyagovorkova This PR is a refactor that needs some testing. I would start with #149 first!

katyagovorkova commented 2 months ago

> @katyagovorkova This PR is a refactor that needs some testing. I would start with #149 first!

Got it, will switch to the other one, thanks!

EthanMarx commented 1 month ago

@wbenoit26 Got this "running" with dummy models on the aframe account on CIT. I think next steps before merging are:

Train new O4 model in new infra over all of O4a
Train amplfi model
Test this deployment and compare directly with existing version

I also added an online subcommand to aframe-init to initialize a run directory.
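For readers unfamiliar with the pattern, a subcommand that initializes a run directory can be sketched roughly as follows. This is not the aframe-init implementation: it uses the standard-library argparse rather than jsonargparse, and the program name, the `--directory` flag, and the `config.yaml` placeholder file are all hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    # "init-sketch" is a placeholder name, not the real aframe-init entry point
    parser = argparse.ArgumentParser(prog="init-sketch")
    subcommands = parser.add_subparsers(dest="command", required=True)
    online = subcommands.add_parser(
        "online", help="initialize an online run directory"
    )
    # hypothetical flag: where the run directory should be created
    online.add_argument("--directory", type=Path, required=True)
    return parser


def init_online(directory):
    """Create the run directory and drop in a placeholder config file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    config = directory / "config.yaml"  # hypothetical file name
    if not config.exists():
        config.write_text("# placeholder online deployment config\n")
    return config


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "online":
        return init_online(args.directory)


if __name__ == "__main__":
    main()
```

Running the sketch with `online --directory ./run` creates `./run` if needed and leaves any existing config untouched, so re-initializing a directory does not overwrite local edits.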