ML4GW / aframe

Detecting binary black hole mergers in LIGO with neural networks

Comparison point for our offline throughput #442

Closed · alecgunny closed this 4 months ago

alecgunny commented 10 months ago

Eliu's paper from 2017 reports that they "process the entire month of August 2017 with our deep learning ensemble in just 7 min" using 64 V100s. We should figure out exactly how much active data that corresponds to and use it to estimate their throughput per GPU (in units of seconds of data per second of wall time), then compare this to our own throughput.

We can start by analyzing the server stats from our runs on 1-year and 2-month datasets (which have different client:GPU ratios) and see which one produces better throughput.
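As a quick sanity check, here's a back-of-envelope upper bound that treats all 31 days of August 2017 as active data (a sketch only; the real figure needs the actual science segments):

month = 31 * 24 * 3600            # seconds in August 2017, ~2.68e6 s
per_gpu = month / (7 * 60) / 64   # s of data per s of wall time, per GPU
print(f"~{per_gpu:.0f} s/s")      # ~100 s/s upper bound; duty cycle lowers this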

wbenoit26 commented 10 months ago

I get a throughput of about 56 s/s via the following:

from mldatafind.segments import query_segments

# H1/L1 open-data segments: GPS 1185580818 (Aug 1) to 1187733618 (Aug 25, 22:00 UTC)
segments = query_segments(["H1_DATA", "L1_DATA"], 1185580818, 1187733618)
# total live seconds / 7 minutes of runtime / 64 GPUs -> s of data per s, per GPU
throughput = sum(j - i for (i, j) in segments) / (7 * 60) / 64

Note that 1187733618 corresponds to 10 PM UTC on August 25th, because I think that's when O2 stopped. At least, it's when GWOSC stops having open data.
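For what it's worth, that GPS-to-UTC conversion is easy to double-check with gwpy (assuming it's available in the environment):

from gwpy.time import tconvert
print(tconvert(1187733618))  # 2017-08-25 22:00:00 UTC, i.e. the end of O2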

EthanMarx commented 10 months ago

Wow, orders of magnitude faster...

Also going to note here that it might be useful to report a total compute budget (and maybe a conversion to cost in $) for operating online, taking into account the amount of background we want to analyze on the fly, etc.
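A minimal sketch of what that budget could look like; every number below is a placeholder assumption, not a measured value:

throughput = 56          # s of data per s of wall time, per GPU (from above)
background_rate = 1000   # ASSUMED: seconds of background to analyze per second, online
gpu_cost_per_hr = 2.50   # ASSUMED: $/hr for a V100-class cloud GPU

n_gpus = background_rate / throughput                  # GPUs needed to keep up
monthly_cost = n_gpus * gpu_cost_per_hr * 24 * 30      # ~$32,000/month at these rates
print(f"{n_gpus:.1f} GPUs, ~${monthly_cost:,.0f}/month")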

High-level overview: