dingo-gw / dingo

Dingo: Deep inference for gravitational-wave observations
MIT License
55 stars 19 forks source link

Domains #14

Closed stephengreen closed 2 years ago

stephengreen commented 2 years ago

Just a few suggestions for how we implement domains, the first one being the most significant:

In general the only domain we care about at this point is FrequencyDomain, and later we almost certainly also want NonuniformFrequencyDomain. It's okay to leave TimeDomain not fully implemented throughout for now.

max-dax commented 2 years ago

I Implemented most requirements for the UniformFrequencyDomain.

The other issues are mostly style related. I will have a look at them later, but for now I focus on what's required to get the prototype running.

max-dax commented 2 years ago

I also made a few changes such as moving build_domains to domains.py, and adding the domain_dict which allows for recovery of the domain via the build_domain function.

stephengreen commented 2 years ago

For the data truncation, the only use case I can think of is when we have to generate EOB waveforms starting from much lower frequency than is ultimately desired. Is there another use case you have in mind?

max-dax commented 2 years ago

We use it all the time. For IMRPhenom, we save the waveforms with frequencies in the range [0, 1024], but only use [20, 1024] for training. The truncation method takes care of that. In fact, we can even apply it to the compression Vh matrix, such that the decompressed output is already truncated. Generally we want to be able to change the frequency range in train_config.yaml without having to regenerate an expensive dataset.

stephengreen commented 2 years ago

Looking at the code, I believe that the domain truncation could be simplified if the frequencies between 0 and f_min are chopped off at the very end of the transform sequence, rather than the beginning. This is how it was done in the old code:

https://github.com/stephengreen/lfi-gw/blob/f4a8aceb80965eb2ad8bf59b1499b93a3c7b9194/lfigw/waveform_generator.py#L1598-L1603

The current approach requires significant duplication:

Moving truncation to the final preprocessing step will add about 2% to preprocessing costs (since we have to carry around frequencies below f_min), but it would mean the code is simpler and more maintainable (since we won't have to think about whether we are working with truncated or non-truncated data), and also it is consistent with standard assumptions in LIGO software (e.g., waveform generation routines).

The above discussion only applies to frequencies between 0 and f_min. We can still maintain the capability to reduce the frequency range from the dataset frequency range by specifying new f_min and/or f_max. This would be implemented via an argument to WaveformDataset.__init__ that modifies the dataset while loading it:

This way, though, we can always assume frequencies start at 0.

mpuerrer commented 2 years ago

This sounds like a good solution to me. It's important that the code is easy to understand and it seems that the current code is perhaps more complicated than it needs to be and I'd choose a 2% increase in computational cost for significantly easier code in a heartbeat. That's just my opinion and Max should comment as well.