AllenInstitute / datacube

Other
0 stars 1 forks source link

split up correlation filters into `filters` and `mask` #77

Open chrisbarber opened 6 years ago

chrisbarber commented 6 years ago

The fix implemented in #75 is unnatural and also incomplete in that there is still a possibility that the dim dimension is being filtered by a mask whose dims aren't (dim,)-- this is because masks are implicitly reduced via any on dimensions not shared by the field being masked. If those extra dims were over the correlation data field itself, and all evaluate to False, the user ought to expect an all-NaN result (which gets filtered to an empty dataset). On the other hand, if those extra dims are not on the correlation data field, the intention is probably that they shouldn't be used to mask the seed and are instead part of the filtering that is intended to narrow down the results along dim.

I think this dual-intention is best resolved by splitting up into two separate filter arguments. The filters argument will refer to the filter expression that gets applied only to the dim dimension, while the mask argument will refer to a filter expression used to mask the data operated on by the correlation calculation. The seed extraction and correlation computation steps can then cleanly apply their own logic to each set of masks.

Another benefit to this, which should also be implemented on this issue (or a followup issue should be opened), is that caching of the correlation computation can be keyed solely on the mask argument, and not the entire set of filters. This will mean more responsiveness in repeated queries that only modify the filters, which is a particularly easy pattern of requests to generate from the connectivity web app.

Note that this also drives changes in the connectivity web app and legacy_conn_routes. It may be possible to maintain backwards compatibility for the web app for a period of time, but probably not without keeping the fix implemented in #75 around for a while.