Avoid receiver-specific behavior

tchamberlin commented 10 months ago

Spinning this into a new issue

The if else clause in dysh.fits.GBTFITSLoad.subbeamnod is there because the Ka receiver (Rcvr26_40) has had its feeds mislabelled for some time now. So, if you want to calibrate the data properly for that particular receiver you have to swap the polarization and feed values. For the rest of the receivers the method should use the same polarization for the selected feeds. I hope that clears a bit the confusion.

Apologies for not explaining that earlier.

PS. There is an open item to fix the mislabelling of the Ka receiver feeds. That is also why there is a commented part of the else clause referring to the date of the observations. For observations taken after the Ka beam mislabelling issue is fixed the polarizations should not be swapped.

Originally posted by @astrofle in https://github.com/GreenBankObservatory/dysh/issues/153#issuecomment-1843835026

If I'm understanding correctly, Ka data prior to the mislabeling, and any taken after it is fixed, will be handled incorrectly in Dysh? Unless there is a way to detect the mislabeling based on FITS data, I don't think there's any way to handle this in code. We could for example provide a way of mapping labels to beams via a config file, leaving it up to the user to determine when the mappings should be inverted

In general we should avoid receiver-specific behavior (i.e. conditionals based on receiver name). There will perhaps be cases where it is unavoidable, but in general we can base our conditionals on attributes of a receiver other than its name

astrofle commented 10 months ago

As it is now, yes, the data would be handled incorrectly after the beam mislabelling is fixed. Once the mislabelling is fixed, my plan was to set a date constraint; e.g., if the receiver is Ka and after the fix, then proceed as usual, if Ka and before the fix, then invert the labels here. There's no way of telling that the beam labels are flipped from the SDFITS metadata.

I do not like leaving it up to the user to correct this kind of problem. It has not worked well in the past.

I've seen in other observatories that fixes to data issues are implemented as part of the data reduction package (e.g., DPPP for LOFAR will fix bad metadata if you are working with observations taken during specific time ranges).

For receiver specific issues (like the Ka beam mislabelling), What other attributes, instead of receiver name, could we use to address this kind of problem?

How much effort would be required to retroactively update the engineering fits files to label the beams correctly?

tchamberlin commented 10 months ago

I'm not saying that Dysh shouldn't handle this. This should be transparent to the user, but also easy for us to maintain as new quirks like this inevitably pop up. I'm also not saying this is a high priority at this stage of development -- this is a very minor thing. But one of the big issues that we have with things like configtool, SDFITS filler, the original pre-2018 version of OOF/poof is that there are lots of receiver-specific things being done all over the place, and trying to modify behavior becomes difficult. So, I'd like to make sure that we address this early

Specifically what I'm worried about here is two-fold:

Using a string literal to identify a receiver is problematic. It makes refactoring difficult, leaves us open to typos causing silent failures, prevents IDEs/autocompletion tools from helping us during development, etc.
Having the code here doesn't scale well

For example, if we continue adding cases like this, it can get out of hand fairly quickly

if rx == "Rcvr26_40" and (
    (datetime(2016, 1, 1) < date_obs < datetime(2016, 6, 1))
    or (datetime(2019, 1, 1) < date_obs < datetime(2022, 1, 1))
):
    # Switch the polarizations to match the beams.
    if fdnum == 0:
        plnum = 1
    elif fdnum == 1:
        plnum = 0
elif rx == "RcvrArray75_115" and (datetime(2023, 1, 1) < date_obs < datetime(2023, 6, 1)):
    ...
...

Then imagine we have more receivers here, and that this sort of branching was distributed across various methods/files. Every time we add a new receiver or modify an existing one, lots of places need to change -- but which places? How do we find them?

So I'd propose introducing some sort of receiver data structure, and either housing this logic there or in some sort of get_plnum() method that considers the receiver, dateobs, and any other relevant criteria, then spits back the correct plnum.

If we were just trying to solve the string literal issue, something like an enum is typically used for this. e.g.

from enum import Enum
class Receiver(Enum):
    RcvrPF_1: "RcvrPF_1"
    ...
    Rcvr40_52: "Rcvr40_52"
    Rcvr68_92: "Rcvr68_92"
    RcvrArray75_115: "RcvrArray75_115"

if rx == "Rcvr_26_40":  # silent failure due to typo 
    ...

if rx == Receiver.Rcvr_26_40:  # typo causes error
    ...

But realistically we'll probably end up with is something resembling a receiver database, because you at the very least will need to map between the M&C name and the common name. Rather than having a bunch of dictionaries, it's easier to have a tabular structure, either in JSON/CSV/etc or SQLite

For the OOF rewrite we made a CSV file that looked like this (I've snipped off a few columns):

Modern Python has a dataclass now, so regardless of how we represented this in the DB, we'd end up with a proper Receiver class that we could use for such comparisons:

from dataclasses import dataclass

@dataclass
class Receiver:
    mc_name: str
    abbrev: str
    num_beams: int
    num_pols: int

    def __str__(self):
        return f"{self.abbrev} ({self.mc_name})"

Rcvr26_40 = Receiver(mc_name="Rcvr26_40", abbrev="Ka", num_beams=2, num_pols=2)
# ^ These would be generated for each receiver from CSV/JSON/SQL

print(Rcvr26_40)  # Prints 'Ka (Rcvr26_40)'

Last thing I'll say is that doing this sort of swap silently is problematic. For example, imagine a case where a user has manually fixed their data files, because they are using them in some other pipeline. Dysh is unaware of the swap, so it swaps them back, and doesn't inform the user. How would they go about debugging this? How would they know there was anything to debug? We at the very least need to be logging that the swap has happened (I see that there's no logging in Dysh at all right now; this is something we can address separately)

And a larger question: is there any way to automatically detect swapped beams? i.e. can we check for errors here, or check when beams need to be swapped based on some heuristic rather than a date range/receiver name?

teuben commented 1 month ago

Reminder for later: There was something similar with ARGUS where two beams were mislabeled before a certain date

GreenBankObservatory / dysh

Avoid receiver-specific behavior #160