IAU-ADES / ADES-Master

ADES implementation based on a master XML file

How to treat col 14 in mpc80coltoxml.py #54

Closed stevechesley closed 4 months ago

stevechesley commented 4 months ago

The issue is that obs80 column 14 is overloaded: It can be either a note or a program code, and so it is a bit sticky to decide how to map column 14 into ADES.

Right now [A..Za..z] are always treated as notes, but this is clearly wrong since the MPC reports many alphabetical program codes.

The prior approach was to treat col 14 as prog if stn was in the list of stations with program codes, otherwise put it in notes. But it seems likely that some stations with program codes are also submitting astrometry with notes. @federicaspoto?

Options I see for figuring out whether col 14 is a note or program code:

federicaspoto commented 4 months ago

@stevechesley, a couple of comments:

I hope this helps! Federica

matthewjohnpayne commented 4 months ago

@federicaspoto, @stevechesley: sorry if I am misunderstanding, but is part of the problem that the conversion is context-dependent? I.e., approximately speaking:

  1. If the observations have not yet been submitted-to/processed-by the MPC, then the data is a note
  2. If the observations are published, then the data is a program-code if prog-codes exist for that site, else is a note.

But we don't know who is using this routine, or when they are using it. It could be run before or after submission, so we can't tell from col 14 alone whether it holds a prog-code or a note.

Hence one could either (a) Add an "input-boolean" requiring that the user provide the context (and thus removing any ambiguity) or (b) Guess the context by looking for a publication-reference (in posns 72-77) and if that is blank, then assume it is a note (i.e. if not published, assume the code is being used pre-submission).
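To make the two options concrete, here is a minimal sketch. It is not the code in mpc80coltoxml.py. The function names, the `published` argument, and the `prog_code_sites` set are hypothetical. It assumes col 14 is string index 13, the station code sits in columns 78-80, and the reference field occupies positions 72-77 as quoted above.

```python
def guess_published(line80):
    """Option (b): guess the context from the publication-reference field."""
    # Positions 72-77 (1-based), as quoted in the comment above -> slice [71:77].
    return line80[71:77].strip() != ""


def classify_col14(line80, prog_code_sites, published=None):
    """Return 'blank', 'note', or 'prog' for column 14 of one obs80 line.

    published=True/False is option (a): the caller states the context.
    published=None falls back to the option (b) guess.
    """
    line80 = line80.ljust(80)   # tolerate lines with trailing blanks stripped
    col14 = line80[13]          # column 14 (1-based)
    stn = line80[77:80]         # columns 78-80, assumed to hold the station code
    if col14 == " ":
        return "blank"
    if published is None:
        published = guess_published(line80)
    # Rule 2 above: published data from a program-code site carries a prog code.
    if published and stn in prog_code_sites:
        return "prog"
    # Rule 1 above, or a site without program codes: treat it as a note.
    return "note"
```

Passing `published` explicitly removes the ambiguity. Leaving it as `None` relies on the reference field being filled for all published data, which is only a heuristic.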

federicaspoto commented 4 months ago

@stevechesley, @matthewjohnpayne is absolutely right, I didn't think at all about the option of people using this code to convert unpublished observations.

I also spoke with Matt and it turned out that we cannot use option (b) because ITF observations might have program codes, but they don't have reference fields.

This leaves us with option (a): adding an input-boolean type of option that provides the context. In this way:

I believe that this should work.

stevechesley commented 4 months ago

So for MPC-published obs80 data, we just need to maintain a list of obscodes that have program codes, which is already implemented as programCodeSites in packUtil.py, but probably needs to be updated.

Yes, we could add an option --unpublished to mpc80coltoxml.py, which would force col 14 to be treated as a note. But only if it is in [A..Za..z]? Or just always put it in notes no matter what the (non-blank) value of col 14?
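As a rough illustration of the proposed flag, the sketch below wires `--unpublished` into an argparse parser. The positional arguments, the helper names, and the `prog_code_sites` set are assumptions made for the example, not the actual interface of mpc80coltoxml.py. The open question about which values to move is answered here in the simplest way, by sending any non-blank col 14 to notes when the flag is set; that is one possible choice, not a decision taken in this thread.

```python
import argparse


def build_parser():
    """Build a hypothetical command line for an obs80-to-XML converter."""
    parser = argparse.ArgumentParser(
        description="Sketch of an obs80-to-XML converter command line")
    parser.add_argument("infile", help="obs80 input file (hypothetical argument)")
    parser.add_argument("outfile", help="XML output file (hypothetical argument)")
    parser.add_argument(
        "--unpublished", action="store_true",
        help="treat non-blank col 14 as a note, never as a program code")
    return parser


def col14_target(col14, stn, prog_code_sites, unpublished):
    """Return the field that receives col 14: 'prog', 'notes', or None."""
    if col14 == " ":
        return None
    if unpublished:
        # Assumed behavior: every non-blank value goes to notes.
        return "notes"
    return "prog" if stn in prog_code_sites else "notes"
```

A call such as `build_parser().parse_args(["in.obs", "out.xml", "--unpublished"])` yields `args.unpublished == True`, which can then be passed straight to `col14_target`.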

And what about the situation mentioned over in PR #53: If the status of an obscode flipped from not having program codes to having program codes then before the switch col 14 is to be treated as a note and after it is to be treated as a program code. So we need a date for each obscode to know what to do...
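If a per-obscode date were tracked, the lookup might look like the sketch below. The table `PROG_CODE_START`, its single entry, and the function name are invented for illustration; no such table exists in packUtil.py as far as this thread shows, and the station code and date are placeholders.

```python
from datetime import date

# Hypothetical table: obscode -> first date on which col 14 is a program code.
# The entry below is a placeholder, not real MPC data.
PROG_CODE_START = {"ZZ9": date(2020, 1, 1)}


def col14_is_prog(stn, obs_date, start_dates=PROG_CODE_START):
    """True if col 14 should be read as a program code for this observation."""
    start = start_dates.get(stn)
    # Stations missing from the table never use program codes.
    return start is not None and obs_date >= start
```

Observations from the same station would then map differently on either side of the switch date, at the cost of maintaining one more table.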

I'm starting to doubt whether it is worthwhile to try so hard for a perfect mapping of col 14. How about we do the simple thing of putting it in prog if stn is in programCodeSites and put it in notes otherwise?

federicaspoto commented 4 months ago

> I'm starting to doubt whether it is worthwhile to try so hard for a perfect mapping of col 14. How about we do the simple thing of putting it in prog if stn is in programCodeSites and put it in notes otherwise?

It depends on how people would use this code. If they use it for submitting XML data to us, that might be a problem, because in that case they might want to submit things with the note, but the XML would only have the prog field.

stevechesley commented 4 months ago

This is (mostly) resolved with PR #53. The current approach for converting obs80 (non-blank) column 14 to XML is as follows:

This approach does have the limitations discussed above. It also means that programCodeSites and validNotes need to be occasionally synced to the MPC pages. An alternate approach could be to use the JSON form of this (new) table to better discern whether col14 is an allowed program code. But this would likely require much more frequent synchronization between ADES and the MPC.