STScI-Citizen-Science / MTPipeline

Pipeline to produce CR rejected, astrodrizzled, png's of HST WFPC2 solar system data.

TARGNAME parsing and database updating #156

Open ktfhale opened 9 years ago

ktfhale commented 9 years ago

One of the central difficulties facing the pipeline is the identification of what the targets are in each image. We need to use information from the header keyword TARGNAME to find out what object is in the image, after which we use JPL's HORIZONS ephemeris to figure out if anything else is in the image.

But parsing the TARGNAME keyword is difficult. TARGNAME could be a HORIZONS-recognized object. But it could contain a HORIZONS object only as a substring. Or it could contain multiple HORIZONS objects. Or it could contain abbreviations of multiple HORIZONS targets.

In the update-master-finders branch, parsing the TARGNAME keyword is done in the get_header_info() function of build_master_finders_table.py. It solves the issues in the previous paragraph by hardcoding a dictionary which specifies known tricky TARGNAME values, and maps them to HORIZONS objects. For instance, jupitertest2 maps to jupiter, sat to saturn, gany to ganymede, gan to ganymede, etc.
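
For illustration, the hardcoded approach amounts to an exact-match lookup. This is only a minimal sketch, not the code in build_master_finders_table.py: the names KNOWN_TARGNAMES and lookup_targname are hypothetical, and only the four mappings mentioned above are included.

```python
# Hypothetical sketch of the hardcoded-dictionary approach.
# Only the mappings quoted in this issue are listed; the real dictionary is larger.
KNOWN_TARGNAMES = {
    "jupitertest2": "jupiter",
    "sat": "saturn",
    "gany": "ganymede",
    "gan": "ganymede",
}


def lookup_targname(targname):
    """Return the HORIZONS name for a known tricky TARGNAME, or None."""
    return KNOWN_TARGNAMES.get(targname.strip().lower())
```

A value that is not listed, such as a hypothetical jupitertest3, returns None here, which is why every special case has to be added by hand under this method.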

This dictionary only covers WFPC2. If we want to keep this method, we'll need to expand this dictionary to cover all the problematic TARGNAME values in the WFC3 and ACS inputs.

But when we adopted the MAST naming conventions, we decided to have the target names in the filename as well. I added a function to imaging_pipeline.py, get_mtarg, that solves the problem in a more general way.

In my function, if a JPL HORIZONS object is a substring of TARGNAME, then that object is added to the target list, so you don't need to manually specify cases like jupitertest2. To deal with cases where TARGNAME contains an abbreviation of a HORIZONS object, like jup, I specify an abbreviations dictionary. To deal with cases where one HORIZONS object may be a substring of another HORIZONS object (pan, anthe, io, and titan are all substrings of other objects), I hardcode a dictionary of these problems, and the pipeline ignores the shorter string if the longer string is present.
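
Roughly, the logic looks like the sketch below. This is not the actual get_mtarg code: the function name parse_targname, the three container names, and the short object and abbreviation lists are all assumptions made for illustration.

```python
# Hypothetical sketch of substring-based TARGNAME parsing.
# HORIZONS_OBJECTS stands in for the full list of recognized names.
HORIZONS_OBJECTS = ["jupiter", "saturn", "io", "dione", "ganymede",
                    "titan", "titania", "pan", "pandora"]

# Abbreviation -> full HORIZONS name (assumed entries, for illustration).
ABBREVIATIONS = {"jup": "jupiter", "sat": "saturn",
                 "gany": "ganymede", "gan": "ganymede"}

# Short name -> longer object names that contain it as a substring.
SUBSTRING_CONFLICTS = {"io": ["dione"], "titan": ["titania"], "pan": ["pandora"]}


def parse_targname(targname):
    """Return a sorted list of HORIZONS objects found in a TARGNAME value."""
    name = targname.lower()
    targets = set()

    # Any HORIZONS object appearing as a substring counts as a target.
    for obj in HORIZONS_OBJECTS:
        if obj in name:
            targets.add(obj)

    # Abbreviations are expanded to the full object name.
    for abbrev, full in ABBREVIATIONS.items():
        if abbrev in name:
            targets.add(full)

    # Drop a short name when a longer name containing it was also found.
    for short, longer_names in SUBSTRING_CONFLICTS.items():
        if short in targets and any(n in targets for n in longer_names):
            targets.discard(short)

    return sorted(targets)
```

With these lists, parse_targname("JUPITERTEST2") gives ['jupiter'] and parse_targname("TITANIA") gives ['titania'] rather than both titan and titania. Plain substring matching can still produce false positives for short names and abbreviations, so the sketch is only as good as its conflict dictionary.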

I think this functionality should be used when building the ephemeris database as well. We will only need to expand the dictionary of abbreviations in the WFC3 and ACS inputs, whereas with the older method we would need to expand the dictionary for every special case.

Regardless of which method we choose, one thing that definitely needs to be fixed in get_header_info() is how we assert that TARGNAME does not match a HORIZONS target. At the moment, if 'asteroids' is in the filepath, then we don't bother checking. That won't do for the new inputs, many of whose targets aren't HORIZONS objects but whose filepaths don't contain 'asteroids'.

There are other problems in get_header_info(), like relying on the presence of /acs/ or /wfc3/ in the filepath, that need to be fixed. In fact, get_header_info() should probably get a complete refactoring, as I suspect it makes a lot of assumptions about the filenames that are no longer valid under the MAST conventions.

ktfhale commented 9 years ago

Here's a todo list of the changes we expect are necessary to get the database working for the new inputs:

I've got the alteration of get_mtarg(), its usage in database_interface.py, and the creation of planets_and_moons.yaml.

ktfhale commented 9 years ago

One challenge we have is that, in working with update-master-finders-table, we want features that are recent additions to master. I tried to rebase update-master-finders-table on top of the current version of master, and it wasn't pretty.

But merging update-master-finders-table into master can be done automatically. So I made a new branch by branching off of master, and another new branch by branching off of build-master-finders-table, and then I merged those into the branch update-database, which thus has the most recent features of both master and update-master-finders-table.

I think this is an okay, if somewhat weird, workflow. At the moment, update-master-finders-table is left being kind of useless, having been effectively superseded by update-database. But we definitely don't want to just leave update-master-finders-table dangling. In theory, I could have created only a copy of master and merged that with update-master-finders-table, but in practice I wasn't sure that what I was doing would work, and I didn't want to jeopardize update-master-finders-table (even though all of this is, theoretically, completely reversible, as the history is still perfectly intact).

Eventually, I think we'll just merge update-database and update-master-finders-table together, prior to merging them into master.

ktfhale commented 9 years ago

I've added a planets_and_moons.yaml file. The top-level keys are the names of the planets, and the entries for those keys are dictionaries. The keys of those sub-dictionaries are the moons of that planet, along with the planet itself, and the values are the corresponding ID numbers.

For example, for Neptune:

neptune:
    neptune:   899
    triton:    801
    nereid:    802
    naiad:     803
    thalassa:  804
    despina:   805
    galatea:   806
    larissa:   807
    proteus:   808
    halimede:  809
    psamathe:  810
    sao:       811
    laomedeia: 812
    neso:      813

ktfhale commented 9 years ago

This last one was a big commit. I've moved a lot of functions into a new module, mtpipeline/tools/, as those functions were being called by multiple parts of the pipeline and were creating circular import problems.

I've also almost finished refactoring database_interface.py, although I think we're using the wrong column names.

In the master images table, we want to store the name of every target, even if it's not a planet or moon. This means using the targname header keyword, if necessary. But in the master finders table, we want rows only for planets and moons from planets_and_moons.yaml. So I'm going to switch the object_name and planet_or_moon column names.

ktfhale commented 9 years ago

I've made two big changes in this commit:

I've removed get_header_info() entirely. Previously, it had the job of parsing targname to determine whether it represented objects we wanted ephemeris overlays from. But that task has been moved to make_all_moons_dict, and so the only thing left for get_header_info to do was to get three header keywords. That now happens in ephem_main().

I've refactored make_all_moons_dict(). The purpose of this function has always been to determine, given information from the targname, all the bodies we want ephemeris information for in this image. Simply stated, if the targname mentions only a single moon, we want ephemeris information for that moon's primary, as well as all of its fellow moons.

make_all_moons_dict() now uses get_mtargs() to parse the targname. It looks through planets_and_moons.yaml, instead of planets_and_moons.txt, to find out what objects we want ephemeris information for.
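
The lookup can be pictured roughly as follows. This is a sketch under stated assumptions, not the real make_all_moons_dict(): the function name bodies_for_targets is hypothetical, and PLANETS_AND_MOONS is a hand-written stand-in for the parsed YAML, truncated to three Neptune entries.

```python
# Hypothetical stand-in for the parsed contents of planets_and_moons.yaml,
# truncated to a few of the Neptune entries shown earlier in this thread.
PLANETS_AND_MOONS = {
    "neptune": {"neptune": 899, "triton": 801, "nereid": 802},
}


def bodies_for_targets(targets, systems=PLANETS_AND_MOONS):
    """Return {name: id} for every body that shares a system with a target."""
    wanted = {}
    for planet, members in systems.items():
        # If any parsed target belongs to this planet's system,
        # take the planet and all of its moons.
        if any(target in members for target in targets):
            wanted.update(members)
    return wanted
```

Here a targname that parses to just triton yields neptune, triton, and nereid, while a target that appears in no system yields an empty dictionary.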

Eventually, we should transition to using only planets_and_moons.yaml. At the moment, planets_and_moons.txt is still used at least by get_planets_and_moons_list(), which is used at least by get_mtargs(). This should go in a separate ticket.