STScI-Citizen-Science / MTPipeline

Pipeline to produce CR rejected, astrodrizzled, png's of HST WFPC2 solar system data.
6 stars 1 forks source link

Improve creation of rootfile_list #117

Closed ktfhale closed 10 years ago

ktfhale commented 10 years ago

Currently, rootfile_list will miss any filenames not exactly 18 charaters long, as I discovered while testing. See run_imaging_pipeline.py, line 124:

rootfile_list = [filename for filename
                 in rootfile_list
                 if len(filename.split('/')[-1]) == 18
                 and filename.split('/')[-1].split('_')[-1] == 'c0m.fits']

This was done to exclude all filenames with a _cr_ in them. We should maybe improve our method of excluding those files.

There's plenty of other hardcoded string slices that might not be happy with non-c0m.fits inputs, too. I'm not sure, I haven't checked too carefully yet.

acviana commented 10 years ago

I think this might work better right?

rootfile_list = [filename for filename
                 in rootfile_list
                 if '_cr_' not in filename]
ktfhale commented 10 years ago

I'm looking into replacing what we have with what Alex suggested above. Should work! But in the process I've discovered we have cr rejected images inside our wfpc2 data archive. Should they be there? There are 1,843 of them, all in these projects to do with jupiter:

['06141_jupiter', '05313_jupiter', '06145_jupiter', '11096_jupiter', '06743_jupiter', '10782_jupiter', '06029_jupiter', '11102_jupiter', '06009_jupiter', '07616_jupiter', '11310_jupiter', '05837_jupiter', '06315_jupiter', '07589_jupiter', '05828_jupiter', '07430_jupiter', '05217_jupiter', '06028_jupiter', '07308_jupiter', '06025_jupiter', '06452_jupiter', '08148_jupiter', '08871_jupiter', '05829_jupiter', '05167_jupiter', '06509_jupiter', '06662_jupiter', '08405_jupiter', '11498_jupiter']

EDIT: There are also_cr_c1m symlinked files, and png outputs in these directories. Alex said I should feel free to delete them. I should be able to delete every png/ directory and everything with a _cr_ in astro/mtpipeline/archive/wfpc2 and not do anything bad.

ktfhale commented 10 years ago

Also, a question. The original code also checks to make sure files end in c0m.fits. Is that behavior we want to keep? That is, do we want to make all of our ACS and WFC3 input files also end with c0m, or should the behavior be modified to also accept flt.fits?

ktfhale commented 10 years ago

I've deleted all _cr_ and png/ outputs from the WFPC2 archive. This is what I'm replacing the rootfile_list list comprehension with:

rootfile_list = [filename for filename
                 in filelist
                 if '_cr_' not in filename
                 and ('_flt.fits' in filename or '_c0m.fits' in filename)]
ktfhale commented 10 years ago

I've edited run_imaging_pipeline.py and run_cosmicx.py in the filename_handling branch. The pipeline should now recognize all _c0m.fits and _flt.fits files, and should still reject any files that end in those strings but contain _cr_.

I'm going to test this branch on the 90 image set I'm currently copying over to my laptop. If it seems to be working, I'll merge this branch with the other one I'm working on, add_metadata (which contains the code to grab cosmicx params out of the headers) and test the resulting branch on the same dataset.

acviana commented 10 years ago

Can we write a unit test for this behavior? It's pretty important to get it right.

ktfhale commented 10 years ago

The main change that's been made in the filename_handling branch is just to the list comprehension above, which is inside the run() function in run_imaging_pipeline.py. A unit test of that would basically be testing running the pipeline. I think it's a good idea, but, especially on Travis, it would need some work to make happen.

One possible unit test is one that tries to run the pipeline on six images, one from each instrument. We could have these images live somewhere in MTPipeline/tests. There could also be a test settings.yaml file, which the unit test would move into the correct spot (and back out) during the test. The pipeline outputs could also sent to the tests folder, and the test would clean them up when it finishes. It'd make the GitHub repo much larger, though. Maybe we could host the files elsewhere, and have the test download them?

In regards to a smaller unit test, I'm not sure how it would be structured. We definitely should add to test_imaging_pipeline, adding another entry to expected_output_list that corresponds to an _flt.fits input file.

EDIT: actually, Wally did that in Ticket #106!

ktfhale commented 10 years ago

The pipeline's humming along on the 90 test files I randomly selected, 15 from each detector. It appears to be working just fine. It's produced cr rejected outputs for all the _flt.fits files, and is successfully running Astrodrizzle and png production on the WFPC2 data. I think I'm going to merge filename_handling into add_metadata.

ktfhale commented 10 years ago

Per the merge in Ticket #126, this branch's changes are now in add_metadata, which I'll merge into the master in short order. This issue can be closed.