STScI-Citizen-Science / MTPipeline

Pipeline to produce CR-rejected, astrodrizzled PNGs of HST WFPC2 solar system data.

WFPC2 File Completeness Checking Script #80

Closed: acviana closed this issue 10 years ago

acviana commented 10 years ago

Create a script that checks the completeness of the WFPC2 data products.

Specs

I suggest you run git status between these steps to see what git is and isn't tracking. Be sure to git add all your changes.

Updating the file tree

# Database Settings
# db_connection: mysql+pymysql://root:user@localhost/mtpipeline
# db_echo: False

# Email Settings
# email_switch: True
# contact_email: user@institute.edu

# File System Settings
# acs_input_path: /absolute/path/to/files
# acs_output_path: /absolute/path/to/files
# wfc3_input_path: /absolute/path/to/files
# wfc3_output_path: /absolute/path/to/files
# wfpc2_input_path: /absolute/path/to/files
# wfpc2_output_path: /absolute/path/to/files
# logging_path: /absolute/path/to/files

acviana commented 10 years ago

From your email:

Do I need to implement the function that is going to create the c0m.fits files?

No, the files have already been created by imaging_pipeline.py. We're just checking that all the output files were actually created and nothing was skipped due to a crash or something.

Are these files supposed to be stored at /astro/3/mutchler/mt/drizzled/? If so, where is this directory located? Do I need to create it?

All the subdirectories of /astro/3/mutchler/mt/drizzled/ should contain the c0m.fits files you are looking for. They should all have the format <target_name>_<proposal_id>.
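A minimal sketch of how those files could be listed, assuming the `<target_name>_<proposal_id>` layout described above. The function name `find_c0m_files` is hypothetical and is not part of mtpipeline.

```python
import glob
import os


def find_c0m_files(output_root):
    """Return sorted paths of *_c0m.fits files one level below output_root.

    Assumes every subdirectory is named <target_name>_<proposal_id>, so the
    directory part of the pattern is '*_*'.
    """
    pattern = os.path.join(output_root, '*_*', '*_c0m.fits')
    return sorted(glob.glob(pattern))
```

Calling `find_c0m_files('/astro/3/mutchler/mt/drizzled/')` would return one path per c0m.fits file found; files in directories that do not contain an underscore are ignored by this pattern.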

In the settings.yaml file, where should I put '/astro/3/mutchler/mt/drizzled/'?

wfpc2_output_path

To import this directory from the settings.yaml file I will need to import yaml so that I can get the directory, right?

Yes, this function already exists. Add this to your script: from mtpipeline.get_settings import SETTINGS. SETTINGS is capitalized by convention because it is a global variable.
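As a rough illustration, assuming SETTINGS behaves like a dictionary keyed by the names in settings.yaml (the thread does not confirm this), reading the output path could look like the sketch below. The dictionary here is a stand-in so the example runs outside the repository, and `get_output_root` is a hypothetical helper.

```python
# Stand-in for `from mtpipeline.get_settings import SETTINGS`.
# Assumption: the real SETTINGS object supports dict-style lookup.
SETTINGS = {'wfpc2_output_path': '/astro/3/mutchler/mt/drizzled/'}


def get_output_root(settings, key='wfpc2_output_path'):
    """Return the configured output directory, with a clearer error if unset."""
    try:
        return settings[key]
    except KeyError:
        raise KeyError('{} is not set in settings.yaml'.format(key))


output_root = get_output_root(SETTINGS)
```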

Which files are going to be in this directory? Only the c0m.fits files, or all of them, including the corrected files, PNG files, and so on?

/astro/3/mutchler/mt/drizzled/*_*/ should contain all the FITS images. All the PNG images should be in a /png/ subdirectory of each folder. Looking at the code now, this /png/ subdirectory might not be accurately encoded in make_output_file_dict, but the files still end up in the correct directory. I'll have to take a look at why that is before we can change anything with that.
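One way to verify the PNG layout described above is to walk the tree and report any PNG that is not inside a directory named png. This is only a sketch under that assumption; `find_misplaced_pngs` is a hypothetical name.

```python
import os


def find_misplaced_pngs(output_root):
    """Return sorted paths of .png files whose parent directory is not 'png'."""
    misplaced = []
    for dirpath, _dirnames, filenames in os.walk(output_root):
        if os.path.basename(dirpath) == 'png':
            continue  # files here are where they are expected to be
        for filename in filenames:
            if filename.lower().endswith('.png'):
                misplaced.append(os.path.join(dirpath, filename))
    return sorted(misplaced)
```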

From the moment I know where the files are stored, I need to implement a function that is going to check if all the required files were created, is that correct?

Yes, but we should talk a little bit about how you plan to do that.

walyssonBarbosa commented 10 years ago

I tried to run python checks/check_filesystem_completeness.py just to see how it works and it says permission denied. Will I need access permission to this directory?

Traceback (most recent call last):
  File "checks/check_filesystem_completeness.py", line 73, in <module>
    setup_logging()
  File "checks/check_filesystem_completeness.py", line 20, in setup_logging
    level = logging.INFO)
  File "/Users/wbarbosa/Downloads/Ureka/python/lib/python2.7/logging/__init__.py", line 1529, in basicConfig
    hdlr = FileHandler(filename, mode)
  File "/Users/wbarbosa/Downloads/Ureka/python/lib/python2.7/logging/__init__.py", line 902, in __init__
    StreamHandler.__init__(self, self._open())
  File "/Users/wbarbosa/Downloads/Ureka/python/lib/python2.7/logging/__init__.py", line 925, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/astro/3/mutchler/mt/logs/check_file_completeness/check_file_completeness_2014-06-03-11-21.log'

About how I plan to do the script, here is what I am thinking (I don't know if it is clear):

acviana commented 10 years ago

So checks/check_filesystem_completeness.py is the script I was hoping to replace. I didn't like it because it reimplements the file name generation logic from make_output_dict. I also thought it would be a good first project for you because it would expose you to the repository and the output file system.
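To avoid duplicating the naming logic, the check can take the naming function as an argument instead of rebuilding file names itself. The sketch below assumes a callable that maps one input path to a list of expected output paths; `find_missing_outputs` and its parameter names are hypothetical, and the pipeline's real function may have a different signature.

```python
import os


def find_missing_outputs(input_files, expected_outputs):
    """Return the expected output paths that do not exist on disk.

    expected_outputs is a callable taking one input path and returning a
    list of output paths. Passing it in keeps the naming rules in one place.
    """
    missing = []
    for input_file in input_files:
        for output_path in expected_outputs(input_file):
            if not os.path.isfile(output_path):
                missing.append(output_path)
    return missing
```

An empty return value means every expected file was found; anything else is the list to log.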

You are getting an error because the script is trying to write to the log files area, which I hadn't given you permission to access yet. I just fixed that.
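The IOError in the traceback above is raised when logging.basicConfig opens the log file. A script could check the log directory first and fail with a clearer message. This is a sketch only; `log_dir_is_writable` is a hypothetical helper, not something in the repository.

```python
import os


def log_dir_is_writable(log_dir):
    """Return True if log_dir exists and the current user can create files in it."""
    # Creating a file needs both write and search (execute) permission
    # on the containing directory.
    return os.path.isdir(log_dir) and os.access(log_dir, os.W_OK | os.X_OK)
```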

Can you get the following to run?

> mkdir ~/notebooks
> cd ~/notebooks
> ipython notebook

An interactive IPython notebook window should appear.

walyssonBarbosa commented 10 years ago

Yes, I could see that it was creating the dict all over again.

Done!

walyssonBarbosa commented 10 years ago

Since we are going to replace checks/check_filesystem_completeness.py but will still need to log and check for missing files, can I reuse the setup_logging() and check_file() functions?

acviana commented 10 years ago

Yes, we are going to replace checks/check_filesystem_completeness.py. We're going to move setup_logging to its own module in mtpipeline so other scripts can use it. check_file is one option; I'm looking at others as well.
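A shared setup_logging might look roughly like the sketch below. The signature, the log format string, and the `log_root` parameter are assumptions; only the `<name>/<name>_<timestamp>.log` path pattern and the INFO level are taken from the traceback earlier in this thread.

```python
import datetime
import logging
import os


def setup_logging(script_name, log_root):
    """Configure file logging for a script and return the log file path."""
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M')
    log_dir = os.path.join(log_root, script_name)
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    log_file = os.path.join(
        log_dir, '{}_{}.log'.format(script_name, timestamp))
    # Note: basicConfig does nothing if the root logger already has handlers.
    logging.basicConfig(filename=log_file,
                        format='%(asctime)s %(levelname)s: %(message)s',
                        level=logging.INFO)
    return log_file
```

Each script would then call it once at startup, for example `setup_logging('check_file_completeness', SETTINGS['logging_path'])`, assuming SETTINGS supports that lookup.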

OK, I'm assuming that "done" means you got the IPython notebook to work. I like to use the notebook as an environment for rapid prototyping. You can do everything in the notebook that you can do in Python, including importing modules. Try some different ways of checking to see if the files are present.

walyssonBarbosa commented 10 years ago

I moved setup_logging to a module named setup_logging.py in mtpipeline.

The files you want me to check are the ones located in '/astro/3/mutchler/mt/drizzled/'? And the code for the checking should be written in the IPython notebook, right?

acviana commented 10 years ago

Yes, that's correct.

walyssonBarbosa commented 10 years ago

Before I added the PNG files the total number of files was 313257; now it's 541687.
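Totals like these can be broken down by file type with a single walk of the output tree. This is an illustrative sketch, not the code that produced the numbers above; `count_files_by_extension` is a hypothetical name.

```python
import collections
import os


def count_files_by_extension(root):
    """Return a Counter mapping lowercase file extension to file count."""
    counts = collections.Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            counts[extension] += 1
    return counts
```

Summing the counter's values gives the overall total, and the `.fits` and `.png` entries show how much each file type contributes.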

I changed this line:

c0m_file_list = [filename for filename in all_fits_file_list if filename.split('/')[-1].split('_')[-1] == 'c0m.fits']

I removed len and added .split('_')[-1] because there was a file that ended with drz.fits. So I wanted to make sure the files ended with c0m.fits.
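The same test can be written with os.path.basename, which avoids splitting on '/' by hand. This sketch keeps the behavior of the line above (the last underscore-separated token of the file name must be c0m.fits); the function names are hypothetical.

```python
import os


def is_c0m_file(path):
    """True if the last '_'-separated token of the file name is 'c0m.fits'."""
    return os.path.basename(path).split('_')[-1] == 'c0m.fits'


def select_c0m_files(all_fits_file_list):
    """Keep only the paths whose file name ends in the c0m.fits token."""
    return [path for path in all_fits_file_list if is_c0m_file(path)]
```

Because only the file name is inspected, underscores in directory names such as `<target_name>_<proposal_id>` do not affect the result.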

walyssonBarbosa commented 10 years ago

How should the png section in make_output_file_dict be changed?