STScI-Citizen-Science / MTPipeline

Pipeline to produce CR rejected, astrodrizzled, png's of HST WFPC2 solar system data.
6 stars 1 forks source link

Create database_reset script #121

Closed acviana closed 10 years ago

acviana commented 10 years ago

I have a plan for how we are going to rebuild the ephemerides database. The first thing I am going to need is a script to reset the database. This script should drop all the tables and then issue all the CREATE TABLE statements needed to rebuild it.

Obviously this is both a useful script for testing in a development environment as well as a very dangerous table in production. For this reason I would like the script to prompt the user to confirm the operation by typing in the database name from the connection string (not the whole connection string).

This script should be created in the tools/dev/ folder (create it). Use the SQLAlchemy objects mtpipeline/database/database_interface.py for this. Specifically the Base class will allow you to issue CREATE and DROP statements while the engine object will allow you to access the database url.

walyssonBarbosa commented 10 years ago

Here's the script I created:

#! /usr/bin/env python

from mtpipeline.get_settings import SETTINGS
from mtpipeline.database.database_interface import loadConnection
from mtpipeline.database.database_interface import MasterImages
from mtpipeline.database.database_interface import MasterFinders
from mtpipeline.database.database_interface import Finders
from mtpipeline.database.database_interface import SubImages

db_name = SETTINGS['db_connection'].split('/')[-1]

def reset_db(conn_string):
    if conn_string == db_name:
        session, Base, engine = loadConnection(SETTINGS['db_connection'], SETTINGS['db_echo'])
        meta = Base.metadata
        meta.reflect()
        meta.drop_all()
        meta.create_all()
    else:
        print "Database not found"

if __name__ == '__main__':
    reset_db(raw_input("Name of the database to be reset: "))

I tested it and it seems to work.

acviana commented 10 years ago

So instead of:

db_name = SETTINGS['db_connection'].split('/')[-1]

we could do instead do:

engine.url.split('/')[-1]

Which do you think is preferable and why?

A few other points:

acviana commented 10 years ago

Also, please add docstrings using the following conventions: https://github.com/STScI-Citizen-Science/MTPipeline/wiki/Docstring-Conventions

walyssonBarbosa commented 10 years ago

I used the reflect() without knowing what it does. But it seems that it takes the full set of data in the tables. We can reset the database without using this tool.

This is the script after the changes:

#! /usr/bin/env python

from mtpipeline.get_settings import SETTINGS
from mtpipeline.database.database_interface import Base, engine
from mtpipeline.database.database_interface import MasterImages
from mtpipeline.database.database_interface import MasterFinders
from mtpipeline.database.database_interface import Finders
from mtpipeline.database.database_interface import SubImages

def reset_db(conn_string):
    """
        Resets database.

        Parameters:
        conn_string: string
            Database name given by user when the script starts to run.

        Returns:
        nothing

        Output:
        nothing
    """
    if conn_string == str(engine.url).split('/')[-1]:
        meta = Base.metadata
        meta.drop_all()
        meta.create_all()
    else:
        print "Incorrect database name, not performing reset."

if __name__ == '__main__':
    reset_db(raw_input("You are about to reset a dababase. All existing data will be lost. Please enter the database name to confirm this action: "))
acviana commented 10 years ago

So you chose to use engine.url instead of SETTINGS['db_connection'] this time, why?

walyssonBarbosa commented 10 years ago

Because we don't need to create another variable to store this information and because database_interface.py has already called SETTINGS['db_connection'].

acviana commented 10 years ago

I would agree with that. I also worry that there could be some crazy way where the SETTINGS['db_connection'] value might not be the value of the engine in the instance you're looking at. This would be hard to do, maybe if you had passed the engine around to an IPython, reset the yaml file, and then forgotten to reload the engine.

It's a corner case (if it even exists) but I feel more comfortable pointing directly to the object linked to the database to identify the URL then the configuration file used to build that object.

Please add this script to a branch and submit a PR and cite the ticket in the commit message.