edgi-govdata-archiving / ECHO-COVID19

Jupyter Notebook for tracking the effects of EPA's non-enforcement policy
https://colab.research.google.com/github/edgi-govdata-archiving/ECHO-COVID19/blob/main/ECHO-COVID19.ipynb
GNU General Public License v3.0
1 stars 1 forks source link

Consolidate code further #26

Closed ericnost closed 4 years ago

ericnost commented 4 years ago

There is a lot of redundant code related to getting data from the ECHO_EXPORTER table. This could possibly be consolidated in a global function or class.

shansen5 commented 4 years ago

@ericnost Would you like me to work on this? I've been figuring this out for Cross Program and could apply some of that here.

ericnost commented 4 years ago

@shansen5 - Sure! That would be really helpful. As you will see, there are a lot of blocks of code that look something like this:

sql = "select * from `ICIS-AIR_STACK_TESTS`"
url='http://apps.tlt.stonybrook.edu/echoepa/?query='
# Get the URL for your search result
data_location=url+urllib.parse.quote(sql)

# Download the data from that URL
stack_data = pd.read_csv(data_location,encoding='iso-8859-1',header = 0)
stack_data.set_index("pgm_sys_id", inplace=True)

So, perhaps the solution is to use the ECHO_modules data classes, though we only need to substantiate them for a select number of tables.

There are also a lot of calls to ECHO_EXPORTER that could probably also be consolidated, e.g.:

# Pull out Ids to match ECHO_EXPORTER
ids = latest.index.unique()

if (len(latest.index)>0):
    # Get facility information from ECHO
    sql = "select FAC_NAME, AIR_IDS, FAC_LAT, FAC_LONG, FAC_PERCENT_MINORITY, CAA_QTRS_WITH_NC, DFR_URL" + \
        " from ECHO_EXPORTER where AIR_FLAG = 'Y' "
    url='http://apps.tlt.stonybrook.edu/echoepa/?query='
    data_location=url+urllib.parse.quote(sql)
    air_echo_data = pd.read_csv(data_location,encoding='iso-8859-1',header = 0)
    # Filter ECHO EXPORTER data to rows containing pgm_sys_ids from latest  
    idxs=list()
    for index,value in air_echo_data["AIR_IDS"].items():
        for i in value.split():
            if i in ids:
                idxs.append(index)
    air_echo_data = air_echo_data.iloc[idxs,:]
    air_echo_data.set_index( 'AIR_IDS', inplace=True ) 

    # Merge ECHO and CAA data
    latest = latest.join(air_echo_data)

else:
    print("Actually, there were no permit exceedences for %s" %(mnth_name))