**Closed** · ericnost closed this issue 4 years ago
Related to #48
Relatedly, the code in block 10 itself could be optimized:
```python
echo_row = pd.DataFrame(echo_data.loc[reg_id].copy()).T.reset_index()  # Find and filter to the corresponding row in ECHO_EXPORTER
echo_row = echo_row[['FAC_NAME', 'FAC_LAT', 'FAC_LONG']]  # Keep only the columns we need
program_row = pd.DataFrame([list(fac)[1:]], columns=program_data.columns.values)  # Turn the program_data tuple into a DataFrame
full_row = pd.concat([program_row, echo_row], axis=1)  # Join the EE row df and the program row df
frames = [my_prog_data, full_row]
my_prog_data = pd.concat(frames, ignore_index=False)
```
Though Steve said it was elegant :), this approach actually creates a lot of unnecessary DataFrames, and `pd.concat` inside a loop is slow.
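To make the cost concrete, here is a minimal, self-contained sketch contrasting the two patterns. The facility rows are made-up stand-ins, not real ECHO data; the point is only that concat-in-a-loop recopies the accumulated frame on every iteration, while a list of dicts is turned into a DataFrame once at the end.

```python
import pandas as pd

# Hypothetical stand-in rows; in the notebook these would come from
# ECHO_EXPORTER and the program-specific query.
rows = [
    {"FAC_NAME": f"Facility {i}", "FAC_LAT": 30.0 + i, "FAC_LONG": -90.0 - i}
    for i in range(500)
]

# Slow pattern: each pd.concat copies the whole accumulated frame,
# so total work grows quadratically with the number of rows.
slow = None
for row in rows:
    piece = pd.DataFrame([row])
    slow = piece if slow is None else pd.concat([slow, piece], ignore_index=True)

# Fast pattern: append plain dicts to a list, build the DataFrame once.
fast = pd.DataFrame(rows)

assert slow.equals(fast)  # same result, far less copying
```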
I moved the Sunrise notebook's version of this to a utilities file. Could be worth implementing that here.
For instance, it took me 10 minutes to pull Water Quality Violation data for LA-2 using the existing code.
It took 1.5 minutes to do so using this:
```python
my_prog_data = []
...
e = echo_data.loc[echo_data.index == reg_id].copy()[['FAC_NAME', 'FAC_LAT', 'FAC_LONG', 'DFR_URL']].to_dict('index')
e = e[reg_id]  # remove indexer
p = fac._asdict()
e.update(p)
my_prog_data.append(e)
...
pd.DataFrame(my_prog_data)
```
(Part of the difference may simply be the database's responsiveness at the moment.)
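For reference, here is a runnable version of that list-of-dicts pattern. The `echo_data`, `reg_id`, and `fac` objects below are toy stand-ins for the notebook's real ECHO_EXPORTER slice and `program_data` row tuples, and the single-label `.loc` lookup is a slight simplification of the boolean-index version in the snippet:

```python
import pandas as pd
from collections import namedtuple

# Toy stand-in for a slice of ECHO_EXPORTER indexed by registry ID.
echo_data = pd.DataFrame(
    {"FAC_NAME": ["Plant A", "Plant B"],
     "FAC_LAT": [29.9, 30.1],
     "FAC_LONG": [-90.1, -90.3],
     "DFR_URL": ["http://example/a", "http://example/b"]},
    index=["110000000001", "110000000002"],
)

# Toy stand-in for program_data rows (the real ones come from itertuples()).
Fac = namedtuple("Fac", ["NPDES_ID", "VIOLATION"])
program_rows = {"110000000001": Fac("LA0000001", "E90"),
                "110000000002": Fac("LA0000002", "D80")}

my_prog_data = []
for reg_id, fac in program_rows.items():
    # One plain dict per facility: ECHO columns first, then program columns.
    e = echo_data.loc[reg_id, ["FAC_NAME", "FAC_LAT", "FAC_LONG", "DFR_URL"]].to_dict()
    e.update(fac._asdict())
    my_prog_data.append(e)

# Build the DataFrame once, after the loop.
result = pd.DataFrame(my_prog_data)
```

The key design choice is that no intermediate DataFrames are created inside the loop; everything stays as cheap Python dicts until the final constructor call.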
Definitely move this, probably into make_data_sets.py.
The code in block 10, for getting program-specific data, could be moved to a utilities file.