USEPA / standardizedinventories

Standardized Release and Waste Inventories
MIT License

First contribution of Gerardo J. Ruiz-Mercado and Jose D. Hernandez-B… #23

Closed jodhernandezbe closed 4 years ago

jodhernandezbe commented 5 years ago

Hi Wes,

I have added the folders needed for our first contribution.

Best wishes,

WesIngwersen commented 5 years ago

Jose's explanations, with my comments marked "WI:".

standardizedinventories-master folder

setup.py was modified to include BeautifulSoup, argparse, selenium, and regex in the installation requirements. WI: OK.

A new file, config.yaml, was added. It makes it easier to handle cases where a website is relocated or its source code is modified. At the moment it only contains information about RCRA, TRI, FRS, and SRS. WI: I agree such a config file could be useful.

A new file, common.py, is used to handle config.yaml. WI: This is OK, but somewhat redundant with what is now called globals.py within the different folders. Perhaps your approach is more generalized and preferable, but please explain why.

stewi folder

TRI.py was modified. The following packages were included:

argparse: to pass the inputs directly from the Windows CMD. WI: OK, pending discussion about redesign.
beautifulsoup: to retrieve information directly from non-dynamic HTML. WI: Nice.
regex: to handle regular expressions. WI: We are already using the ‘re’ package for handling regular expressions in NEI.py. Can this be used instead?
io: to deal with various types of input/output. WI: Fine.

The files TRI_File_1_columns.txt and TRI_File_3a_columns.txt were added in the stewi/data folder. They list all the column names for TRI Data Plus File 1 and File 3a. For data from 2018 onward, a new column may need to be added (recycling, a new condition of use included since 2018). WI: OK, we can add this at the time or create another file.

I used TRI File 3a because it has information about off-site transfers. WI: OK.

I noticed that StEWI did not include information about the basis of estimate for off-site land treatment and other off-site land disposals. WI: Since these were not included before in stewi, basis of estimate was not needed for the reliability score calculation.

TRI_required_fields.txt was modified to add new information. WI: OK.

TRI_keys.txt was added in the stewi/data folder to make it easier to include new TRI keys. WI: OK.

Files called TRI_chem_release_Year.csv were added to include a new validation source from https://iaspub.epa.gov/triexplorer/tri_release.chemical

To use TRI.py, navigate to standardizedinventories-master (or stewi) in the Windows CMD and write:

python stewi/TRI.py Option Year -F File1 File2 … FileN

Omit the stewi/ prefix when running from inside the stewi folder. Option is one of E, N, or O:

E extracts files from the TRI Data Plus website.
N organizes the TRI National Totals files from TRI_chem_release_Year.csv (this file is expected to be downloaded beforehand and organized as described in TRI.py).
O organizes TRI as required by StEWI.

E.g., you want to use File 1 and File 3a of TRI Data Plus (as in our case) and retrieve information for TRI 2018. You write in the Windows CMD:

python stewi/TRI.py E 2018 -F 1 3a

Next, if you want to create TRI_2018_NationalTotals.csv for validation:

python stewi/TRI.py N 2018

The flag -F and the files are not needed here, but you need to have TRI_chem_release_2018.csv in the data folder. Finally, you want to organize this for StEWI:

python stewi/TRI.py O 2018 -F 1 3a

Note: As you know, TRI may include new columns in the future. In that case, you only need to add them to TRI_File_1_columns.txt and TRI_File_3a_columns.txt (or to the column files for any other TRI files you want).

WI: Running these steps from the command line is a major redesign. Another issue is that we need to distinguish these wastes that are not ‘releases’ with different ‘Compartment’ identifiers. For RCRAInfo, we use just a ‘waste’ compartment. We could use that as the main compartment and add another ‘Subcompartment’ field, which specifies more information about location, or other fields, to the ‘flowbyfacility’ output. We need to decide on this quickly as a group.
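The command form described above (an option letter, a year, and an optional -F list of files) could be expressed with argparse roughly as follows. This is only a minimal sketch: the function name build_parser, the argument names, and the help strings are assumptions for illustration and are not taken from the actual TRI.py.

```python
import argparse


def build_parser():
    """Build a parser for a command shaped like `TRI.py Option Year -F File1 ... FileN`.

    Hypothetical sketch: names and help texts are illustrative, not from TRI.py.
    """
    parser = argparse.ArgumentParser(description="Sketch of a TRI.py-style command line")
    parser.add_argument("option", choices=["E", "N", "O"],
                        help="E: extract, N: national totals, O: organize for StEWI")
    parser.add_argument("year", type=int, help="TRI reporting year, e.g. 2018")
    # -F takes one or more file identifiers; it stays optional because the
    # national-totals step is described as not needing it
    parser.add_argument("-F", "--files", nargs="+", default=[],
                        help="TRI Data Plus file identifiers, e.g. 1 3a")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["E", "2018", "-F", "1", "3a"])
print(args.option, args.year, args.files)
```

Keeping the parser in its own function would also let the same steps be called from other Python code, which may be relevant to the redesign concern raised above.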

RCRAInfo.py was modified. The selenium package was used to handle dynamic HTML, because requests and urllib3 do not handle it. WI: See comment below about the chromedriver dependency.

To use RCRAInfo.py, navigate to standardizedinventories-master (or stewi) in the Windows CMD and write:

python stewi/RCRAInfo.py Option Year -T Table1 Table2 … TableN

Omit the stewi/ prefix when running from inside the stewi folder. Option is one of E, O, or C:

E extracts files from the RCRAInfo website.
O organizes the Biennial Report by year, because the current flat file mixes information from all years together.
C creates the files StEWI needs.

E.g., you want to retrieve the table BR_REPORTING. You write in the Windows CMD:

python stewi/RCRAInfo.py E -T BR_REPORTING

Next, to organize the table BR_REPORTING by year (this does not take data for existing RCRAInfo reports):

python stewi/RCRAInfo.py O -T BR_REPORTING

Finally, you want to organize the Biennial Report 2017 for StEWI:

python stewi/RCRAInfo.py C 2017

The flag -T and the files are not needed, but you need to have RCRAInfo_2017_NationalTotals.csv in the data folder. WI: Like above, running these steps from the command line is a major redesign.

The RCRAInfo National Totals were obtained from https://rcrapublic.epa.gov/rcrainfoweb/action/modules/br/trends/view

RCRA_FlatFile_LineComponents_2019.csv was added due to changes in the specification of the flat files: https://rcrainfo.epa.gov/rcrainfo-help/application/publicHelp/index.htm

The ValidationSets_Sources.csv file was modified to include the sources for the TRI National Totals and the RCRAInfo National Totals: TRI for the years 2001 to 2017, and RCRAInfo for 2001 to 2009 and 2017. WI: Great.
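The O step for RCRAInfo, which separates a flat file holding all years into per-year data, might look something like the sketch below. It is an illustration under stated assumptions, not the logic in RCRAInfo.py: the column names ("Handler ID", "Report Cycle", "Generation Tons") and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def split_rows_by_year(rows, year_field):
    """Group row dictionaries by the value of their year column."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_field]].append(row)
    return dict(by_year)


# Made-up sample standing in for a flat file with several years mixed together;
# the column names and values are hypothetical
sample = io.StringIO(
    "Handler ID,Report Cycle,Generation Tons\n"
    "PLACEHOLDER01,2015,10.5\n"
    "PLACEHOLDER02,2017,3.0\n"
    "PLACEHOLDER01,2017,7.25\n"
)

groups = split_rows_by_year(csv.DictReader(sample), "Report Cycle")
for year in sorted(groups):
    print(year, len(groups[year]))
```

Each group could then be written to its own per-year file before the C step builds the StEWI inputs.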

chromedriver.exe was added as the driver for selenium. WI: Are there any potential issues with having this ‘chromedriver.exe’ in the repository? It seems like it could be a problem for a couple of reasons, and licensing would be one.

chemicalmatcher folder

Some modifications were made in globals.py, programsynonymlookupbyCAS.py, and writeStEWIchemicalmatchesbyinventory.py. WI: These look fine.

facilitymatcher folder

Some modifications were made in globals.py, WriteFacilityMatchesforStEWI.py, and WriteFRSNAICSforStEWI.py. WI: These look fine.

Added: flows_missing_SRS_ID.csv WI: Good to add it, but we should put this one in the chemicalmatcher/output folder. Also, the version I generated had 49 records, while yours has more: 62.
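One way to find out which records account for the difference between the two versions of the file is to compare them on a key column. The sketch below is only an illustration: the column name "FlowName" and the sample rows are assumptions, not the actual contents of flows_missing_SRS_ID.csv.

```python
import csv
import io


def read_keys(handle, key_field):
    """Return the set of values found in key_field of an open CSV file."""
    return {row[key_field] for row in csv.DictReader(handle)}


def compare_keys(mine, theirs, key_field):
    """Return (keys only in theirs, keys only in mine), each sorted."""
    mine_keys = read_keys(mine, key_field)
    their_keys = read_keys(theirs, key_field)
    return sorted(their_keys - mine_keys), sorted(mine_keys - their_keys)


# Made-up placeholder rows; real use would open the two CSV files instead
mine = io.StringIO("FlowName\nflow_a\nflow_b\n")
theirs = io.StringIO("FlowName\nflow_a\nflow_b\nflow_c\n")

only_theirs, only_mine = compare_keys(mine, theirs, "FlowName")
print(only_theirs)  # ['flow_c']
print(only_mine)    # []
```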

Removed: all example files, license, readme, gitignore, etc. WI: This pull request can’t be accepted as is, or it would remove all those files.

WesIngwersen commented 4 years ago

@jodhernandezbe Update the columns in TRI_File_3a_columns.txt and TRI_File_1_columns.txt

WesIngwersen commented 4 years ago

Check that RCRAInfo commands work