HumanExposure / ChemicalExposure-SSC


FracFocus Database #992

Closed Sakshi-Handa closed 1 year ago

Sakshi-Handa commented 1 year ago

Investigate the potential of extracting the FracFocus database of chemical compositions of fracking wells.

We currently have a datasource for FracFocus, https://ccte-factotum.epa.gov/datasource/85/, with a data document listing general chemicals used in fracking, along with their CAS numbers and functions: https://ccte-factotum.epa.gov/datadocument/1512354/ This document appears to correspond to this updated page on their site: https://fracfocus.org/explore/chemical-names-and-cas-registry-numbers

Their database can also be downloaded as a SQL or CSV file here: https://fracfocus.org/data-download

There are also webpages/PDFs of disclosures for each well, which provide chemical names, CAS numbers, and composition information. Example: https://fracfocus.org/wells/16127033000000

To Discuss: What is the best way to extract this data: by scraping the site for each well, or from the CSV files of the database? In Factotum, we want to keep the webpage/PDF records for data provenance. The CSV contains additional (geographic) information, but that may not be suited for Factotum.

Would this data be best represented as a Composition or a Chemical Presence group? We can extract each well as a composition document, so we can get the percentages of each chemical. We could create a PUC for 'fracking fluids'. But I am wondering whether we would want to create 'products' for each document, since they aren't actually products with reported composition data. I'm also not sure whether this data is reported or measured for each well. We might also have to make a new 'document type' for this type of chemical disclosure.

Sakshi-Handa commented 1 year ago

Decided to scrape the web pages for each well, which have the composition data in a standard format. The data can be uploaded as a composition group. We can collect info on location in the 'description' field of the datadocument.
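
A minimal sketch of the record each scraped well page could produce, assuming the per-well URL follows the pattern of the example link above. The class names, field names, and the `build_well_document` helper are hypothetical and are not Factotum's actual schema; the location string and percentage in the usage note are placeholders, not real well data.

```python
from dataclasses import dataclass, field

# Assumption: well pages follow the pattern of the example URL in this issue.
WELL_URL_TEMPLATE = "https://fracfocus.org/wells/{api_number}"


@dataclass
class ChemicalRow:
    name: str       # chemical name as shown on the disclosure
    cas: str        # CAS number as shown on the disclosure
    percent: float  # reported composition percentage


@dataclass
class WellDocument:
    api_number: str
    url: str          # kept for data provenance
    description: str  # location info goes here, per the decision above
    chemicals: list = field(default_factory=list)


def build_well_document(api_number, location, rows):
    """Assemble one composition document from already-scraped values.

    `rows` is an iterable of (name, cas, percent) tuples; how they are
    pulled out of the page HTML is out of scope for this sketch.
    """
    doc = WellDocument(
        api_number=api_number,
        url=WELL_URL_TEMPLATE.format(api_number=api_number),
        description=f"Location: {location}",
    )
    for name, cas, percent in rows:
        doc.chemicals.append(ChemicalRow(name.strip(), cas.strip(), float(percent)))
    return doc
```

For example, `build_well_document("16127033000000", "placeholder county, placeholder state", [("Water", "7732-18-5", "88.5")])` would yield a document whose `url` is the example well page linked earlier in this issue.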

Sakshi-Handa commented 1 year ago

Added a new PUC for 'hydraulic fracturing fluids', to which these products will be assigned. The well number can be added to the product's 'product_id' field, and the associated metadata (location, etc.) can be extracted along with it.
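
As a rough illustration of that mapping, the sketch below builds one product record per well. Only the PUC name and the use of the well number as `product_id` come from this thread; the dictionary layout, the `title` and `puc` keys, and the `product_for_well` helper are assumptions, not Factotum's actual upload format.

```python
PUC_NAME = "hydraulic fracturing fluids"  # the new PUC described above


def product_for_well(well_number, title=None):
    """Return a hypothetical product record for one well disclosure."""
    return {
        "product_id": well_number,  # well number stored as product_id
        "title": title or f"FracFocus well {well_number}",  # assumed default title
        "puc": PUC_NAME,
    }
```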