HumanExposure / ChemicalExposure-SSC

Run Cleaning Script on Chemical Lists #991

Closed Sakshi-Handa closed 1 year ago

Sakshi-Handa commented 1 year ago

This ticket is to run the latest 'pre-curation' chemical cleaning script on Factotum records.

Run the chemical cleaning script on the following chemical lists (download from Factotum):

- Hewlett-Packard 1
- Walmart in CPDat
- RB SDS
- Tolerances and Exemptions: Food Pesticide Residue
- Tyco Fire Protection Products
- Big D Product Pages
- Athea Laboratories Ingredient Disclosure
- OEHHA Proposition 65 List (1/2023)
- Gelest Inc 2

Combine the remaining chemical records into a 'Combined Datagroups' file. I was hoping to set the limit to <900 chemicals this time. I want to send the bigger datagroups individually so that, if there are any curation issues, we can keep track of where the problems are occurring. And if there is an issue with one DG, we will still get the other lists back.

*I intentionally skipped the 3M Spatial Concentration Data. This isn't something we are releasing with ChemExpo, so it's not a priority for curation right now.

In the end, there should be 10 'cleaned' lists that we can send to ChemReg.
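
The split between individually submitted datagroups and the 'Combined Datagroups' file could be sketched roughly as below. This is only an illustration under assumptions: it reads the <900 limit as "datagroups with fewer than 900 chemical records go into the combined file", and the function name `split_datagroups` and the dict-per-record layout are hypothetical, not part of the actual cleaning script.

```python
from collections import Counter, defaultdict

# Assumption: datagroups with fewer records than this go into the combined file.
SIZE_LIMIT = 900


def split_datagroups(records, limit=SIZE_LIMIT):
    """Split chemical records into individually sent datagroups and a combined list.

    `records` is a list of dicts that each carry a 'datagroup_id' key
    (hypothetical layout). Returns (individual, combined), where `individual`
    maps datagroup_id -> records for groups at or above `limit`, and
    `combined` holds the records of all smaller groups.
    """
    counts = Counter(r["datagroup_id"] for r in records)
    individual = defaultdict(list)
    combined = []
    for r in records:
        dg = r["datagroup_id"]
        if counts[dg] >= limit:
            individual[dg].append(r)
        else:
            combined.append(r)
    return dict(individual), combined
```

Keeping the large groups in separate files means a problem in one submission does not hold up the others, which is the motivation given above.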

Sakshi-Handa commented 1 year ago

In a recent curation meeting, we decided to skip submitting the 'Walmart in CPDat' datagroup for chemical cleaning. We will wait until all of the documents have been manually re-extracted and QA is completed for the group.

Additionally, I requested a change to the formatting of the CSV file we send to ChemReg: moving the datagroup id to the first column, and moving the cleaned chem name and CAS ahead of the raw reported data columns. This will make it easier for the ChemReg curators to select the data columns to upload into ChemReg for auto-mapping. We are also requesting that they return the 3 'id' columns in the output file, so that we are able to track and locate any issues with the DTXSID mappings that are returned.

Order of Chemical Data Columns:

1. datagroup_id
2. document_id
3. raw_chem_id
4. chemical_name
5. casrn
6. raw_cas
7. raw_chem_name
8. casrn_comment
9. name_comment
10. cas_in_name
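
A minimal sketch of how a cleaned CSV could be rewritten into this column order, using only Python's standard `csv` module. The function name `reorder_columns` is hypothetical, and the behavior for columns outside the list (appended at the end) is an assumption rather than something decided in this ticket.

```python
import csv

# Column order requested for the file sent to ChemReg.
COLUMN_ORDER = [
    "datagroup_id", "document_id", "raw_chem_id",
    "chemical_name", "casrn",
    "raw_cas", "raw_chem_name",
    "casrn_comment", "name_comment", "cas_in_name",
]


def reorder_columns(src, dst, order=COLUMN_ORDER):
    """Copy CSV rows from file object `src` to `dst` with columns in `order`.

    Raises ValueError if a required column is missing. Any extra columns
    are kept and appended after the ordered ones (an assumption).
    """
    reader = csv.DictReader(src)
    present = reader.fieldnames or []
    missing = [c for c in order if c not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    extras = [c for c in present if c not in order]
    writer = csv.DictWriter(dst, fieldnames=list(order) + extras)
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

For files on disk, both handles would typically be opened with `newline=""`, as the `csv` module documentation recommends.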