HumanExposure / ChemicalExposure-SSC


Walmart data extraction - v3 #187

Open kdionisio opened 5 years ago

kdionisio commented 5 years ago

Extract data manually for Walmart/CPCPdb products. If you have questions about a PDF, you can always skip it and move on to the next item!

  1. Navigate to http://factotum.epa.gov/datagroup/81/. Pages 1-12 of the data document list are mostly complete, so start on page 13 or later.
  2. Click the 'title' (e.g., item_###) of an entry that has a green check for extracted text but does not yet have a product.
  3. Open the PDF in a separate window by clicking the small 'pdf' icon in the upper right of the page.
  4. Click the blue '+' button in the Extracted Text section of the page.
  5. Enter or correct the product name, the document date (in the same format as it appears on the document), and the revision number (if available).
  6. Click 'Save'. If you get an 'integrity error' at this step, skip the item and send me the link to the data document page where the error occurs!
  7. In the 'Composition Detail' section, the fields are now editable. Enter the relevant chemical name, CAS number, and composition information: a point value goes in the 'central comp' field, and a range goes in the min and max comp fields. The unit type is required even if no composition information is available; in that case, select 'unknown' as the unit type.
  8. Click 'Save composition edits'.
  9. If you need to enter more than one chemical record, click 'Save composition edits' after each one. This saves your edits and brings up a new blank chemical form for the next entry.
  10. Be sure to save your edits after entering the last chemical.
  11. Click the 'create new product' button and fill in every field you have information for:
    • Title (this should be the product name)
    • Manufacturer (OK to leave blank if not available)
    • Brand (OK to leave blank if not available)
    • UPC (if present on the PDF, enter it; otherwise leave the default 'stub'. Enter only ONE UPC here: if the PDF contains more than one UPC, each is entered as a separate product)
    • Size (typically present only if the PDF includes a UPC)
    • Color (typically not available)
    • Data document type (e.g., MSDS, SDS; usually stated at the top of the PDF)
  12. Click 'Save'.
  13. Open the link to the product page in a new window and assign the product to a PUC.
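Steps 7 and 11 contain two small decision rules that are easy to get wrong when working quickly: where a composition value goes (point value vs. range, with 'unknown' unit type when nothing is given) and how UPCs map to products (one product per UPC, 'stub' when there is none). A minimal Python sketch of those rules follows. It is only an illustration: the field names (`central_comp`, `min_comp`, `max_comp`, `unit_type`) and the helpers `composition_fields` and `product_records` are hypothetical, not part of Factotum, and the actual entry is still done by hand in the web form.

```python
import re
from typing import Optional

# Hypothetical field names modelled on the form labels in the steps above
# ('central comp', 'min'/'max comp', 'unit type'); not a real Factotum API.
NUMBER = r"(\d+(?:\.\d+)?)"
RANGE = re.compile(rf"^\s*{NUMBER}\s*-\s*{NUMBER}\s*$")
POINT = re.compile(rf"^\s*{NUMBER}\s*$")


def composition_fields(raw: Optional[str], unit_type: Optional[str] = None) -> dict:
    """Map a composition value read off the PDF onto the step 7 fields."""
    fields = {"central_comp": None, "min_comp": None, "max_comp": None,
              "unit_type": unit_type}
    if raw is None or not raw.strip():
        # No composition given: unit type is still required, so use 'unknown'.
        fields["unit_type"] = "unknown"
        return fields
    if unit_type is None:
        raise ValueError("unit type is required when a composition is given")
    range_match = RANGE.match(raw)
    if range_match:
        # A range goes into the min and max comp fields.
        fields["min_comp"] = float(range_match.group(1))
        fields["max_comp"] = float(range_match.group(2))
        return fields
    point_match = POINT.match(raw)
    if point_match:
        # A point value goes into the central comp field.
        fields["central_comp"] = float(point_match.group(1))
        return fields
    # Anything else (e.g. '<5'): when in doubt, skip the item.
    raise ValueError(f"unrecognized composition {raw!r}; skip this item")


def product_records(title: str, upcs: list, **details: str) -> list:
    """Step 11: one product record per UPC; 'stub' when the PDF has no UPC."""
    return [{"title": title, "upc": upc, **details} for upc in (upcs or ["stub"])]
```

For example, `composition_fields("10-30", "percent")` fills only the min and max fields, while `product_records("Glass Cleaner", [])` yields a single record with UPC 'stub' (the unit label and product name here are made-up illustrations). Raising an error on unrecognized formats mirrors the advice to skip any item you are unsure about rather than guess.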
RKalsch commented 5 years ago

Integrity Error

  1. http://factotum.epa.gov/extractedtext/edit/271962/ <- link to error, http://factotum.epa.gov/datadocument/271962/ <- link to original document

  2. http://factotum.epa.gov/extractedtext/edit/271940/ <- link to error, http://factotum.epa.gov/datadocument/271940/ <- link to original document

  3. http://factotum.epa.gov/extractedtext/edit/271923/ <- link to error, http://factotum.epa.gov/datadocument/271923/ <- link to original document

  4. http://factotum.epa.gov/extractedtext/edit/271947/ <- link to error, http://factotum.epa.gov/datadocument/271947/ <- link to original document

  5. http://factotum.epa.gov/extractedtext/edit/272000/ <- link to error, http://factotum.epa.gov/datadocument/272000/ <- link to original document

rcboykin commented 5 years ago

Integrity Error

  1. http://factotum.epa.gov/extractedtext/edit/281275/ <- link to error, http://factotum.epa.gov/datadocument/281275/ <- link to original document

  2. http://factotum.epa.gov/extractedtext/edit/281300/ <- link to error, http://factotum.epa.gov/datadocument/281300/ <- link to original document

  3. http://factotum.epa.gov/extractedtext/edit/281228/ <- link to error, http://factotum.epa.gov/datadocument/281228/ <- link to original document

l-koval commented 5 years ago

Integrity Error

  1. http://factotum.epa.gov/extractedtext/edit/281307/ <- link to error http://factotum.epa.gov/datadocument/281307/ <- link to original data document page

  2. http://factotum.epa.gov/extractedtext/edit/281297/ <- link to error http://factotum.epa.gov/datadocument/281297/ <- link to original data document page

  3. http://factotum.epa.gov/extractedtext/edit/281284/ <- link to error http://factotum.epa.gov/datadocument/281284/ <- link to original data document page

  4. http://factotum.epa.gov/extractedtext/edit/281311/ <- link to error http://factotum.epa.gov/datadocument/281311/ <- link to original data document page

  5. http://factotum.epa.gov/extractedtext/edit/281274/ <- link to error http://factotum.epa.gov/datadocument/281274/ <- link to original data document page