daniellecrobinson / Data-Rescue-PDX

Volunteer guide, and other materials for DATA RESCUE PDX

CDC1623F-7C1A-438B-80F3-F147659BB31C #60

Open anntenna opened 7 years ago

anntenna commented 7 years ago

{
    "title": "Aggregated Computational Toxicology Online Resource",
    "notes": "This resource is a link to the ACToR (Aggregated Computational Toxicology Online Resource) website by EPA, where data is aggregated from thousands of public sources on over 500,000 chemicals. The data is available for download via the Download link: https://actor.epa.gov/actor/download.xhtml. Details on the licensing information are available at https://edg.epa.gov/EPA_Data_License.html",
    "license_id": "public domain",
    "landingPage": "https://actor.epa.gov/actor/home.xhtml",
    "id": "CDC1623F-7C1A-438B-80F3-F147659BB31C",
    "isPartOf": "Environmental Protection Agency",
    "tags": "EPA",
    "organization": {
        "description": "EPA’s Aggregated Computational Toxicology Online Resource (ACToR) aggregates data from thousands of public sources on over 500,000 chemicals. It is searchable by chemical name and other identifiers. ACToR is also the data and web applications warehouse for EPA’s computational toxicology information which includes high-throughput screening, chemical exposure, sustainable chemistry (chemical structures and physicochemical properties) and virtual tissues data.",
        "title": "Environmental Protection Agency",
        "name": "EPA",
        "is_organization": true,
        "image_url": "",
        "type": "organization",
        "id": "epa-gov"
    }
}

jimtyhurst commented 7 years ago

I followed the link https://actor.epa.gov/actor/download.xhtml, which leads to a page that links to the actual data described on the landing page:

https://actor.epa.gov/actor/archive/v8/actor_2015q3.sql.gz

So I would add a "resources" key to the end of the JSON:

"resources": [{"url": "https://actor.epa.gov/actor/archive/v8/actor_2015q3.sql.gz"}]

There is no need to scrape for the data, because there is just that one data file.
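
For anyone who wants to script that change rather than edit the JSON by hand, here is a minimal sketch in Python. It assumes the metadata record has already been loaded into a dict. The helper name `add_resource` is hypothetical and not part of any project tooling, and the record below is abbreviated to three of the fields from the JSON above.

```python
import json


def add_resource(record: dict, url: str) -> dict:
    """Return a copy of `record` with `url` listed under its "resources" key.

    Hypothetical helper for illustration only. The original record is
    left unmodified, and a URL that is already listed is not added twice.
    """
    updated = dict(record)
    resources = list(updated.get("resources", []))
    if not any(entry.get("url") == url for entry in resources):
        resources.append({"url": url})
    updated["resources"] = resources
    return updated


# Abbreviated copy of the record from this issue (only three fields shown).
record = {
    "title": "Aggregated Computational Toxicology Online Resource",
    "landingPage": "https://actor.epa.gov/actor/home.xhtml",
    "id": "CDC1623F-7C1A-438B-80F3-F147659BB31C",
}

DATA_URL = "https://actor.epa.gov/actor/archive/v8/actor_2015q3.sql.gz"

updated = add_resource(record, DATA_URL)
print(json.dumps(updated, indent=4))
```

Because "resources" did not exist in the record, it ends up as the last key, which matches adding it to the end of the JSON. Running the helper a second time with the same URL leaves the list unchanged.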