getwilds / cancerprof

API Client for State Cancer Profiles
http://getwilds.org/cancerprof/

Metadata Parsing proposal #100

Open realbp opened 3 months ago

realbp commented 3 months ago

The metadata for each of the 4 categories in State Cancer Profiles is different, so I think each category will need its own way of parsing it. Returning the metadata as a named list of strings (key and value pairs) might be the most useful option, since it gives users control over which pieces they use.

demo_metadata_list <- list(
  input = c(
    "Demographic Data Report for Washington by County",
    "Education: Less than 9th grade",
    "Population Ages 25+",
    "All Races (includes Hispanic), Both Sexes",
    "2017-2021 American Community Survey 5-Year Data"
  ),
  sortedby = "Value",
  createdby = "statecancerprofiles.cancer.gov on 01/10/2024 6:16 pm.",
  data_sources = "Demographic data provided by the Census Bureau (http://www.census.gov/) & the American Community Survey (http://www.census.gov/acs/www/).",
  data_dictionary = "For more information about Education: Less than 9th grade, see the dictionary at http://statecancerprofiles.cancer.gov/dictionary.php#education.",
  data_limitations = "Data for United States does not include Puerto Rico."
)
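To show what that control could look like, here is a minimal sketch in base R. It uses a trimmed copy of the list above, so nothing here depends on cancerprof itself, and the access patterns are only one possible way a user might work with it.

```r
# Trimmed copy of the proposed structure (values taken from the example above).
demo_metadata_list <- list(
  input = c(
    "Demographic Data Report for Washington by County",
    "Education: Less than 9th grade"
  ),
  sortedby = "Value",
  data_limitations = "Data for United States does not include Puerto Rico."
)

# Look up a single field by its key.
demo_metadata_list$sortedby

# Fields with several lines stay as character vectors, so they can be indexed.
demo_metadata_list$input[1]

# List the keys that are available for this category.
names(demo_metadata_list)
```

Because every value is a plain character vector, users can filter, paste, or print the fields without any special accessor functions.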

Incidence and Mortality metadata is much more complex. For example, an incidence report:

incd_metadata_list <- list(
  input = c("Incidence Rate Report for Washington by County",
            "Bladder (All Stages^), 2016-2020",
            "All Races (includes Hispanic), Both Sexes, Ages <50"),
  sortedby = "Rate",
  createdby = "Created by statecancerprofiles.cancer.gov on 02/23/2024 5:25 pm.",
  trend = c(
    "Rising when 95% confidence interval of average annual percent change is above 0.",
    "Stable when 95% confidence interval of average annual percent change includes 0.",
    "Falling when 95% confidence interval of average annual percent change is below 0."
  ),
  rate_note = c(
    "Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population.",
    "Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified."
  ),
  trend_note = c(
    "Incidence data come from different sources.",
    "AAPCs are calculated by the Joinpoint Regression Program."
  ),
  data_sources = c(
    "National Program of Cancer Registries [https://www.cdc.gov/cancer/npcr/index.htm]",
    "Surveillance, Epidemiology, and End Results [http://seer.cancer.gov]",
    "SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.",
    "SEER November 2022 submission."
  ),
  data_dictionary = c(
    "Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population.",
    "AAPCs are calculated by the Joinpoint Regression Program and are based on APCs."
  ),
  data_limitations = c(
    "Data has been suppressed to ensure confidentiality and stability of rate estimates.",
    "Data for the United States does not include data from Nevada.",
    "Data for the United States does not include Puerto Rico."
  )
)
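One way the per-category handling could be sketched is a small pattern table per category plus a shared helper that sorts raw lines into keys. Everything below is hypothetical: `parse_metadata_lines()`, `parse_metadata()`, `category_patterns`, the regular expressions, and the raw "Sorted by Rate" line are assumptions for illustration, not part of cancerprof and not a confirmed description of the State Cancer Profiles file layout.

```r
# Hypothetical helper: collect raw metadata lines into a keyed list.
# `patterns` is a named character vector mapping each key to a regex.
parse_metadata_lines <- function(lines, patterns) {
  # For each key, keep the lines that match its pattern.
  out <- lapply(patterns, function(p) lines[grepl(p, lines)])
  # Lines matched by no pattern are kept under `input` rather than dropped.
  matched <- Reduce(`|`, lapply(patterns, grepl, x = lines))
  c(list(input = lines[!matched]), out)
}

# Assumed patterns; each category gets its own set of keys.
category_patterns <- list(
  demographics = c(
    sortedby = "^Sorted by",
    createdby = "^Created by",
    data_limitations = "does not include"
  ),
  incidence = c(
    sortedby = "^Sorted by",
    createdby = "^Created by",
    trend = "^(Rising|Stable|Falling) when",
    data_limitations = "suppressed|does not include"
  )
)

# Hypothetical entry point: pick the pattern set for a category.
parse_metadata <- function(lines, category) {
  patterns <- category_patterns[[category]]
  if (is.null(patterns)) stop("Unknown category: ", category)
  parse_metadata_lines(lines, patterns)
}

# Example input; the exact raw wording is assumed.
raw <- c(
  "Incidence Rate Report for Washington by County",
  "Sorted by Rate",
  "Created by statecancerprofiles.cancer.gov on 02/23/2024 5:25 pm.",
  "Rising when 95% confidence interval of average annual percent change is above 0.",
  "Data for the United States does not include Puerto Rico."
)
meta <- parse_metadata(raw, "incidence")
meta$trend
```

Keeping the patterns in a table means supporting another category would mostly be a matter of adding another entry, while the matching logic stays shared.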