CodeForPhilly / MATchMapper

7 stars 5 forks source link

Coordinating provider datasets from SAMHSA (x2+) to NPI/NPPES and PALS #19

Open jkdru opened 3 years ago

jkdru commented 3 years ago

Documenting Jan/Feb 2021 decisions and problems to solve for provider datasets.

For data recency & updates/archives: -- Include two DATE fields per table: date_firstfind and date_lastfind -- Our oldest date_firstfind is 2019-12-24, when @mariekers downloaded the 420 records initially provided by Code for Philly et al. in Feb/Mar 2020 data hackathon

For data privacy: -- DB will upload only rows where archival_only = FALSE

jkdru commented 3 years ago

Audit tables will collect provrecs_ (provider records) as is.

1) Download CSV from SAMHSA's Buprenorphine Practitioner Locator (BPLoc)

Note these changes relative to 2020Q1 - 2020Q3:

The dataset includes several new columns. In roughly descending order of helpfulness:

SAMSHA colnames | MATchMapper colnames

  • reachedPatientLimit | reached_patient_limit
  • certifiedFor100 | xwaiver_100
  • [x] Find out: Does TRUE mean "at least 100" (including 275) or "exactly 100" (excluding 275)?
  • latitude, longitude // Higher-res than same fields from SAMHSA's Find Treatment Locator (FTLoc) // To coordinate with lat, long for SITES (from FTLoc), round to nearest #.#### (e.g. 39.9482535 becomes 39.9483000; 39.9903107 becomes 39.9903000) -- see Issue #20 re: sites data
  • [ ] Check accuracy of geocodes for edge cases with CITY and/or ZIP errors (e.g. rows 527-529 "in Phila: ZIP error" cases)
  • addressId | bprec_id // Appears to function as PK for individual records rather than individual locations

We can no longer download just the filtered view. So for 2021 (reopening Issue #16) we need to:

2) SAMHSA: xwaivers

3) NPI

4) PALS ... (notes forthcoming)

jkdru commented 3 years ago

Sources for PROVIDER SPECIALTY filter:

jkdru commented 3 years ago

In SAMHSA providers col certifiedfor100, TRUE does indeed mean "at least 100" (including 275). So rename our xwaiver_100 to xwaiver_100or275. Useful for stakeholders doing providers outreach: all FALSE values = xwaiver_30

In SAMHSA providers col reachedPatientLimit, only FALSE is helpful (7 of >500 names, 3 updated in 2020q4 or 2021q1); TRUE is inconclusive without further context/research