Closed — simonw closed this issue 3 years ago
We need a DB query that finds every concordance attached to a source location that is missing from that source location's matched location.
```sql
with from_source_locations as (
  select
    concordanceidentifier_id,
    source_location.matched_location_id as location_id
  from
    concordance_source_location
    join source_location on source_location.id = concordance_source_location.sourcelocation_id
  where
    matched_location_id is not null
),
on_locations as (
  select
    concordanceidentifier_id,
    location_id
  from
    concordance_location
),
missing as (
  select * from from_source_locations
  except
  select * from on_locations
)
select
  count(*)
from
  missing
```
Returns 148,002
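The `except` in that query is plain set difference on `(concordanceidentifier_id, location_id)` pairs. A minimal Python sketch of the same logic, using made-up IDs:

```python
# Set difference on (concordanceidentifier_id, location_id) pairs,
# mirroring what the SQL EXCEPT clause computes. IDs are made up.
from_source_locations = {
    (1, 100),  # concordance 1 sits on a source location matched to location 100
    (2, 100),
    (3, 200),
}
on_locations = {
    (1, 100),  # only concordance 1 has been copied onto location 100
}

# Pairs reachable via a matched source location but absent from the location itself
missing = from_source_locations - on_locations
print(sorted(missing))  # → [(2, 100), (3, 200)]
```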
But we should exclude some of the providers, in particular ct_gov, covidvaccinefinder_gov and ct_covidvaccinefinder_gov.
We have 144 authorities at the moment: https://vial.calltheshots.us/dashboard/?sql=select%20%22authority%22%2C%20count%28%2A%29%20as%20n%20from%20%28select%20id%2C%20authority%2C%20identifier%2C%20created_at%20from%20concordance_identifier%29%20as%20results%20group%20by%20%22authority%22%20order%20by%20n%20desc%3AydXh_yEhn8L7UhP0pZuK5zTvFVCUmDm6ilDzvbORH7I
I'm using this query to figure out what to backfill:
```sql
with on_source_locations as (
  select
    authority,
    identifier,
    concordanceidentifier_id,
    location.public_id as location_public_id
  from
    concordance_source_location
    join source_location on source_location.id = concordance_source_location.sourcelocation_id
    join concordance_identifier on concordance_identifier.id = concordanceidentifier_id
    join location on source_location.matched_location_id = location.id
  where
    matched_location_id is not null
),
on_locations as (
  select
    authority,
    identifier,
    concordanceidentifier_id,
    location.public_id as location_public_id
  from
    concordance_location
    join concordance_identifier on concordance_identifier.id = concordanceidentifier_id
    join location on concordance_location.location_id = location.id
),
missing as (
  select * from on_source_locations
  except
  select * from on_locations
)
select * from missing
```
It takes about 10s to run, so I ran it against production PostgreSQL directly.
It produced 148,029 rows - here: missing-location-concordances.csv
Here are the number of concordances I will be adding by authority:
authority | n |
---|---|
placekey | 62015 |
us_carbon_health | 13510 |
us_giscorps_vaccine_providers | 10194 |
_tag_provider | 9455 |
vaccinespotter_org | 7865 |
cvs | 7014 |
il_sfsites | 6981 |
al_arcgis | 6791 |
vaccinefinder_org | 4875 |
tx_arcgis | 2235 |
getmyvax_org | 1701 |
kroger | 1327 |
md_arcgis | 1185 |
rite_aid | 1094 |
wa_state | 1022 |
in_arcgis | 756 |
fl_state | 673 |
vaxfinder_gov | 658 |
nc_myspot_gov | 607 |
pa_arcgis | 581 |
mo_arcgis | 566 |
nyc_arcgis | 503 |
ca_los_angeles_gov | 436 |
vaccinate_nj | 420 |
wi_arcgis_map | 409 |
az_arcgis | 409 |
la_tableau | 408 |
sc_arcgis | 382 |
heb | 269 |
mn_gov | 258 |
health_mart | 223 |
ky_govstatus | 217 |
ok_vaccinate_gov | 183 |
vtrcks | 172 |
ia_state | 155 |
ct_gov | 153 |
ct_covidvaccinefinder_gov | 153 |
walmart | 146 |
wa_prepmod | 137 |
il_juvare | 115 |
prepmod | 97 |
tn_vaccinate_gov | 92 |
costco | 92 |
walgreens | 87 |
smiths | 81 |
little_clinic | 78 |
fred_meyer | 74 |
kaiser_permanente | 71 |
albertson | 66 |
ak_arcgis | 64 |
ri_arcgis | 63 |
ga_dph | 63 |
ak_clinic_list | 60 |
safeway | 57 |
wyo_appt_portal | 49 |
shop_rite | 47 |
ma_immunizations | 44 |
sav_on | 40 |
publix | 38 |
co_colorado_gov | 35 |
king_soopers | 33 |
genoa_healthcare | 29 |
immunizenevada_org | 28 |
tom_thumb | 22 |
randalls | 22 |
al_jefferson | 22 |
az_pinal_ph_vaccinelocations_gov | 20 |
pick_n_save | 19 |
ca_metrolink | 19 |
ct_state | 18 |
vons | 14 |
qfc | 14 |
ny_northwell_health | 14 |
sf_gov | 13 |
jewel_osco | 13 |
weis | 12 |
thrifty | 12 |
kaiser_health_plan | 10 |
ilvaccine_org | 10 |
harris_teeter | 10 |
sams | 9 |
raleys | 9 |
pavilions | 8 |
price_chopper | 6 |
hannaford | 6 |
dc_district | 6 |
acme | 6 |
united_supermarket | 5 |
hyvee | 5 |
giant_food | 5 |
giant_eagle | 5 |
duane_reade | 5 |
winn_dixie | 4 |
market_street | 4 |
us_physicians_immediate | 3 |
thrifty_white | 3 |
shaws | 3 |
hart | 3 |
giant | 3 |
frys_food_and_drug | 3 |
albertsons_market | 3 |
pharmaca | 2 |
dillons | 2 |
az_ph_pinal_clinics_gov | 2 |
amigos | 2 |
wegmans | 1 |
tx_memorialhermann | 1 |
star_market | 1 |
ralphs | 1 |
pay_less | 1 |
market_32 | 1 |
marianos | 1 |
kta_super_stores | 1 |
hartig | 1 |
haggen | 1 |
fresco_y_mas | 1 |
carrs | 1 |
Here's a tool for spot-checking that the IDs I am going to add look good: https://concordance-identifier-to-add.vercel.app/concordance-identifier-to-add?sql=select%0D%0A++authority%2C%0D%0A++identifier%2C%0D%0A++%27https%3A%2F%2Fvial.calltheshots.us%2Flocation%2F%27+%7C%7C+location_public_id+as+url%0D%0Afrom%0D%0A++concordances%0D%0Aorder+by%0D%0A++location_public_id
Created a datasette-placekey plugin to make it easier to visualize these: https://concordance-identifier-to-add.vercel.app/concordance-identifier-to-add?sql=select%0D%0A++authority%2C%0D%0A++identifier%2C%0D%0A++%27https%3A%2F%2Fvial.calltheshots.us%2Flocation%2F%27+||+location_public_id+as+url%2C%0D%0A++placekey_to_geo_latitude(identifier)+as+latitude%2C%0D%0A++placekey_to_geo_longitude(identifier)+as+longitude%0D%0Afrom%0D%0A++concordances%0D%0Awhere+authority+%3D+%27placekey%27%0D%0Aorder+by%0D%0A++location_public_id%0D%0Alimit+100
They look good to me. I'm going to import all 148,029 using a Python script. If this turns out to be a mistake we can revert it - we have the above record of what we added, plus we have the API logs.
Here's the script I'm using:
```python
import httpx
from tqdm.notebook import tqdm

api_key = "25:..."


def fetch_all():
    # Follow Datasette's next_url pagination until it runs out
    next_url = "https://concordance-identifier-to-add.vercel.app/concordance-identifier-to-add/concordances.json?_size=max"
    while next_url:
        data = httpx.get(next_url).json()
        yield from (dict(zip(data["columns"], row)) for row in data["rows"])
        next_url = data.get("next_url")


def batch(items, n):
    for i in range(0, len(items), n):
        yield items[i : i + n]


rows = list(fetch_all())
batches = list(batch(rows, 200))


def update_batch(batch):
    return httpx.post(
        "https://vial.calltheshots.us/api/updateLocationConcordances",
        json={
            "update": {
                r["location_public_id"]: {
                    "add": [r["authority"] + ":" + r["identifier"]]
                }
                for r in batch
            }
        },
        headers={"Authorization": "Bearer {}".format(api_key)},
        timeout=20,
    )


for b in tqdm(batches[1:]):
    update_batch(b)
```
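For reference, this is the shape of the JSON body that `update_batch` posts, built here for a hypothetical two-row batch (the public IDs and identifiers are made up):

```python
# Build the same payload shape update_batch sends, for two made-up rows
batch = [
    {"location_public_id": "ldhfb", "authority": "placekey", "identifier": "227-222@5vg-whb-j9f"},
    {"location_public_id": "lcghq", "authority": "cvs", "identifier": "10394"},
]
payload = {
    "update": {
        r["location_public_id"]: {"add": [r["authority"] + ":" + r["identifier"]]}
        for r in batch
    }
}
print(payload["update"]["ldhfb"])  # → {'add': ['placekey:227-222@5vg-whb-j9f']}
```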
Running now. Should take about 40 minutes.
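That estimate is just request count times latency, assuming roughly 3 seconds per API call:

```python
# Back-of-envelope runtime estimate; 3 s/request is an assumed latency
rows = 148_029
batch_size = 200
requests = -(-rows // batch_size)  # ceiling division
seconds_per_request = 3
minutes = requests * seconds_per_request / 60
print(requests, round(minutes))  # → 741 37
```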
It's now 36% done, and a count at https://vial.calltheshots.us/dashboard/?sql=select+count(*)+from+concordance_location%3AfcdtXiCoqfYQ3DSXzbOg2PsnRwZBkK81GGSH12taqac shows 341,685 records in the concordance_location table.
https://vial.calltheshots.us/dashboard/?sql=select+%2A+from+api_log+where+api_key_id+%3D+25%3ABTpfamvr05SAenrmJjoab-XhrO0tNmCXssBmPiE6qqg shows the API calls made by the dedicated API key I set up for this job.
Backfill finished: 434,651 rows in concordance_location.
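Those counts are consistent with each other, assuming no other writes to the table during the run:

```python
# Sanity check: derive the pre-backfill row count and the expected
# count at the 36%-done mark, then compare with the observed 341,685.
final = 434_651   # rows in concordance_location after the backfill
added = 148_029   # rows the script imported
before = final - added
print(before)  # → 286622

predicted_at_36_percent = before + int(added * 0.36)
print(predicted_at_36_percent)  # → 339912, close to the observed 341,685
```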
This query shows concordance identifiers attached to locations (not source locations) counted by authority: https://vial.calltheshots.us/dashboard/?sql=select%20%22authority%22%2C%20count%28%2A%29%20as%20n%20from%20%28select%20concordance_identifier.%2A%20from%20concordance_identifier%20join%20concordance_location%20on%20concordance_location.concordanceidentifier_id%20%3D%20concordance_identifier.id%29%20as%20results%20group%20by%20%22authority%22%20order%20by%20n%20desc%3ArQvLweAl_9hf5Jok0xi1LtCktPq9uGrZX2Wh_qzitkE
authority | n |
---|---|
placekey | 88848 |
vaccinefinder_org | 55574 |
us_carbon_health | 46043 |
_tag_provider | 42468 |
vaccinespotter_org | 38908 |
getmyvax_org | 27731 |
us_giscorps_vaccine_providers | 22349 |
cvs | 17332 |
walgreens | 9061 |
google_places | 8990 |
il_sfsites | 7844 |
al_arcgis | 7121 |
walmart | 4774 |
health_mart | 4285 |
tx_arcgis | 4277 |
vtrcks | 3685 |
kroger | 3245 |
rite_aid | 3132 |
wa_state | 2488 |
fl_state | 2313 |
sc_arcgis | 2226 |
publix | 2183 |
nc_myspot_gov | 2025 |
md_arcgis | 1711 |
safeway | 1678 |
mo_arcgis | 1656 |
wi_arcgis_map | 1209 |
az_arcgis | 1161 |
arcgis | 1135 |
in_arcgis | 1110 |
vaccinate_nj | 1085 |
pa_arcgis | 926 |
mn_gov | 729 |
vaxfinder_gov | 707 |
nyc_arcgis | 702 |
la_tableau | 690 |
ca_los_angeles_gov | 626 |
costco | 603 |
sams | 582 |
ok_vaccinate_gov | 545 |
ky_govstatus | 541 |
ia_state | 493 |
hyvee | 474 |
ct_gov | 423 |
ct_covidvaccinefinder_gov | 409 |
tn_vaccinate_gov | 386 |
winn_dixie | 377 |
little_clinic | 367 |
sav_on | 333 |
heb | 316 |
osco | 288 |
vaccinefinder | 264 |
meijer | 256 |
stop_and_shop | 253 |
harris_teeter | 229 |
vons | 226 |
prepmod | 220 |
giant_eagle | 213 |
il_juvare | 186 |
weis | 185 |
smiths | 182 |
ri_arcgis | 163 |
immunizenevada_org | 162 |
hannaford | 160 |
giant_food | 154 |
ga_dph | 153 |
shop_rite | 150 |
king_soopers | 147 |
fred_meyer | 146 |
ak_clinic_list | 144 |
hart | 137 |
ct | 129 |
albertson | 126 |
wa_prepmod | 123 |
frys_food_and_drug | 123 |
tom_thumb | 118 |
giant | 118 |
sf_gov | 114 |
kaiser_permanente | 114 |
food_city | 108 |
ma_immunizations | 97 |
ak_arcgis | 91 |
thrifty | 89 |
ingles | 85 |
price_chopper | 84 |
wegmans | 84 |
co_colorado_gov | 81 |
cub_pharmacy | 80 |
ralphs | 72 |
acme | 68 |
ct_state | 62 |
covidvaccinefinder_gov | 60 |
pick_n_save | 59 |
brookshire | 58 |
dillons | 54 |
genoa_healthcare | 53 |
marianos | 41 |
raleys | 41 |
united_supermarket | 36 |
market_street | 35 |
qfc | 34 |
al_jefferson | 33 |
randalls | 32 |
food_lion | 32 |
pavilions | 31 |
ny_northwell_health | 30 |
duane_reade | 26 |
haggen | 26 |
harps | 24 |
ilvaccine_org | 24 |
az_pinal_ph_vaccinelocations_gov | 23 |
city_market | 22 |
us_physicians_immediate | 22 |
homeland | 21 |
pharmaca | 21 |
hartig | 21 |
ca_metrolink | 21 |
kaiser_health_plan | 18 |
jewel_osco | 18 |
harmons | 18 |
medicap | 18 |
thrifty_white | 17 |
harveys | 14 |
metro_market | 10 |
dc_district | 9 |
albertsons_market | 9 |
bakers | 9 |
wyo_appt_portal | 8 |
carrs | 8 |
pay_less | 7 |
big_y | 6 |
az_ph_pinal_clinics_gov | 6 |
gerbes | 5 |
shaws | 5 |
tx_memorialhermann | 4 |
fresco_y_mas | 4 |
star_market | 4 |
market_32 | 3 |
jayc | 3 |
amigos | 3 |
drugco | 2 |
kta_super_stores | 1 |
Following https://github.com/CAVaccineInventory/vial/issues/559#issuecomment-858190408
/api/updateSourceLocationMatch
previously didn't copy concordances across from the source location to the location. It does now, as of #559 - but we need to backfill all of the missing ones.
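The fix in #559 amounts to copying a source location's concordances onto the location at match time; a minimal sketch of that step (function and value names here are illustrative, not VIAL's actual code):

```python
# Illustrative sketch: when a source location is matched to a location,
# attach any concordance identifiers the location doesn't already carry.
def copy_concordances(source_location_concordances, location_concordances):
    """Return the updated set of concordances for the matched location."""
    return set(location_concordances) | set(source_location_concordances)

updated = copy_concordances({"cvs:10394", "placekey:abc"}, {"cvs:10394"})
print(sorted(updated))  # → ['cvs:10394', 'placekey:abc']
```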