Closed simonw closed 3 years ago
There are useful comments on the old issue: https://github.com/CAVaccineInventory/django.vaccinate/issues/9#issuecomment-785544632
Note to self: revisit this notebook http://localhost:8888/notebooks/Extract%20objects%20from%20a%20column.ipynb
After extensive conversation with Nicholas Schiefer I have a much better idea of how to approach this now.
Notes from talking through the model:
Report source: Caller app or Data corrections. We have this implicitly right now - the idea is to capture the possible ways in which data arrives in this table. Standard one is from help.vaccinateca -
There are sites that have no phone number! LA County operates super-sites (Disneyland, Dodger Stadium) - there’s no point in calling Disneyland to ask about vaccines. You interface with that system exclusively through their website. So the ground truth genuinely is the website for those cases.
So a Data Correction is when someone manually files a report by looking at their website.
It’s also important to figure out from the historical data which of these it is. Probably impossible prior to the move to the help.vaccinate app - it added a Reported By column which lets you know if it’s an embedded Airtable app - but for those old records you can’t tell.
The help. transition means that every report is written by the same Airtable user - the one used by the Netlify functions. And there are NEW columns that were added at that point to record the Auth0 user.
Not all the things in the Airtable Reports table are actually caller reports. For historical reasons this wasn’t just the table that logs reports - it was also what the API published. So there are reports in there (which are identifiable) that are from ETL jobs.
Because there's so much weird historical context embedded in the Airtable records (and the chances of me interpreting them correctly the first time are slim) I'm going to take a new approach: a JSON column which I will populate with the entire record from Airtable.
This will ensure we keep all historic data and will let us run smarter backfills as our understanding grows.
Once we switch away from Airtable the `airtable_json` column will stop being populated, which I think is fine.
My first opportunity to use the new `JSONField` from Django 3.1! https://docs.djangoproject.com/en/3.1/ref/models/fields/#jsonfield
I really like having the original Airtable JSON available to view in the admin - e.g. on https://vaccinateca-preview.herokuapp.com/admin/core/callreport/21192/change/
Also neat: I added the ability to filter for just the records where `airtable_json` is not empty, which makes it easy to see just the imported (or not-imported) data: https://vaccinateca-preview.herokuapp.com/admin/core/callreport/?airtable_json__isempty=0
Well this is absurdly useful:
SELECT jsonb_object_keys(airtable_json) AS key, count(*) FROM call_report GROUP BY key;
Run that against PostgreSQL and I got this:
JSON key | Times used |
---|---|
Affiliation (from Location) | 21589 |
airtable_createdTime | 21589 |
airtable_id | 21589 |
Appointments by phone? | 466 |
Appointment scheduling instructions | 2526 |
auth0_reporter_id | 1583 |
auth0_reporter_name | 1583 |
auth0_reporter_roles | 1583 |
Availability | 21589 |
County (from Location) | 21589 |
Do not call until | 3116 |
external_reports_base_external_report_id | 3139 |
Hour | 21589 |
ID | 21589 |
Internal Notes | 18258 |
is_latest_report_for_location | 21589 |
is_pending_review | 18 |
Location | 21589 |
location_id | 21589 |
location_latest_eva_report_time | 13443 |
location_latest_report_id | 21589 |
location_latest_report_time | 21354 |
Location Type (from Location) | 21588 |
Name (from Location) | 21589 |
Notes | 3359 |
Number of Reports (from Location) | 21589 |
parent_eva_report | 958 |
parent_external_report | 2410 |
Phone | 82 |
Reported by | 21589 |
report_id | 21589 |
Report Type | 21589 |
soft-dropped-column: Vaccines available? | 15948 |
time | 21589 |
tmp_eva_flips | 21589 |
Vaccine demand | 733 |
Vaccine demand notes | 4 |
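For anyone working from an Airtable export rather than the database, the same tally can be sketched in Python. This is only an illustration: `count_json_keys` is a hypothetical helper, and the two sample records are made up to show the shape of the data, not taken from the real table.

```python
from collections import Counter


def count_json_keys(records):
    """Count how many records contain each top-level JSON key.

    Mirrors the jsonb_object_keys(...) GROUP BY query, but over a list of
    already-parsed dicts (hypothetical helper, not part of the import script).
    """
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


# Made-up sample records, only to demonstrate the output shape.
sample_records = [
    {"airtable_id": "recA", "Availability": ["Yes"], "Notes": "call back"},
    {"airtable_id": "recB", "Availability": ["No"]},
]
key_counts = count_json_keys(sample_records)
for key, times_used in sorted(key_counts.items()):
    print(f"{key} | {times_used}")
```

Keys that appear in every record (like `airtable_id` here) are candidates for required fields; sparse keys are the ones that need interpretation.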
Next I'll fix the `reported_by` stuff. Here's a recent record:
"Reported by": {
"id": "usr3nFXwxJnpVjv4i",
"name": "Help.vaccinateCA RW Role account for API token",
"email": "jesse+role-rw-help-vaccinateca@vaccinateca.com"
},
...
"Internal Notes": "\n",
"Do not call until": "2021-02-26T00:54:24.995Z",
"auth0_reporter_id": "auth0|602ac647cea25d006a7cea17",
"auth0_reporter_name": "Dave Kasten",
If those `auth0_reporter_id` fields are available it means the report was made by a user of the new `help.vaccinate` app, and those should take precedence and the `Reported by` should be ignored (it's just the API token account).
If `auth0_reporter_id` is missing then the reporter must have been using the Airtable app and their Airtable credentials should be recorded.
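A minimal sketch of that rule, assuming the record has already been parsed into a dict. `resolve_reporter` and the shape of its return value are hypothetical; the JSON keys are the ones shown in the record above.

```python
def resolve_reporter(airtable_json):
    """Pick the reporter identity for a report (hypothetical helper).

    Auth0 fields win when present; otherwise fall back to the Airtable
    "Reported by" collaborator object.
    """
    auth0_id = airtable_json.get("auth0_reporter_id")
    if auth0_id:
        return {
            "source": "auth0",
            "id": auth0_id,
            "name": airtable_json.get("auth0_reporter_name"),
        }
    reported_by = airtable_json.get("Reported by") or {}
    return {
        "source": "airtable",
        "id": reported_by.get("id"),
        "name": reported_by.get("name"),
    }


# Trimmed copy of the record quoted above.
recent = {
    "Reported by": {
        "id": "usr3nFXwxJnpVjv4i",
        "name": "Help.vaccinateCA RW Role account for API token",
    },
    "auth0_reporter_id": "auth0|602ac647cea25d006a7cea17",
    "auth0_reporter_name": "Dave Kasten",
}
reporter = resolve_reporter(recent)
```

For the record above this ignores the API token account and attributes the report to the Auth0 user.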
I think the trickiest task left is to assign new "appointment tags" to the Django records: https://github.com/CAVaccineInventory/django.vaccinate/blob/ffd2b77afbd993c1a13d5d248f31151d8e87fca8/vaccinate/core/import_utils.py#L103-L104
A reminder: the appointment tags we have look like this:
Those booleans are for the `has_details` field on the model.
They make more sense if you consider the calling app UI:
We're trying to model those selections using a combination of these five appointment_tags and the `appointment_details` field on the model - which is typed into by the caller using the interface shown above:
The thing I haven't quite yet figured out is how to use the Airtable data to decide which tag was used.
@nschiefer @obra see above comment - I think you two may be best qualified to answer that?
I'm running the latest version of the reports import script now (after running the locations script first to reduce the chance of seeing location IDs in reports that haven't yet been created).
re https://github.com/CAVaccineInventory/django.vaccinate/issues/15#issuecomment-786454438
Yes, what is going on now is horrible and I'd love to make it better.
That's exactly what I needed to see!
const apptMethod = document.querySelector("[name=appointmentMethod]:checked")?.value;
switch (apptMethod) {
  case "phone":
    currentReport["Appointments by phone?"] = true;
    currentReport["Appointment scheduling instructions"] = document.querySelector("#appointmentPhone")?.value;
    break;
  case "county":
    currentReport["Appointment scheduling instructions"] = "Uses county scheduling system";
    break;
  case "myturn":
    currentReport["Appointment scheduling instructions"] = "https://myturn.ca.gov/";
    break;
  case "web":
    currentReport["Appointment scheduling instructions"] = document.querySelector("#appointmentWebsite")?.value;
    break;
  case "other":
    currentReport["Appointment scheduling instructions"] = document.querySelector(
      "#appointmentOtherInstructions"
    )?.value;
    break;
  default:
    break;
}
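Reading that switch statement backwards suggests a possible way to recover the tag from the stored Airtable fields. The sketch below is a guess at such a reverse mapping, not the import script's actual logic: `infer_appointment_tag` is hypothetical, the tag slugs simply reuse the `case` labels and may not match the real appointment tag names, and the "web" versus "other" split is a heuristic because both are free text.

```python
# Fixed strings written by the calling app for the county and myturn options.
COUNTY_TEXT = "Uses county scheduling system"
MYTURN_URL = "https://myturn.ca.gov/"


def infer_appointment_tag(airtable_json):
    """Return (tag, details) guessed from the Airtable fields.

    Tag slugs are hypothetical and reuse the JS case labels.
    """
    raw = airtable_json.get("Appointment scheduling instructions") or ""
    instructions = raw.strip()
    # The "phone" case is the only one that sets this checkbox.
    if airtable_json.get("Appointments by phone?"):
        return "phone", instructions or None
    if not instructions:
        return None, None
    if instructions == COUNTY_TEXT:
        return "county", None
    if instructions == MYTURN_URL:
        return "myturn", None
    # Heuristic: anything that looks like a URL was probably typed
    # into the website box; everything else is treated as "other".
    if instructions.startswith(("http://", "https://")):
        return "web", instructions
    return "other", instructions
```

Records created before the help.vaccinate app existed may not follow these conventions at all, so anything that lands in "other" probably deserves a manual look.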
The plan is to drop the columns that are the result of a join, right?
Actually no, I plan to leave the full Airtable JSON record in there forever more - stashed safely in the `airtable_json` column. No features will be allowed to be built against it, it will exist purely so we can occasionally marvel at it - and more importantly, so if in six months we say "Oh wow, we totally screwed up importing that data" we can run a SQL query to fix our records by re-interpreting the vestigial JSON in that column.
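To make the "fix it later from the stored JSON" idea concrete, here is a small sketch of such a backfill. It uses the standard library's `sqlite3` with the JSON stored as text purely so it runs anywhere; on PostgreSQL the same thing could likely be a single `UPDATE` using the jsonb operators. The `do_not_call_until` column, the sample row, and the `backfill_do_not_call_until` helper are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_report ("
    "id INTEGER PRIMARY KEY, do_not_call_until TEXT, airtable_json TEXT)"
)
# Pretend the first import forgot to populate do_not_call_until
# (hypothetical column and made-up record ID).
raw = {"airtable_id": "recEXAMPLE", "Do not call until": "2021-02-26T00:54:24.995Z"}
conn.execute(
    "INSERT INTO call_report (do_not_call_until, airtable_json) VALUES (NULL, ?)",
    (json.dumps(raw),),
)


def backfill_do_not_call_until(conn):
    """Re-derive a column from the preserved Airtable JSON; returns rows updated."""
    rows = conn.execute(
        "SELECT id, airtable_json FROM call_report WHERE airtable_json IS NOT NULL"
    ).fetchall()
    updated = 0
    for row_id, raw_json in rows:
        value = json.loads(raw_json).get("Do not call until")
        if value is not None:
            conn.execute(
                "UPDATE call_report SET do_not_call_until = ? WHERE id = ?",
                (value, row_id),
            )
            updated += 1
    return updated


updated = backfill_do_not_call_until(conn)
```

Because the raw record is never modified, a backfill like this can be re-run as often as the interpretation changes.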
Oh I see what you mean - you're talking about things like "Location Type (from Location)" which are derived columns even at the point they showed up in the Airtable export.
I'd rather leave those in, even though they're absolutely useless information. It's easier for me to think "the JSON in the `airtable_json` column is the EXACT data we got out of their API" than worry about writing code that trims that down.
On the most recent run of import locations I got these skipped records:
Running vaccinate/manage.py import_airtable_locations --github-token=xxx on ⬢ vaccinateca-preview... up, run.6593 (Hobby)
Skipping rec0xZ5EaKnnynfDa [name=CVS Pharmacy® & Drug Store at 25272 Marguerite Pkwy, Mission Viejo, CA 92692], reason=No latitude
Skipping rec4KFYlaa485WMIX [name=Vo Medical Center - Calexico], reason=No latitude
Skipping rec7nHXCuSYRR61V0 [name=None], reason=No name
Skipping rec8Xk6kn4SvAKeEm [name=None], reason=No name
Skipping recCYZZRJCRlXykun [name=None], reason=No name
Skipping recDzR99nhYhVbvLA [name=Dr. Vincent V. Soun MD MPH MAJOR SARIN & TAO FAMILY MEDICAL CLINIC, INC.], reason=No latitude
Skipping recJ4iabS45l0rWB8 [name=Vo Medical Center - El Centro], reason=No latitude
Skipping recJt0iQbqmglF0XL [name=Dignity Health (Woodland)], reason=No county
Skipping recM2vjVCasanHStO [name=None], reason=No name
Skipping recMFogi5BI5krqKK [name=Walgreens Pharmacy #4559], reason=No county
Skipping recMSTzVRFGEuxS6M [name=CVS Pharmacy® & Drug Store at 638 Camino De Los Mares, San Clemente, CA 92673], reason=No latitude
Skipping recR9Wm4t02bIgKLR [name=Calexico Health Center], reason=No latitude
Skipping recTzWO20IAjnrcdS [name=PAMF Mountain View Center (Palo Alto Medical Foundation - Sutter)], reason=No latitude
Skipping recWqQc3bG7PiKQJi [name=Sharp Coronado Hospital], reason=No latitude
Skipping reccaeiuCMw6a7Dvg [name=None], reason=No name
Skipping recchGT7ticPxfr42 [name=None], reason=No name
Skipping recfGfKaDQtqAiSIx [name=None], reason=No name
Skipping recgudgmJck0ucv3X [name=All-inclusive], reason=No county
Skipping recicvTFyjV52Ulc6 [name=Calexico Wellness Center], reason=No latitude
Skipping reck8nN8LE7qUu3WY [name=None], reason=No name
Skipping rectdqnbtnzr54use [name=Vo Medical Center - Brawley], reason=No latitude
Skipping recwHJmOChMc6TeA8 [name=Osage inglewood], reason=No county
Skipping recy40U4HXu6hSmQ3 [name=South Medical], reason=No county
Skipping reczTSHfKgH9MyS4m [name=5148 Market St], reason=No county
I fixed quite a few of the missing latitude ones just now via Airtable.
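A quick way to see which problems dominate a run like this is to tally the skip reasons from the log output. The sketch below assumes only the line format visible above (`Skipping <id> [name=...], reason=...`, with the `[name=...]` part optional); `tally_skip_reasons` is a hypothetical helper, not part of the import command.

```python
import re
from collections import Counter

# Matches "Skipping recXXX [name=...], reason=..." and also the
# shorter "Skipping recXXX, reason=..." form without a name.
SKIP_LINE = re.compile(
    r"^Skipping (?P<id>rec\w+)(?: \[name=(?P<name>.*)\])?, reason=(?P<reason>.+)$"
)


def tally_skip_reasons(lines):
    """Count skipped records per reason, ignoring lines that do not match."""
    reasons = Counter()
    for line in lines:
        match = SKIP_LINE.match(line.strip())
        if match:
            reasons[match.group("reason")] += 1
    return reasons


log_lines = [
    "Skipping rec4KFYlaa485WMIX [name=Vo Medical Center - Calexico], reason=No latitude",
    "Skipping rec7nHXCuSYRR61V0 [name=None], reason=No name",
    "Skipping rec8Xk6kn4SvAKeEm [name=None], reason=No name",
    "Skipping recDzR99nhYhVbvLA [name=Dr. Vincent V. Soun MD MPH MAJOR SARIN & TAO FAMILY MEDICAL CLINIC, INC.], reason=No latitude",
    "Skipping recJt0iQbqmglF0XL [name=Dignity Health (Woodland)], reason=No county",
]
reason_counts = tally_skip_reasons(log_lines)
```

Piping the full command output through something like this gives a per-reason count without reading every line.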
I don't mean in the json, but in the "regular" table
I am thrilled that we're fixing the data, but what's the rationale for refusing to allow records without a confirmed location?
One not-actually-contrived example: There are now we-come-to-you clinics in some places.
Aah I see. The regular table is currently a completely different schema - which is one of the reasons I want to keep the JSON around, just in case we missed something.
The new schema is best understood by looking at this model: https://github.com/CAVaccineInventory/django.vaccinate/blob/ffd2b77afbd993c1a13d5d248f31151d8e87fca8/vaccinate/core/models.py#L256-L317
Note that this new schema is not at all set-in-stone - I expect we'll make quite a few changes to it before we ship.
I am thrilled that we're fixing the data, but what's the rationale for refusing to allow records without a confirmed location?
Purely that we decided that "location" would be a not null foreign key! I'm happy to change that if we think it's a good idea.
Oh hang on... I think I misread the proposed schema - I thought county was meant to be not null but it looks like null is allowed: https://github.com/CAVaccineInventory/data-engineering/blob/7912b47cfd41c7e1e9703f075acb2dbc073f4969/schema/definition.sql#L94
I don't think the nullability of stuff in that schema proposal was truly set in stone.
I'd like us to be able to capture any data we get from the outside world, even if it's low quality data. I would not mind at all if bad data resulted in data corrections or got flagged in a report or something.
Yeah I'm good with that. Our data will be messy and have holes in it. Better messy and recorded than lost.
The one constraint that definitely makes sense to me is that a `CallRecord` MUST be associated with a `Location`. Are there any cases where even that doesn't have holes in it?
That said, we apparently have 59 records with "No Location key in JSON object at all"
The new version of the `import_airtable_records` script just finished, and it only skipped 122 rows (after importing 22822):
Skipping rec0hcqa5hEsLV2Nz, reason=No Location key in JSON object at all
Skipping rec0uR4BArGZPIqLF, reason=No Location key in JSON object at all
Skipping rec0yCiyqupiONBYZ, reason=Missing Availability
Skipping rec1RSy9RqILX1cly, reason=No Location key in JSON object at all
Skipping rec1r6ghPunvEZSs3, reason=Missing Availability
Skipping rec3CzozmDt7fUGLD, reason=No Location key in JSON object at all
Skipping rec3KTCAVjqTrEqBM, reason=No Location key in JSON object at all
Skipping rec3RYvQyGoGxQYn1, reason=Missing Availability
Skipping rec6R8xE6AYPNRzU6, reason=No Location key in JSON object at all
Skipping rec7rzF07F8qqmNJE, reason=Missing Availability
Skipping rec88mVQd515VCwtZ, reason=No Location key in JSON object at all
Skipping recAP5cqfUOLmh1te, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recAabdOFeVJHQx0p, reason=Missing Availability
Skipping recB7FDhCRVUqysD0, reason=No Location key in JSON object at all
Skipping recBE4x8UXT4MdMDm, reason=Missing Availability
Skipping recClKsfHrfOEkxmk, reason=No Location key in JSON object at all
Skipping recD8uYkIdkNOzkTI, reason=No Location key in JSON object at all
Skipping recDGr1ZT0dZB4EnL, reason=Missing Availability
Skipping recDQopTMb5OUP7wB, reason=No Location key in JSON object at all
Skipping recDqB3ifP9IKHd6o, reason=No Location key in JSON object at all
Skipping recDsL1W5XocKY0to, reason=No Location key in JSON object at all
Skipping recENZTAB8YT2CAI2, reason=Missing Availability
Skipping recH1CRynH0UhAHAk, reason=Missing Availability
Skipping recJ3c2su5wZ3JySl, reason=Missing Availability
Skipping recJFaMRbYd4Mpjsi, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recJRyOiq95WzUSvt, reason=Missing Availability
Skipping recJvja5frxVKqgLu, reason=Missing Availability
Skipping recKHmg0HjPB02J1W, reason=No Location key in JSON object at all
Skipping recKSd4iccicNZJac, reason=No Location key in JSON object at all
Skipping recKbKSXATHD6CT8a, reason=No Location key in JSON object at all
Skipping recLOyrI3fcmEmt25, reason=Missing Availability
Skipping recLpYPDPnfYcATiM, reason=No Location key in JSON object at all
Skipping recMfy5kFkGvkHvQu, reason=No Location key in JSON object at all
Skipping recN7ZM9ErdunRlbU, reason=No Location key in JSON object at all
Skipping recNE77Dmdbtg0PAM, reason=No Location key in JSON object at all
Skipping recNjSXDSDsR0XEQK, reason=Missing Availability
Skipping recNuzeYVG0Kvzlgn, reason=No Location key in JSON object at all
Skipping recOGf3tcHMYh7u3n, reason=Missing Availability
Skipping recOThSMD1hmSCXAV, reason=Missing Availability
Skipping recPRwsTZ6noy4RWw, reason=No Location key in JSON object at all
Skipping recPw6fulPhRbJBzu, reason=Missing Availability
Skipping recPzRYH0xi4tctl4, reason=No Location key in JSON object at all
Skipping recQkYQCihdQggfCO, reason=Missing Availability
Skipping recQxDCYn3JL9TztI, reason=Missing Availability
Skipping recRmgJvlU8y1zOmm, reason=Missing Availability
Skipping recS9ODITPYgw9MWj, reason=Missing Availability
Skipping recSonrMz83fydbP2, reason=Missing Availability
Skipping recSuNkkMRLVwdb6T, reason=Missing Availability
Skipping recTCHx8D72eFI7sg, reason=No Location key in JSON object at all
Skipping recTILrGVVIRyhxR7, reason=Missing Availability
Skipping recTIgUjGLPd0Sp8n, reason=Missing Availability
Skipping recTJpiNTSuFr8YyV, reason=No Location key in JSON object at all
Skipping recTdwKwJnjxDwFOV, reason=No Location key in JSON object at all
Skipping recTeco0yofGvoeq8, reason=No Location key in JSON object at all
Skipping recUAGNxQMI1OZRp4, reason=No Location key in JSON object at all
Skipping recUAgzvO0ZEzDaPM, reason=Missing Availability
Skipping recUECll34iOsro45, reason=Missing Availability
Skipping recUMYWNG0WZk7zf9, reason=No Location key in JSON object at all
Skipping recUoEk1LEJhANQBQ, reason=Missing Availability
Skipping recV7GQ7OukReWz4H, reason=Missing Availability
Skipping recVHS17ffjJwtoNA, reason=Missing Availability
Skipping recVknKa1EbvXCQiB, reason=No Location key in JSON object at all
Skipping recWnQWlhVFXqI2qG, reason=Missing Availability
Skipping recWyhmfKJN7BWB7e, reason=Missing Availability
Skipping recX9F5YSVMcMf92h, reason=Missing Availability
Skipping recXAzlLogGJWQwhM, reason=Missing Availability
Skipping recXjfSzvEsBJcFIu, reason=No Location key in JSON object at all
Skipping recZ5hcWCkOY6Vdc9, reason=Missing Availability
Skipping recZ5qDVcruYvm4PS, reason=Missing Availability
Skipping recaRQpZCvre6A2Ul, reason=Missing Availability
Skipping recb1XtHjKBoed9Eh, reason=No Location key in JSON object at all
Skipping recbYEeuryIfK6Hd4, reason=No Location key in JSON object at all
Skipping recbumTjNkk2sX7rO, reason=Missing Availability
Skipping reccQpTUGqPUQMUZO, reason=No Location key in JSON object at all
Skipping reccR96wDSoYNB9Mj, reason=Missing Availability
Skipping recfPEDfqmAzhcJjd, reason=Missing Availability
Skipping recg9cUnh0qDkIq9H, reason=No Location key in JSON object at all
Skipping recgalmEjhDJyP5EY, reason=Missing Availability
Skipping recgclMbiMxEac6mN, reason=No Location key in JSON object at all
Skipping recglq7oLNTngG55o, reason=Missing Availability
Skipping rechCEoeOBJaS3CYD, reason=Missing Availability
Skipping rechMg49UGyCdJLsQ, reason=No Location key in JSON object at all
Skipping rechNv2JagtdRqizK, reason=Missing Availability
Skipping rechTkKL22jrqLfSx, reason=No Location key in JSON object at all
Skipping reciUbWT5MD1bb9uH, reason=No Location key in JSON object at all
Skipping reciohaVFmSxXZBpL, reason=No Location key in JSON object at all
Skipping recjM7o747XZ5oLcM, reason=No Location key in JSON object at all
Skipping recjzbhE0HOoXPBCO, reason=Missing Availability
Skipping recl7hU0Mh2h9ZSRB, reason=No Location key in JSON object at all
Skipping reclCGmVSPqappP9p, reason=Missing Availability
Skipping reclTAW1LMaAqUNmN, reason=Missing Availability
Skipping reclp459SttiO758j, reason=No Location key in JSON object at all
Skipping reclqVVb8xItpGrgT, reason=No Location key in JSON object at all
Skipping recm9dRLXtQGsI25h, reason=Missing Availability
Skipping recmar0M59DuC4N9F, reason=No Location key in JSON object at all
Skipping recmpVKtbD6mQZvPB, reason=Missing Availability
Skipping recmqEo77ZB2BL0rs, reason=Missing Availability
Skipping recnJdewpF2BdnokP, reason=No Location key in JSON object at all
Skipping reco8oh46BqwosxBB, reason=Missing Availability
Skipping recoTkf2CUuVb1J9t, reason=No Location key in JSON object at all
Skipping recp1LzGKlxV1RKjk, reason=Missing Availability
Skipping recp2UMnjAQ0rkxJz, reason=Missing Availability
Skipping recp3r3wd67j2DyiZ, reason=No Location key in JSON object at all
Skipping recpFVakVQhk2nAAY, reason=No Location key in JSON object at all
Skipping recptkC9BTxNhCFb0, reason=Missing Availability
Skipping recqGf868zVrqxLVQ, reason=No Location key in JSON object at all
Skipping recqViddzkOxXRZCq, reason=Missing Availability
Skipping recqxWOGKUbiv9E4Y, reason=Missing Availability
Skipping recrVwmtkOQKyK5p9, reason=Missing Availability
Skipping rectK3Tcaye9FOnRi, reason=No Location key in JSON object at all
Skipping recuaJAfRtSGE7fze, reason=No Location key in JSON object at all
Skipping recv5uqammR5QGSMq, reason=No Location key in JSON object at all
Skipping recvqKh5brdEgGL8G, reason=Missing Availability
Skipping recvvtSuBtDVHIJzw, reason=Missing Availability
Skipping recwpQxVeOVyAmmT8, reason=No Location key in JSON object at all
Skipping recxpg9DFv7Ib5j00, reason=No Location key in JSON object at all
Skipping recyGNBIa3Eg1HGDR, reason=Missing Availability
Skipping recyp3x2WyNWZVu1t, reason=No Location key in JSON object at all
Skipping recyvdY0pav1gBnkP, reason=No Location key in JSON object at all
Skipping reczK9YUjuFe3DQ7p, reason=Missing Availability
Skipping reczXxdBqp0tRLHFd, reason=No Location key in JSON object at all
Skipping reczkpc7Rb1T1RGy5, reason=No Location key in JSON object at all
The 22822 records it imported can be browsed here: https://vaccinateca-preview.herokuapp.com/admin/core/callreport/
I bet those are reports on deleted locations. Deleting locations is something we try not to do.
Reports must be about locations.
I'm going to fix the "Missing Availability" ones - they'll still get rows, those rows just won't have any availability tags at all.
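Roughly, the rule I have in mind looks like the sketch below: a missing location is still a reason to skip, but missing availability just produces an empty tag list. `plan_report_import` is a hypothetical helper, and it assumes `Location` is a list of Airtable record IDs (which is what the log output suggests) and that `Availability` is a list of strings.

```python
def plan_report_import(record, known_location_ids):
    """Decide whether a report is imported, and with which availability tags.

    Hypothetical helper; field shapes are assumptions (see lead-in).
    """
    location_ids = record.get("Location")
    if not location_ids:
        # Covers both a missing key and an empty list.
        return {"skip": "No Location key in JSON object at all"}
    if location_ids[0] not in known_location_ids:
        return {"skip": f"No location record for location ID={location_ids}"}
    return {
        "skip": None,
        "location_id": location_ids[0],
        # Missing availability no longer skips the row: it simply gets no tags.
        "availability_tags": list(record.get("Availability") or []),
    }


known = {"recLOC1"}  # made-up location ID for illustration
imported = plan_report_import({"Location": ["recLOC1"]}, known)
```

With this shape the only skips left should be the location-related ones.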
Latest run only skipped 61 records, all because of missing locations:
Skipping rec0hcqa5hEsLV2Nz, reason=No Location key in JSON object at all
Skipping rec0uR4BArGZPIqLF, reason=No Location key in JSON object at all
Skipping rec1RSy9RqILX1cly, reason=No Location key in JSON object at all
Skipping rec3CzozmDt7fUGLD, reason=No Location key in JSON object at all
Skipping rec3KTCAVjqTrEqBM, reason=No Location key in JSON object at all
Skipping rec6R8xE6AYPNRzU6, reason=No Location key in JSON object at all
Skipping rec88mVQd515VCwtZ, reason=No Location key in JSON object at all
Skipping recAP5cqfUOLmh1te, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recB7FDhCRVUqysD0, reason=No Location key in JSON object at all
Skipping recClKsfHrfOEkxmk, reason=No Location key in JSON object at all
Skipping recD8uYkIdkNOzkTI, reason=No Location key in JSON object at all
Skipping recDQopTMb5OUP7wB, reason=No Location key in JSON object at all
Skipping recDqB3ifP9IKHd6o, reason=No Location key in JSON object at all
Skipping recDsL1W5XocKY0to, reason=No Location key in JSON object at all
Skipping recJFaMRbYd4Mpjsi, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recKHmg0HjPB02J1W, reason=No Location key in JSON object at all
Skipping recKSd4iccicNZJac, reason=No Location key in JSON object at all
Skipping recKbKSXATHD6CT8a, reason=No Location key in JSON object at all
Skipping recLpYPDPnfYcATiM, reason=No Location key in JSON object at all
Skipping recMfy5kFkGvkHvQu, reason=No Location key in JSON object at all
Skipping recN7ZM9ErdunRlbU, reason=No Location key in JSON object at all
Skipping recNE77Dmdbtg0PAM, reason=No Location key in JSON object at all
Skipping recNuzeYVG0Kvzlgn, reason=No Location key in JSON object at all
Skipping recPRwsTZ6noy4RWw, reason=No Location key in JSON object at all
Skipping recPzRYH0xi4tctl4, reason=No Location key in JSON object at all
Skipping recTCHx8D72eFI7sg, reason=No Location key in JSON object at all
Skipping recTJpiNTSuFr8YyV, reason=No Location key in JSON object at all
Skipping recTdwKwJnjxDwFOV, reason=No Location key in JSON object at all
Skipping recTeco0yofGvoeq8, reason=No Location key in JSON object at all
Skipping recUAGNxQMI1OZRp4, reason=No Location key in JSON object at all
Skipping recUMYWNG0WZk7zf9, reason=No Location key in JSON object at all
Skipping recVknKa1EbvXCQiB, reason=No Location key in JSON object at all
Skipping recXjfSzvEsBJcFIu, reason=No Location key in JSON object at all
Skipping recb1XtHjKBoed9Eh, reason=No Location key in JSON object at all
Skipping recbYEeuryIfK6Hd4, reason=No Location key in JSON object at all
Skipping reccQpTUGqPUQMUZO, reason=No Location key in JSON object at all
Skipping recg9cUnh0qDkIq9H, reason=No Location key in JSON object at all
Skipping recgclMbiMxEac6mN, reason=No Location key in JSON object at all
Skipping rechMg49UGyCdJLsQ, reason=No Location key in JSON object at all
Skipping rechTkKL22jrqLfSx, reason=No Location key in JSON object at all
Skipping reciUbWT5MD1bb9uH, reason=No Location key in JSON object at all
Skipping reciohaVFmSxXZBpL, reason=No Location key in JSON object at all
Skipping recjM7o747XZ5oLcM, reason=No Location key in JSON object at all
Skipping recl7hU0Mh2h9ZSRB, reason=No Location key in JSON object at all
Skipping reclp459SttiO758j, reason=No Location key in JSON object at all
Skipping reclqVVb8xItpGrgT, reason=No Location key in JSON object at all
Skipping recmar0M59DuC4N9F, reason=No Location key in JSON object at all
Skipping recnJdewpF2BdnokP, reason=No Location key in JSON object at all
Skipping recoTkf2CUuVb1J9t, reason=No Location key in JSON object at all
Skipping recp3r3wd67j2DyiZ, reason=No Location key in JSON object at all
Skipping recpFVakVQhk2nAAY, reason=No Location key in JSON object at all
Skipping recqGf868zVrqxLVQ, reason=No Location key in JSON object at all
Skipping rectK3Tcaye9FOnRi, reason=No Location key in JSON object at all
Skipping recuaJAfRtSGE7fze, reason=No Location key in JSON object at all
Skipping recv5uqammR5QGSMq, reason=No Location key in JSON object at all
Skipping recwpQxVeOVyAmmT8, reason=No Location key in JSON object at all
Skipping recxpg9DFv7Ib5j00, reason=No Location key in JSON object at all
Skipping recyp3x2WyNWZVu1t, reason=No Location key in JSON object at all
Skipping recyvdY0pav1gBnkP, reason=No Location key in JSON object at all
Skipping reczXxdBqp0tRLHFd, reason=No Location key in JSON object at all
Skipping reczkpc7Rb1T1RGy5, reason=No Location key in JSON object at all
`No location record for location ID=['rec0xZ5EaKnnynfDa']` probably means that the Airtable backup that ran against the Reports ran shortly after the Locations one, and in that time a new Location was added.
I'm closing this ticket. Further work can take place in new, smaller tickets such as #20.
Split out from #9. I want to import every existing call record from Airtable, even the records that might be invalid (maybe they go to a separate table, or we relax the schema validation rules on our main table, or we invent values to make them valid).