CAVaccineInventory / vial

The Django application powering calltheshots.us
https://vial.calltheshots.us
MIT License

Get import_airtable_records to import ALL records #15

Closed: simonw closed this issue 3 years ago

simonw commented 3 years ago

Split out from #9. I want to import every existing call record from Airtable, even the records that might be invalid (maybe they go to a separate table, or we relax the schema validation rules on our main table, or we invent values to make them valid).

simonw commented 3 years ago

There are useful comments on the old issue: https://github.com/CAVaccineInventory/django.vaccinate/issues/9#issuecomment-785544632

simonw commented 3 years ago

Note to self: revisit this notebook http://localhost:8888/notebooks/Extract%20objects%20from%20a%20column.ipynb

simonw commented 3 years ago

After extensive conversation with Nicholas Schiefer I have a much better idea of how to approach this now.

Notes from talking through the model:

Report source: Caller app or Data corrections. We have this implicitly right now - the idea is to capture the possible ways in which data arrives in this table. The standard one is from help.vaccinateca.

There are sites that have no phone number! LA County operates super-sites (Disneyland, Dodger Stadium) - there's no point in calling Disneyland to ask about vaccines. You interface with that system exclusively through their website, so the ground truth genuinely is the website for those cases.

So a Data Correction is when someone manually files a report by looking at the site's website.

It's also important to figure out from the historical data which of these it is. That's probably impossible before the move to the help.vaccinate app: the app added a Reported By column which lets you know if it's an embedded Airtable app, but for the older records you can't tell.

The help.vaccinate transition means that every report is written by the same Airtable user - the one used by the Netlify functions. NEW columns were also added at that point to record the Auth0 user.

Not all the things in the Airtable Reports table are actually caller reports. For historical reasons this wasn’t just the table that logs reports - it was also what the API published. So there are reports in there (which are identifiable) that are from ETL jobs.

simonw commented 3 years ago

Because there's so much weird historical context embedded in the Airtable records (and the chances of me interpreting them correctly the first time are slim), I'm going to take a new approach: a JSON column which I will populate with the entire record from Airtable.

This will ensure we keep all historic data and will let us run smarter backfills as our understanding grows.

Once we switch away from Airtable the airtable_json column will stop being populated, which I think is fine.
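
A minimal sketch of the keep-everything approach, in plain Python rather than Django so it runs on its own. It assumes each exported record is a flat dict of Airtable field values (which is what the key counts later in this thread suggest); `row_from_airtable`, `backfill` and the `availability` column are hypothetical names, not the importer's actual code. The idea is that the raw record is stored untouched and every interpretation of it is a separate step that can be re-run.

```python
import json
from typing import Any, Callable, Dict, List, Mapping


def row_from_airtable(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Build an import row that keeps the complete Airtable record verbatim."""
    # Round-tripping through JSON guarantees the stored value is plain JSON
    # (dicts, lists, strings, numbers), exactly what a JSON column can hold.
    raw = json.loads(json.dumps(record))
    return {
        "airtable_json": raw,
        # Today's best interpretation of one field (hypothetical column).
        "availability": raw.get("Availability"),
    }


def backfill(
    rows: List[Dict[str, Any]],
    column: str,
    derive: Callable[[Mapping[str, Any]], Any],
) -> int:
    """Recompute `column` from each row's stored JSON; return how many rows changed.

    The airtable_json value itself is never modified.
    """
    changed = 0
    for row in rows:
        raw = row.get("airtable_json")
        if raw is None:
            # Rows created after the move away from Airtable have no JSON.
            continue
        new_value = derive(raw)
        if row.get(column) != new_value:
            row[column] = new_value
            changed += 1
    return changed


if __name__ == "__main__":
    # Illustrative record, not real data.
    rows = [row_from_airtable({"airtable_id": "rec123", "Availability": ["Yes"]})]
    # Months later, with a better understanding of the data:
    backfill(rows, "availability_count", lambda raw: len(raw.get("Availability") or []))
    print(rows[0]["availability_count"])
```

Because `backfill` skips rows whose JSON is missing, it keeps working once new reports stop carrying `airtable_json`.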

simonw commented 3 years ago

My first opportunity to use the new JSONField from Django 3.1! https://docs.djangoproject.com/en/3.1/ref/models/fields/#jsonfield

simonw commented 3 years ago

I really like having the original Airtable JSON available to view in the admin - e.g. on https://vaccinateca-preview.herokuapp.com/admin/core/callreport/21192/change/

[Screenshot: the "Change call report" page in the Django admin]

Also neat: I added the ability to filter for just the records where airtable_json is not empty, which makes it easy to see just the imported (or not-imported) data: https://vaccinateca-preview.herokuapp.com/admin/core/callreport/?airtable_json__isempty=0

simonw commented 3 years ago

Well this is absurdly useful:

SELECT jsonb_object_keys(airtable_json) AS key, count(*) FROM call_report GROUP BY key;

Run that against PostgreSQL and I got this:

JSON key | Times used
Affiliation (from Location) | 21589
airtable_createdTime | 21589
airtable_id | 21589
Appointments by phone? | 466
Appointment scheduling instructions | 2526
auth0_reporter_id | 1583
auth0_reporter_name | 1583
auth0_reporter_roles | 1583
Availability | 21589
County (from Location) | 21589
Do not call until | 3116
external_reports_base_external_report_id | 3139
Hour | 21589
ID | 21589
Internal Notes | 18258
is_latest_report_for_location | 21589
is_pending_review | 18
Location | 21589
location_id | 21589
location_latest_eva_report_time | 13443
location_latest_report_id | 21589
location_latest_report_time | 21354
Location Type (from Location) | 21588
Name (from Location) | 21589
Notes | 3359
Number of Reports (from Location) | 21589
parent_eva_report | 958
parent_external_report | 2410
Phone | 82
Reported by | 21589
report_id | 21589
Report Type | 21589
soft-dropped-column: Vaccines available? | 15948
time | 21589
tmp_eva_flips | 21589
Vaccine demand | 733
Vaccine demand notes | 4
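
For poking at an export outside PostgreSQL, here is a rough Python equivalent of that `jsonb_object_keys` query; `key_census` is a hypothetical helper. Like the SQL, it counts each top-level key once per record and contributes nothing for rows whose JSON is NULL or empty.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Optional


def key_census(records: Iterable[Optional[Mapping[str, Any]]]) -> Counter:
    """Count how many records contain each top-level key.

    Rough Python analogue of:
        SELECT jsonb_object_keys(airtable_json) AS key, count(*)
        FROM call_report GROUP BY key;
    """
    counts: Counter = Counter()
    for record in records:
        if not record:
            # NULL or empty JSON yields no keys, as in the SQL version.
            continue
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    sample = [  # illustrative records, not real data
        {"airtable_id": "rec1", "Availability": [], "Phone": "..."},
        {"airtable_id": "rec2", "Availability": []},
        None,
    ]
    for key, count in sorted(key_census(sample).items(), key=lambda kv: kv[0].lower()):
        print(f"{key} | {count}")
```
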
simonw commented 3 years ago

[Screenshot]

simonw commented 3 years ago

Next I'll fix the reported_by stuff. Here's a recent record:

  "Reported by": {
    "id": "usr3nFXwxJnpVjv4i",
    "name": "Help.vaccinateCA RW Role account for API token",
    "email": "jesse+role-rw-help-vaccinateca@vaccinateca.com"
  },
  ...
  "Internal Notes": "\n",
  "Do not call until": "2021-02-26T00:54:24.995Z",
  "auth0_reporter_id": "auth0|602ac647cea25d006a7cea17",
  "auth0_reporter_name": "Dave Kasten",

If the auth0_reporter_id fields are present, the report was made by a user of the new help.vaccinate app. Those should take precedence, and "Reported by" should be ignored (it's just the API token account).

If auth0_reporter_id is missing then the reporter must have been using the Airtable app and their Airtable credentials should be recorded.
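
That rule is small enough to sketch directly. `reporter_for` and its `source`/`external_id`/`name` keys are hypothetical; the only assumption about the data is that "Reported by" is an object with `id` and `name`, as in the record above.

```python
from typing import Any, Dict, Mapping, Optional


def reporter_for(record: Mapping[str, Any]) -> Dict[str, Optional[str]]:
    """Decide who filed a report, preferring the Auth0 identity when present."""
    auth0_id = record.get("auth0_reporter_id")
    if auth0_id:
        # help.vaccinate report: "Reported by" is only the shared API-token
        # account, so ignore it and use the Auth0 user instead.
        return {
            "source": "auth0",
            "external_id": auth0_id,
            "name": record.get("auth0_reporter_name"),
        }
    # Older report filed through the Airtable app: record the Airtable user.
    airtable_user = record.get("Reported by") or {}
    return {
        "source": "airtable",
        "external_id": airtable_user.get("id"),
        "name": airtable_user.get("name"),
    }


if __name__ == "__main__":
    # The recent record quoted above (email omitted).
    record = {
        "Reported by": {
            "id": "usr3nFXwxJnpVjv4i",
            "name": "Help.vaccinateCA RW Role account for API token",
        },
        "auth0_reporter_id": "auth0|602ac647cea25d006a7cea17",
        "auth0_reporter_name": "Dave Kasten",
    }
    print(reporter_for(record)["name"])  # Dave Kasten
```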

simonw commented 3 years ago

I think the trickiest task left is to assign new "appointment tags" to the Django records: https://github.com/CAVaccineInventory/django.vaccinate/blob/ffd2b77afbd993c1a13d5d248f31151d8e87fca8/vaccinate/core/import_utils.py#L103-L104

A reminder: the appointment tags we have look like this:

https://github.com/CAVaccineInventory/django.vaccinate/blob/ffd2b77afbd993c1a13d5d248f31151d8e87fca8/vaccinate/core/migrations/0008_populate_appointment_tags.py#L3-L9

Those booleans are for the has_details field on the model.

They make more sense if you consider the calling app UI:

[Screenshots of the calling app UI]

We're trying to model those selections using a combination of these five appointment_tags and the appointment_details field on the model - which is typed into by the caller using the interface shown above:

https://github.com/CAVaccineInventory/django.vaccinate/blob/ffd2b77afbd993c1a13d5d248f31151d8e87fca8/vaccinate/core/models.py#L276-L286

The thing I haven't quite figured out yet is how to use the Airtable data to decide which tag was used.

simonw commented 3 years ago

@nschiefer @obra see above comment - I think you two may be best qualified to answer that?

simonw commented 3 years ago

I'm running the latest version of the reports import script now (after running the locations script first to reduce the chance of seeing location IDs in reports that haven't yet been created).

obra commented 3 years ago

re https://github.com/CAVaccineInventory/django.vaccinate/issues/15#issuecomment-786454438

https://github.com/CAVaccineInventory/help.vaccinate/blob/006d9a336bf07a5d1bac851bcbb406e4e4e14748/apps/assets/js/main.js#L307-L329

Yes, what is going on now is horrible and I'd love to make it better.

simonw commented 3 years ago

That's exactly what I needed to see!

      const apptMethod = document.querySelector("[name=appointmentMethod]:checked")?.value;
      switch (apptMethod) {
        case "phone":
          currentReport["Appointments by phone?"] = true;
          currentReport["Appointment scheduling instructions"] = document.querySelector("#appointmentPhone")?.value;
          break;
        case "county":
          currentReport["Appointment scheduling instructions"] = "Uses county scheduling system";
          break;
        case "myturn":
          currentReport["Appointment scheduling instructions"] = "https://myturn.ca.gov/";
          break;
        case "web":
          currentReport["Appointment scheduling instructions"] = document.querySelector("#appointmentWebsite")?.value;
          break;
        case "other":
          currentReport["Appointment scheduling instructions"] = document.querySelector(
            "#appointmentOtherInstructions"
          )?.value;
          break;
        default:
          break;
      }
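
Reading that switch backwards gives a rough way to recover which option a caller picked from the two Airtable fields it writes. This is a sketch, not the importer's logic: the "county" and "myturn" cases are exact matches on the fixed strings in the JavaScript above, while separating "web" from "other" by whether the text looks like a URL is a guess, since both write free text to the same field. Reports created before the help.vaccinate app may not follow this pattern at all, and `infer_appointment_method` is a hypothetical name.

```python
from typing import Any, Mapping, Optional

# Fixed strings written by the calling app's appointmentMethod switch.
COUNTY_TEXT = "Uses county scheduling system"
MYTURN_TEXT = "https://myturn.ca.gov/"


def infer_appointment_method(record: Mapping[str, Any]) -> Optional[str]:
    """Guess which appointmentMethod option produced this report.

    Returns "phone", "county", "myturn", "web", "other", or None if the
    report carries no appointment information.
    """
    instructions = (record.get("Appointment scheduling instructions") or "").strip()
    if record.get("Appointments by phone?"):
        return "phone"  # the instructions field holds the phone number
    if instructions == COUNTY_TEXT:
        return "county"
    if instructions == MYTURN_TEXT:
        return "myturn"
    if not instructions:
        return None
    # Heuristic (assumption): "web" and "other" both store free text, so
    # treat anything that looks like a URL as "web".
    if instructions.lower().startswith(("http://", "https://", "www.")):
        return "web"
    return "other"


if __name__ == "__main__":
    print(infer_appointment_method({"Appointment scheduling instructions": COUNTY_TEXT}))  # county
```

One ambiguity the data can't resolve: a caller who typed the MyTurn URL into the website box looks identical to one who chose the MyTurn option.
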
obra commented 3 years ago

> Well this is absurdly useful:
>
> SELECT jsonb_object_keys(airtable_json) AS key, count(*) FROM call_report GROUP BY key;

The plan is to drop the columns that are the result of a join, right?

simonw commented 3 years ago

Actually no, I plan to leave the full Airtable JSON record in there forevermore - stashed safely in the airtable_json column. No features will be allowed to be built against it, it will exist purely so we can occasionally marvel at it - and more importantly, so if in six months we say "Oh wow, we totally screwed up importing that data" we can run a SQL query to fix our records by re-interpreting the vestigial JSON in that column.

simonw commented 3 years ago

Oh I see what you mean - you're talking about things like "Location Type (from Location)" which are derived columns even at the point they showed up in the Airtable export.

I'd rather leave those in, even though they're absolutely useless information. It's easier for me to think "the JSON in the airtable_json column is the EXACT data we got out of their API" than worry about writing code that trims that down.

simonw commented 3 years ago

On the most recent run of import locations I got these skipped records:

Running vaccinate/manage.py import_airtable_locations --github-token=xxx on ⬢ vaccinateca-preview... up, run.6593 (Hobby)
Skipping rec0xZ5EaKnnynfDa [name=CVS Pharmacy® & Drug Store at 25272 Marguerite Pkwy, Mission Viejo, CA 92692], reason=No latitude
Skipping rec4KFYlaa485WMIX [name=Vo Medical Center - Calexico], reason=No latitude
Skipping rec7nHXCuSYRR61V0 [name=None], reason=No name
Skipping rec8Xk6kn4SvAKeEm [name=None], reason=No name
Skipping recCYZZRJCRlXykun [name=None], reason=No name
Skipping recDzR99nhYhVbvLA [name=Dr. Vincent V. Soun MD MPH MAJOR SARIN & TAO FAMILY MEDICAL CLINIC, INC.], reason=No latitude
Skipping recJ4iabS45l0rWB8 [name=Vo Medical Center - El Centro], reason=No latitude
Skipping recJt0iQbqmglF0XL [name=Dignity Health (Woodland)], reason=No county
Skipping recM2vjVCasanHStO [name=None], reason=No name
Skipping recMFogi5BI5krqKK [name=Walgreens Pharmacy #4559], reason=No county
Skipping recMSTzVRFGEuxS6M [name=CVS Pharmacy® & Drug Store at 638 Camino De Los Mares, San Clemente, CA 92673], reason=No latitude
Skipping recR9Wm4t02bIgKLR [name=Calexico Health Center], reason=No latitude
Skipping recTzWO20IAjnrcdS [name=PAMF Mountain View Center (Palo Alto Medical Foundation - Sutter)], reason=No latitude
Skipping recWqQc3bG7PiKQJi [name=Sharp Coronado Hospital], reason=No latitude
Skipping reccaeiuCMw6a7Dvg [name=None], reason=No name
Skipping recchGT7ticPxfr42 [name=None], reason=No name
Skipping recfGfKaDQtqAiSIx [name=None], reason=No name
Skipping recgudgmJck0ucv3X [name=All-inclusive], reason=No county
Skipping recicvTFyjV52Ulc6 [name=Calexico Wellness Center], reason=No latitude
Skipping reck8nN8LE7qUu3WY [name=None], reason=No name
Skipping rectdqnbtnzr54use [name=Vo Medical Center - Brawley], reason=No latitude
Skipping recwHJmOChMc6TeA8 [name=Osage inglewood], reason=No county
Skipping recy40U4HXu6hSmQ3 [name=South Medical], reason=No county
Skipping reczTSHfKgH9MyS4m [name=5148 Market St], reason=No county

I fixed quite a few of the missing latitude ones just now via Airtable.
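
The skip reasons in that log fit a simple first-failing-check validator, sketched below. The field names (`Name`, `Latitude`, `County`) and the order of the checks are assumptions, chosen so that unnamed records report "No name" as they do in the log; the record shape (an `id` plus a `fields` dict) follows Airtable's API. The real import_airtable_locations command may differ, and both function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Optional


def location_skip_reason(fields: Mapping[str, Any]) -> Optional[str]:
    """Return the first reason a location can't be imported, or None.

    Field names and check order are assumptions for illustration.
    """
    if not fields.get("Name"):
        return "No name"
    if fields.get("Latitude") is None:  # 0.0 is a valid latitude, so test for None
        return "No latitude"
    if not fields.get("County"):
        return "No county"
    return None


def import_locations(records: List[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Keep importable records, logging skips in the same shape as the output above."""
    imported = []
    for rec in records:
        fields: Dict[str, Any] = dict(rec.get("fields", {}))
        reason = location_skip_reason(fields)
        if reason:
            print(f"Skipping {rec['id']} [name={fields.get('Name')}], reason={reason}")
            continue
        imported.append(rec)
    return imported
```

Relaxing a rule (say, allowing locations with no county) is then a one-line change to `location_skip_reason`.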

obra commented 3 years ago

> Oh I see what you mean - you're talking about things like "Location Type (from Location)" which are derived columns even at the point they showed up in the Airtable export.
>
> I'd rather leave those in, even though they're absolutely useless information. It's easier for me to think "the JSON in the airtable_json column is the EXACT data we got out of their API" than worry about writing code that trims that down.

I don't mean in the json, but in the "regular" table

obra commented 3 years ago

> I fixed quite a few of the missing latitude ones just now via Airtable.

I am thrilled that we're fixing the data, but what's the rationale for refusing to allow records without a confirmed location?

One not-actually-contrived example: There are now we-come-to-you clinics in some places.

simonw commented 3 years ago

Aah I see. The regular table is currently a completely different schema - which is one of the reasons I want to keep the JSON around, just in case we missed something.

The new schema is best understood by looking at this model: https://github.com/CAVaccineInventory/django.vaccinate/blob/ffd2b77afbd993c1a13d5d248f31151d8e87fca8/vaccinate/core/models.py#L256-L317

simonw commented 3 years ago

Note that this new schema is not at all set-in-stone - I expect we'll make quite a few changes to it before we ship.

simonw commented 3 years ago

> I am thrilled that we're fixing the data, but what's the rationale for refusing to allow records without a confirmed location?

Purely that we decided that "location" would be a not null foreign key! I'm happy to change that if we think it's a good idea.

simonw commented 3 years ago

Oh hang on... I think I misread the proposed schema - I thought county was meant to be not null but it looks like null is allowed: https://github.com/CAVaccineInventory/data-engineering/blob/7912b47cfd41c7e1e9703f075acb2dbc073f4969/schema/definition.sql#L94

obra commented 3 years ago

I don't think the nullability of stuff in that schema proposal was truly set in stone.

I'd like us to be able to capture any data we get from the outside world, even if it's low quality data. I would not mind at all if bad data resulted in data corrections or got flagged in a report or something.

simonw commented 3 years ago

Yeah I'm good with that. Our data will be messy and have holes in it. Better messy and recorded than lost.

The one constraint that definitely makes sense to me is that a CallReport MUST be associated with a Location. Are there any cases where even that rule has holes in it?

simonw commented 3 years ago

That said, we apparently have 59 records with "No Location key in JSON object at all"

simonw commented 3 years ago

The new version of the import_airtable_records script just finished, and it only skipped 122 rows (after importing 22822):

Skipping rec0hcqa5hEsLV2Nz, reason=No Location key in JSON object at all
Skipping rec0uR4BArGZPIqLF, reason=No Location key in JSON object at all
Skipping rec0yCiyqupiONBYZ, reason=Missing Availability
Skipping rec1RSy9RqILX1cly, reason=No Location key in JSON object at all
Skipping rec1r6ghPunvEZSs3, reason=Missing Availability
Skipping rec3CzozmDt7fUGLD, reason=No Location key in JSON object at all
Skipping rec3KTCAVjqTrEqBM, reason=No Location key in JSON object at all
Skipping rec3RYvQyGoGxQYn1, reason=Missing Availability
Skipping rec6R8xE6AYPNRzU6, reason=No Location key in JSON object at all
Skipping rec7rzF07F8qqmNJE, reason=Missing Availability
Skipping rec88mVQd515VCwtZ, reason=No Location key in JSON object at all
Skipping recAP5cqfUOLmh1te, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recAabdOFeVJHQx0p, reason=Missing Availability
Skipping recB7FDhCRVUqysD0, reason=No Location key in JSON object at all
Skipping recBE4x8UXT4MdMDm, reason=Missing Availability
Skipping recClKsfHrfOEkxmk, reason=No Location key in JSON object at all
Skipping recD8uYkIdkNOzkTI, reason=No Location key in JSON object at all
Skipping recDGr1ZT0dZB4EnL, reason=Missing Availability
Skipping recDQopTMb5OUP7wB, reason=No Location key in JSON object at all
Skipping recDqB3ifP9IKHd6o, reason=No Location key in JSON object at all
Skipping recDsL1W5XocKY0to, reason=No Location key in JSON object at all
Skipping recENZTAB8YT2CAI2, reason=Missing Availability
Skipping recH1CRynH0UhAHAk, reason=Missing Availability
Skipping recJ3c2su5wZ3JySl, reason=Missing Availability
Skipping recJFaMRbYd4Mpjsi, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recJRyOiq95WzUSvt, reason=Missing Availability
Skipping recJvja5frxVKqgLu, reason=Missing Availability
Skipping recKHmg0HjPB02J1W, reason=No Location key in JSON object at all
Skipping recKSd4iccicNZJac, reason=No Location key in JSON object at all
Skipping recKbKSXATHD6CT8a, reason=No Location key in JSON object at all
Skipping recLOyrI3fcmEmt25, reason=Missing Availability
Skipping recLpYPDPnfYcATiM, reason=No Location key in JSON object at all
Skipping recMfy5kFkGvkHvQu, reason=No Location key in JSON object at all
Skipping recN7ZM9ErdunRlbU, reason=No Location key in JSON object at all
Skipping recNE77Dmdbtg0PAM, reason=No Location key in JSON object at all
Skipping recNjSXDSDsR0XEQK, reason=Missing Availability
Skipping recNuzeYVG0Kvzlgn, reason=No Location key in JSON object at all
Skipping recOGf3tcHMYh7u3n, reason=Missing Availability
Skipping recOThSMD1hmSCXAV, reason=Missing Availability
Skipping recPRwsTZ6noy4RWw, reason=No Location key in JSON object at all
Skipping recPw6fulPhRbJBzu, reason=Missing Availability
Skipping recPzRYH0xi4tctl4, reason=No Location key in JSON object at all
Skipping recQkYQCihdQggfCO, reason=Missing Availability
Skipping recQxDCYn3JL9TztI, reason=Missing Availability
Skipping recRmgJvlU8y1zOmm, reason=Missing Availability
Skipping recS9ODITPYgw9MWj, reason=Missing Availability
Skipping recSonrMz83fydbP2, reason=Missing Availability
Skipping recSuNkkMRLVwdb6T, reason=Missing Availability
Skipping recTCHx8D72eFI7sg, reason=No Location key in JSON object at all
Skipping recTILrGVVIRyhxR7, reason=Missing Availability
Skipping recTIgUjGLPd0Sp8n, reason=Missing Availability
Skipping recTJpiNTSuFr8YyV, reason=No Location key in JSON object at all
Skipping recTdwKwJnjxDwFOV, reason=No Location key in JSON object at all
Skipping recTeco0yofGvoeq8, reason=No Location key in JSON object at all
Skipping recUAGNxQMI1OZRp4, reason=No Location key in JSON object at all
Skipping recUAgzvO0ZEzDaPM, reason=Missing Availability
Skipping recUECll34iOsro45, reason=Missing Availability
Skipping recUMYWNG0WZk7zf9, reason=No Location key in JSON object at all
Skipping recUoEk1LEJhANQBQ, reason=Missing Availability
Skipping recV7GQ7OukReWz4H, reason=Missing Availability
Skipping recVHS17ffjJwtoNA, reason=Missing Availability
Skipping recVknKa1EbvXCQiB, reason=No Location key in JSON object at all
Skipping recWnQWlhVFXqI2qG, reason=Missing Availability
Skipping recWyhmfKJN7BWB7e, reason=Missing Availability
Skipping recX9F5YSVMcMf92h, reason=Missing Availability
Skipping recXAzlLogGJWQwhM, reason=Missing Availability
Skipping recXjfSzvEsBJcFIu, reason=No Location key in JSON object at all
Skipping recZ5hcWCkOY6Vdc9, reason=Missing Availability
Skipping recZ5qDVcruYvm4PS, reason=Missing Availability
Skipping recaRQpZCvre6A2Ul, reason=Missing Availability
Skipping recb1XtHjKBoed9Eh, reason=No Location key in JSON object at all
Skipping recbYEeuryIfK6Hd4, reason=No Location key in JSON object at all
Skipping recbumTjNkk2sX7rO, reason=Missing Availability
Skipping reccQpTUGqPUQMUZO, reason=No Location key in JSON object at all
Skipping reccR96wDSoYNB9Mj, reason=Missing Availability
Skipping recfPEDfqmAzhcJjd, reason=Missing Availability
Skipping recg9cUnh0qDkIq9H, reason=No Location key in JSON object at all
Skipping recgalmEjhDJyP5EY, reason=Missing Availability
Skipping recgclMbiMxEac6mN, reason=No Location key in JSON object at all
Skipping recglq7oLNTngG55o, reason=Missing Availability
Skipping rechCEoeOBJaS3CYD, reason=Missing Availability
Skipping rechMg49UGyCdJLsQ, reason=No Location key in JSON object at all
Skipping rechNv2JagtdRqizK, reason=Missing Availability
Skipping rechTkKL22jrqLfSx, reason=No Location key in JSON object at all
Skipping reciUbWT5MD1bb9uH, reason=No Location key in JSON object at all
Skipping reciohaVFmSxXZBpL, reason=No Location key in JSON object at all
Skipping recjM7o747XZ5oLcM, reason=No Location key in JSON object at all
Skipping recjzbhE0HOoXPBCO, reason=Missing Availability
Skipping recl7hU0Mh2h9ZSRB, reason=No Location key in JSON object at all
Skipping reclCGmVSPqappP9p, reason=Missing Availability
Skipping reclTAW1LMaAqUNmN, reason=Missing Availability
Skipping reclp459SttiO758j, reason=No Location key in JSON object at all
Skipping reclqVVb8xItpGrgT, reason=No Location key in JSON object at all
Skipping recm9dRLXtQGsI25h, reason=Missing Availability
Skipping recmar0M59DuC4N9F, reason=No Location key in JSON object at all
Skipping recmpVKtbD6mQZvPB, reason=Missing Availability
Skipping recmqEo77ZB2BL0rs, reason=Missing Availability
Skipping recnJdewpF2BdnokP, reason=No Location key in JSON object at all
Skipping reco8oh46BqwosxBB, reason=Missing Availability
Skipping recoTkf2CUuVb1J9t, reason=No Location key in JSON object at all
Skipping recp1LzGKlxV1RKjk, reason=Missing Availability
Skipping recp2UMnjAQ0rkxJz, reason=Missing Availability
Skipping recp3r3wd67j2DyiZ, reason=No Location key in JSON object at all
Skipping recpFVakVQhk2nAAY, reason=No Location key in JSON object at all
Skipping recptkC9BTxNhCFb0, reason=Missing Availability
Skipping recqGf868zVrqxLVQ, reason=No Location key in JSON object at all
Skipping recqViddzkOxXRZCq, reason=Missing Availability
Skipping recqxWOGKUbiv9E4Y, reason=Missing Availability
Skipping recrVwmtkOQKyK5p9, reason=Missing Availability
Skipping rectK3Tcaye9FOnRi, reason=No Location key in JSON object at all
Skipping recuaJAfRtSGE7fze, reason=No Location key in JSON object at all
Skipping recv5uqammR5QGSMq, reason=No Location key in JSON object at all
Skipping recvqKh5brdEgGL8G, reason=Missing Availability
Skipping recvvtSuBtDVHIJzw, reason=Missing Availability
Skipping recwpQxVeOVyAmmT8, reason=No Location key in JSON object at all
Skipping recxpg9DFv7Ib5j00, reason=No Location key in JSON object at all
Skipping recyGNBIa3Eg1HGDR, reason=Missing Availability
Skipping recyp3x2WyNWZVu1t, reason=No Location key in JSON object at all
Skipping recyvdY0pav1gBnkP, reason=No Location key in JSON object at all
Skipping reczK9YUjuFe3DQ7p, reason=Missing Availability
Skipping reczXxdBqp0tRLHFd, reason=No Location key in JSON object at all
Skipping reczkpc7Rb1T1RGy5, reason=No Location key in JSON object at all

The 22822 records it imported can be browsed here: https://vaccinateca-preview.herokuapp.com/admin/core/callreport/
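
With several skip reasons interleaved in one long log, a quick tally makes the totals easier to check. A small sketch, assuming the output format shown in this thread; `tally_skip_reasons` is a hypothetical helper, and it also accepts the `[name=...]` lines printed by the locations import.

```python
import re
from collections import Counter

# "Skipping <record id>[ [name=...]], reason=<reason>"
SKIP_LINE = re.compile(r"^Skipping (rec\w+)(?: \[name=.*\])?, reason=(.*)$")


def tally_skip_reasons(log_text: str) -> Counter:
    """Count skipped records per reason in import output."""
    reasons: Counter = Counter()
    for line in log_text.splitlines():
        match = SKIP_LINE.match(line.strip())
        if not match:
            continue  # ignore non-skip output such as the Heroku banner
        reason = match.group(2)
        # Fold per-record details ("... for location ID=[...]") into one bucket.
        reason = reason.split(" for location ID=")[0]
        reasons[reason] += 1
    return reasons


if __name__ == "__main__":
    sample = "\n".join([
        "Skipping rec0hcqa5hEsLV2Nz, reason=No Location key in JSON object at all",
        "Skipping recAP5cqfUOLmh1te, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']",
    ])
    print(tally_skip_reasons(sample))
```
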

obra commented 3 years ago

I bet those are reports on deleted locations. Deleting locations is something we try not to do.

Reports must be about locations.

simonw commented 3 years ago

I'm going to fix the "Missing Availability" ones - they'll still get rows, those rows just won't have any availability tags at all.
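
A sketch of that relaxed rule: a report is skipped only when its location can't be resolved, and a report with no Availability is kept with an empty tag list. It assumes the Airtable `Location` field is a list of linked record IDs (matching the `location ID=[...]` values in the log) and that the first ID is the one that matters; `classify_report` is a hypothetical helper, not the script's actual code.

```python
from typing import Any, Collection, List, Mapping, Optional, Tuple


def classify_report(
    record: Mapping[str, Any], known_location_ids: Collection[str]
) -> Tuple[Optional[str], List[str]]:
    """Return (skip_reason, availability_tags) for one Airtable report.

    skip_reason is None when the report should be imported.
    """
    if "Location" not in record:
        return "No Location key in JSON object at all", []
    location_ids = record["Location"] or []
    # Assumption: a report links to one location, the first linked ID.
    if not location_ids or location_ids[0] not in known_location_ids:
        return f"No location record for location ID={location_ids!r}", []
    # Missing Availability no longer blocks the import: the row just gets no tags.
    availability = record.get("Availability") or []
    return None, list(availability)
```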

simonw commented 3 years ago

Latest run only skipped 61 records, all because of missing locations:

Skipping rec0hcqa5hEsLV2Nz, reason=No Location key in JSON object at all
Skipping rec0uR4BArGZPIqLF, reason=No Location key in JSON object at all
Skipping rec1RSy9RqILX1cly, reason=No Location key in JSON object at all
Skipping rec3CzozmDt7fUGLD, reason=No Location key in JSON object at all
Skipping rec3KTCAVjqTrEqBM, reason=No Location key in JSON object at all
Skipping rec6R8xE6AYPNRzU6, reason=No Location key in JSON object at all
Skipping rec88mVQd515VCwtZ, reason=No Location key in JSON object at all
Skipping recAP5cqfUOLmh1te, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recB7FDhCRVUqysD0, reason=No Location key in JSON object at all
Skipping recClKsfHrfOEkxmk, reason=No Location key in JSON object at all
Skipping recD8uYkIdkNOzkTI, reason=No Location key in JSON object at all
Skipping recDQopTMb5OUP7wB, reason=No Location key in JSON object at all
Skipping recDqB3ifP9IKHd6o, reason=No Location key in JSON object at all
Skipping recDsL1W5XocKY0to, reason=No Location key in JSON object at all
Skipping recJFaMRbYd4Mpjsi, reason=No location record for location ID=['rec0xZ5EaKnnynfDa']
Skipping recKHmg0HjPB02J1W, reason=No Location key in JSON object at all
Skipping recKSd4iccicNZJac, reason=No Location key in JSON object at all
Skipping recKbKSXATHD6CT8a, reason=No Location key in JSON object at all
Skipping recLpYPDPnfYcATiM, reason=No Location key in JSON object at all
Skipping recMfy5kFkGvkHvQu, reason=No Location key in JSON object at all
Skipping recN7ZM9ErdunRlbU, reason=No Location key in JSON object at all
Skipping recNE77Dmdbtg0PAM, reason=No Location key in JSON object at all
Skipping recNuzeYVG0Kvzlgn, reason=No Location key in JSON object at all
Skipping recPRwsTZ6noy4RWw, reason=No Location key in JSON object at all
Skipping recPzRYH0xi4tctl4, reason=No Location key in JSON object at all
Skipping recTCHx8D72eFI7sg, reason=No Location key in JSON object at all
Skipping recTJpiNTSuFr8YyV, reason=No Location key in JSON object at all
Skipping recTdwKwJnjxDwFOV, reason=No Location key in JSON object at all
Skipping recTeco0yofGvoeq8, reason=No Location key in JSON object at all
Skipping recUAGNxQMI1OZRp4, reason=No Location key in JSON object at all
Skipping recUMYWNG0WZk7zf9, reason=No Location key in JSON object at all
Skipping recVknKa1EbvXCQiB, reason=No Location key in JSON object at all
Skipping recXjfSzvEsBJcFIu, reason=No Location key in JSON object at all
Skipping recb1XtHjKBoed9Eh, reason=No Location key in JSON object at all
Skipping recbYEeuryIfK6Hd4, reason=No Location key in JSON object at all
Skipping reccQpTUGqPUQMUZO, reason=No Location key in JSON object at all
Skipping recg9cUnh0qDkIq9H, reason=No Location key in JSON object at all
Skipping recgclMbiMxEac6mN, reason=No Location key in JSON object at all
Skipping rechMg49UGyCdJLsQ, reason=No Location key in JSON object at all
Skipping rechTkKL22jrqLfSx, reason=No Location key in JSON object at all
Skipping reciUbWT5MD1bb9uH, reason=No Location key in JSON object at all
Skipping reciohaVFmSxXZBpL, reason=No Location key in JSON object at all
Skipping recjM7o747XZ5oLcM, reason=No Location key in JSON object at all
Skipping recl7hU0Mh2h9ZSRB, reason=No Location key in JSON object at all
Skipping reclp459SttiO758j, reason=No Location key in JSON object at all
Skipping reclqVVb8xItpGrgT, reason=No Location key in JSON object at all
Skipping recmar0M59DuC4N9F, reason=No Location key in JSON object at all
Skipping recnJdewpF2BdnokP, reason=No Location key in JSON object at all
Skipping recoTkf2CUuVb1J9t, reason=No Location key in JSON object at all
Skipping recp3r3wd67j2DyiZ, reason=No Location key in JSON object at all
Skipping recpFVakVQhk2nAAY, reason=No Location key in JSON object at all
Skipping recqGf868zVrqxLVQ, reason=No Location key in JSON object at all
Skipping rectK3Tcaye9FOnRi, reason=No Location key in JSON object at all
Skipping recuaJAfRtSGE7fze, reason=No Location key in JSON object at all
Skipping recv5uqammR5QGSMq, reason=No Location key in JSON object at all
Skipping recwpQxVeOVyAmmT8, reason=No Location key in JSON object at all
Skipping recxpg9DFv7Ib5j00, reason=No Location key in JSON object at all
Skipping recyp3x2WyNWZVu1t, reason=No Location key in JSON object at all
Skipping recyvdY0pav1gBnkP, reason=No Location key in JSON object at all
Skipping reczXxdBqp0tRLHFd, reason=No Location key in JSON object at all
Skipping reczkpc7Rb1T1RGy5, reason=No Location key in JSON object at all

No location record for location ID=['rec0xZ5EaKnnynfDa'] probably means that the Airtable backup that ran against the Reports ran shortly after the Locations one, and in that time a new Location was added.

simonw commented 3 years ago

I'm closing this ticket. Further work can take place in new, smaller tickets such as #20.