cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
18 stars 2 forks source link

[Research] Investigate and document how currently available incident data can be linked #19839

Open johnclary opened 1 week ago

johnclary commented 1 week ago

Investigate how we can link our currently available data to create a more complete dataset of crashes and injuries. Those incident types are:

See also: this report from Xavier: APD-EMS Analysis-20220412 1.pdf

johnclary commented 1 week ago

I reached out to Lynn C @ EMS (cc Xavier) on teams:

Hi Lynn! I work with Xavier on our Vision Zero crash database, and am hoping you can help us track down some more information about the incident data we're working with. Do you know if there may be a reliable way to tie an APD incident number and an EMS incident number back to the same dispatch event? I see that the APD incident numbers and EMS incident numbers share a 5-digit prefix in common. I'm hoping to learn more about what system or database this 5-digit number comes from, and if there be another common ID (possibly in the CAD system) that would help us definitely link the APD response and EMS response data.

Screenshot 2024-11-08 at 1 01 50 PM
johnclary commented 1 week ago

Here are queries i've been using to join EMS incidents to CR3 and non-CR3 crashes.

Join crashes to EMS incidents based on case ID ```sql -- crashes to ems based on case ID / incident number SELECT ems.id, crash.cris_crash_id, incident_location_address, incident_problem, pcr_cause_of_injury, pcr_patient_complaints, pcr_provider_impression_primary, pcr_provider_impression_secondary, pcr_patient_acuity_level, pcr_patient_acuity_final, pcr_provider_impression_primary, pcr_transport_priority, apd_incident_number_1, apd_incident_number_2, apd_incident_numbers, st_setsrid(ST_MakePoint(incident_location_latitude, incident_location_latitude), 4326) as position, (incident_date_received || ' ' || incident_time_received || ' US/Central')::timestamptz AS incident_timestamp, crash.crash_timestamp FROM ems__incidents ems LEFT JOIN crashes crash ON crash.case_id = ems.apd_incident_number_1::text OR crash.case_id = ems.apd_incident_number_2::text WHERE ems.incident_date_received >= '2023-02-02' AND ems.incident_date_received < '2023-02-03'; ```
Join crashes to EMS incidents based on timestamp and location ```sql -- crashes to ems based on time and location with crashes as (select * from crashes where crash_timestamp >= '2023-02-02' AND crash_timestamp < '2023-02-03' and is_deleted = false ), ems AS ( SELECT ems.id ems_id, incident_location_address, incident_problem, pcr_cause_of_injury, pcr_patient_complaints, pcr_provider_impression_primary, pcr_provider_impression_secondary, pcr_patient_acuity_level, pcr_patient_acuity_final, pcr_provider_impression_primary, pcr_transport_priority, apd_incident_number_1, apd_incident_number_2, apd_incident_numbers, st_setsrid (ST_MakePoint (incident_location_longitude, incident_location_latitude), 4326) AS position, (incident_date_received || ' ' || incident_time_received || ' US/Central')::timestamptz AS incident_timestamp FROM ems__incidents ems WHERE ems.incident_date_received >= '2023-02-02' AND ems.incident_date_received < '2023-02-03' AND ems.apd_incident_number_1 IS NULL ) SELECT crash.cris_crash_id, crash.crash_timestamp, ems.incident_timestamp, crash.address_primary, crash.address_secondary, ems.incident_location_address FROM ems JOIN crashes crash ON -- Timestamp condition: within 4 hours ABS(EXTRACT(EPOCH FROM (ems.incident_timestamp - crash.crash_timestamp))) <= 4 * 3600 AND -- Distance condition: within 1000 meters ST_DistanceSphere( ems.position, crash.position ) <= 1000; ```
Join non-cr3 (blueform) crashes to EMS incidents based on timestamp and location ```sql -- noncr3 crashes to ems based on time and location WITH noncr3crashes AS ( SELECT *, (date || ' ' || hour || ':00:00' || 'US/Central')::timestamptz AS crash_timestamp FROM atd_apd_blueform WHERE (date || ' ' || hour || ':00:00' || 'US/Central')::timestamptz >= '2023-02-02' AND(date || ' ' || hour || ':00:00' || 'US/Central')::timestamptz < '2023-02-03' ), ems AS ( SELECT ems.id ems_id, incident_location_address, incident_problem, pcr_cause_of_injury, pcr_patient_complaints, pcr_provider_impression_primary, pcr_provider_impression_secondary, pcr_patient_acuity_level, pcr_patient_acuity_final, pcr_provider_impression_primary, pcr_transport_priority, apd_incident_number_1, apd_incident_number_2, apd_incident_numbers, st_setsrid (ST_MakePoint (incident_location_longitude, incident_location_latitude), 4326) AS position, (incident_date_received || ' ' || incident_time_received || ' US/Central')::timestamptz AS incident_timestamp FROM ems__incidents ems WHERE ems.incident_date_received >= '2023-02-02' AND ems.incident_date_received < '2023-02-03' AND ems.apd_incident_number_1 IS NULL ) SELECT crash.form_id, crash.crash_timestamp, ems.incident_timestamp, crash.address, ems.incident_location_address FROM ems JOIN noncr3crashes crash ON -- timestamp condition: within 4 hours ABS(EXTRACT(EPOCH FROM (ems.incident_timestamp - crash.crash_timestamp))) <= 4 * 3600 AND -- Distance condition: within 1000 meters ST_DistanceSphere (ems.position, crash.position) <= 1000; ```
johnclary commented 3 days ago

TLDR: Lynn has offered to rework the EMS incident query so that the apd_incident_numbers column is more consistently populated.

Convo with Lynn in response to above question

Lyn:

Hi John, I belive the 5 digit prefix is the julian date. The linking of the incidents between agencies is kind of complicated in the CAD system, because there isn't a single identifier for a single event. Instead the incidents are related to each other using a kind of "bridge" table. In the dataset that we send, there should be a colum for "apd incident numbers" that will show any linked apd incidents to the ems incident

John:

yes, we do have the APD incident numbers in the data you send, however they are very often blank. when the APD incident numbers are blank, does that indicate that there was no APD information to join from the bridge table?

Lynn:

That’s correct, when it’s blank that means there wasn’t a match in the bridge table.

John:

i see. can you tell me a bit more about this bridge table? i assume it's holding some kind of many-to-many relationship with call numbers?

Lynn:

yep that's correct. it get's pretty messy. so basically there is one incident table "Response_Master_Incident" that has all the incident data for all the agencies. The agencies are separated using a "agency_type" column. The bridge table is "MultiAgencyIncident" and has 3 incident id columns that join back to Response_Master_Incident. The way they are labeled would suggest a kind of parent/child or incident "group" kind of scenario but that isn't necessarily the case.
so for example colum "ParentIncidentId" would join to a record in Response_Master_Incident, and that might be an APD record, then column "MasterIncidentId" would join to a record in Response_Master_Incident and that might be an EMS record. Then that sime incident ID in the MasterIncidentId column could appear as the ParentIncidentId in another record with a link to another incident that might be an AFD record. then that one might appear in another row as the IncidentGroupId and link to another EMS or APD record

a long time ago, i tried to resolve all the related records, and it was crazy. i found there were a few incidents that had 20-30 other linked incidents because APD was linking all incidents that had a specific address.

John:

so there can be multiple levels of nesting and joining, plus some data quality issues as well...

do you have any ideas for what we might be able to do to get a better hit rate on these? do you think there might be an exactly matching timestamp between some of the records, or an address? or maybe going one level deeper on the joins?

Lynn:

hmm yes it's possible that we could look at the address and general time - i don't think the time stamps for call received match up anymore though. but it's possible I can look at address, general time frame, possibly the call taker name... and I can also look at going more levels in with MultiAgency incident.. I think right now it's just looking at matches on either ParentIncidentId or MasterIncidentId but I should be able to also incorporate incident group Id

John:

i think we have enough of the data on our side to work on the fuzzy matching with address and time—i don't want to ask you take on any extra work for us. but if you think it might be worthwhile to add to the joins and group ID, that would be really great.

Lynn:

for sure. i had set up a table to track linked apd and afd incident numbers, and i think it was specifically for this project, but it's been very helpful elsewhere, so definitely worthwhile to look at additional joins/levels

johnclary commented 3 days ago

I'm going to follow-up with Lynn at the end of this week to see if they've had a chance to revisit the incident query.