Closed danrademacher closed 4 years ago
OK, "progress." Insanely, this SQL gets is the data we think we want, just for 2019, to test in this Google Sheet. The Sheet includes some minor QA around making sure the fractional data add up to the un-split totals for peds, cyclists, motorists killed or injured. At first, they did not, by as much as 10%, and that turned out to be because a lot of crashes have no value for any vehicle type, so I went back and lumped those in with _other
and then renamed those _other_unspecified
, and now everything lines up precisely.
Which is different from saying this is accurate to real life. I am pretty confident it is internally consistent, which is the first hurdle. Now, off to the client for her consideration.
If this survives scrutiny, I am sure we'll have to find ways to make the SQL more sane, since I wrote it! But it seemed impossible to specify outside of iteratively testing each part of the query until it came out about right. Now we can get the allocations Christine's been looking for all with SQL, preview in Sheets change it if we need it, then figure out if that leads to changes upstream.
Note that unlike other things I have been previewing in Sheets, this is not a live call to the CARTO API. This is simply too long to encode in a URL. Again, yet another reason the query will have to change, but still I think worth having this giant hairball of a query to show the draft results before we start trying overhauling ETL or the user facing applications.
-- final working query!
SELECT
--FOURTH Query aggregates into monthly buckets
year_month,
SUM(d.cyclist_injured) as cyclist_injured,
SUM(d.cyclist_killed) as cyclist_killed,
SUM(d.motorist_injured) as motorist_injured,
SUM(d.motorist_killed) as motorist_killed,
SUM(d.pedestrian_injured) as pedestrian_injured,
SUM(d.pedestrian_killed) as pedestrian_killed,
SUM(d.persons_injured) as persons_injured,
SUM(d.persons_killed) as persons_killed,
SUM(d.cyclist_injured_bybike) as cyclist_injured_bybike,
SUM(d.cyclist_killed_bybike) as cyclist_killed_bybike,
SUM(d.motorist_injured_bybike) as motorist_injured_bybike,
SUM(d.motorist_killed_bybike) as motorist_killed_bybike,
SUM(d.pedestrian_injured_bybike) as pedestrian_injured_bybike,
SUM(d.pedestrian_killed_bybike) as pedestrian_killed_bybike,
SUM(d.persons_injured_bybike) as persons_injured_bybike,
SUM(d.persons_killed_bybike) as persons_killed_bybike,
SUM(d.cyclist_injured_byscooter) as cyclist_injured_byscooter,
SUM(d.cyclist_killed_byscooter) as cyclist_killed_byscooter,
SUM(d.motorist_injured_byscooter) as motorist_injured_byscooter,
SUM(d.motorist_killed_byscooter) as motorist_killed_byscooter,
SUM(d.pedestrian_injured_byscooter) as pedestrian_injured_byscooter,
SUM(d.pedestrian_killed_byscooter) as pedestrian_killed_byscooter,
SUM(d.persons_injured_byscooter) as persons_injured_byscooter,
SUM(d.persons_killed_byscooter) as persons_killed_byscooter,
SUM(d.cyclist_injured_bymotorcycle) as cyclist_injured_bymotorcycle,
SUM(d.cyclist_killed_bymotorcycle) as cyclist_killed_bymotorcycle,
SUM(d.motorist_injured_bymotorcycle) as motorist_injured_bymotorcycle,
SUM(d.motorist_killed_bymotorcycle) as motorist_killed_bymotorcycle,
SUM(d.pedestrian_injured_bymotorcycle) as pedestrian_injured_bymotorcycle,
SUM(d.pedestrian_killed_bymotorcycle) as pedestrian_killed_bymotorcycle,
SUM(d.persons_injured_bymotorcycle) as persons_injured_bymotorcycle,
SUM(d.persons_killed_bymotorcycle) as persons_killed_bymotorcycle,
SUM(d.cyclist_injured_bybusvan) as cyclist_injured_bybusvan,
SUM(d.cyclist_killed_bybusvan) as cyclist_killed_bybusvan,
SUM(d.motorist_injured_bybusvan) as motorist_injured_bybusvan,
SUM(d.motorist_killed_bybusvan) as motorist_killed_bybusvan,
SUM(d.pedestrian_injured_bybusvan) as pedestrian_injured_bybusvan,
SUM(d.pedestrian_killed_bybusvan) as pedestrian_killed_bybusvan,
SUM(d.persons_injured_bybusvan) as persons_injured_bybusvan,
SUM(d.persons_killed_bybusvan) as persons_killed_bybusvan,
SUM(d.cyclist_injured_bycar) as cyclist_injured_bycar,
SUM(d.cyclist_killed_bycar) as cyclist_killed_bycar,
SUM(d.motorist_injured_bycar) as motorist_injured_bycar,
SUM(d.motorist_killed_bycar) as motorist_killed_bycar,
SUM(d.pedestrian_injured_bycar) as pedestrian_injured_bycar,
SUM(d.pedestrian_killed_bycar) as pedestrian_killed_bycar,
SUM(d.persons_injured_bycar) as persons_injured_bycar,
SUM(d.persons_killed_bycar) as persons_killed_bycar,
SUM(d.cyclist_injured_bysuv) as cyclist_injured_bysuv,
SUM(d.cyclist_killed_bysuv) as cyclist_killed_bysuv,
SUM(d.motorist_injured_bysuv) as motorist_injured_bysuv,
SUM(d.motorist_killed_bysuv) as motorist_killed_bysuv,
SUM(d.pedestrian_injured_bysuv) as pedestrian_injured_bysuv,
SUM(d.pedestrian_killed_bysuv) as pedestrian_killed_bysuv,
SUM(d.persons_injured_bysuv) as persons_injured_bysuv,
SUM(d.persons_killed_bysuv) as persons_killed_bysuv,
SUM(d.cyclist_injured_bytruck) as cyclist_injured_bytruck,
SUM(d.cyclist_killed_bytruck) as cyclist_killed_bytruck,
SUM(d.motorist_injured_bytruck) as motorist_injured_bytruck,
SUM(d.motorist_killed_bytruck) as motorist_killed_bytruck,
SUM(d.pedestrian_injured_bytruck) as pedestrian_injured_bytruck,
SUM(d.pedestrian_killed_bytruck) as pedestrian_killed_bytruck,
SUM(d.persons_injured_bytruck) as persons_injured_bytruck,
SUM(d.persons_killed_bytruck) as persons_killed_bytruck,
SUM(d.cyclist_injured_byother) as cyclist_injured_byother,
SUM(d.cyclist_killed_byother) as cyclist_killed_byother,
SUM(d.motorist_injured_byother) as motorist_injured_byother,
SUM(d.motorist_killed_byother) as motorist_killed_byother,
SUM(d.pedestrian_injured_byother) as pedestrian_injured_byother,
SUM(d.pedestrian_killed_byother) as pedestrian_killed_byother,
SUM(d.persons_injured_byother) as persons_injured_byother,
SUM(d.persons_killed_byother) as persons_killed_byother
FROM
--THIRD Query multiplies out all the blame allocation for each assigned vehicle type
(SELECT
year_month,
cyclist_injured,
cyclist_killed,
motorist_injured,
motorist_killed,
pedestrian_injured,
pedestrian_killed,
persons_injured,
persons_killed,
hasvehicle_bicycle,
hasvehicle_scooter,
hasvehicle_motorcycle,
hasvehicle_busvan,
hasvehicle_car,
hasvehicle_suv,
hasvehicle_truck,
hasvehicle_other_unspecified,
blame_factor,
bike_blame,
--now all the different fields, for every mode
--hasvehicle_bicycle
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_bybike,
CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_bybike,
--hasvehicle_scooter
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_byscooter,
CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_byscooter,
--hasvehicle_motorcycle
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_bymotorcycle,
CASE WHEN (hasvehicle_motorcycle is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_bymotorcycle,
--hasvehicle_busvan
CASE WHEN (hasvehicle_busvan is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_bybusvan,
CASE WHEN (hasvehicle_busvan is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_bybusvan,
--hasvehicle_car
CASE WHEN (hasvehicle_car is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_bycar,
CASE WHEN (hasvehicle_car is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_bycar,
--hasvehicle_suv
CASE WHEN (hasvehicle_suv is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_bysuv,
CASE WHEN (hasvehicle_suv is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_bysuv,
--hasvehicle_truck
CASE WHEN (hasvehicle_truck is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_bytruck,
CASE WHEN (hasvehicle_truck is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_bytruck,
--hasvehicle_other_unspecified
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN cyclist_injured_allocated ELSE 0 END AS cyclist_injured_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN cyclist_killed_allocated ELSE 0 END AS cyclist_killed_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN motorist_injured_allocated ELSE 0 END AS motorist_injured_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN motorist_killed_allocated ELSE 0 END AS motorist_killed_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN pedestrian_injured_allocated ELSE 0 END AS pedestrian_injured_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN pedestrian_killed_allocated ELSE 0 END AS pedestrian_killed_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN persons_injured_allocated ELSE 0 END AS persons_injured_byother,
CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN persons_killed_allocated ELSE 0 END AS persons_killed_byother
FROM
--SECOND QUERY pulls in a bunch of pass through but also multiples the allocation amounts
( SELECT
--these are totals, unallocated
cyclist_injured,
cyclist_killed,
motorist_injured,
motorist_killed,
pedestrian_injured,
pedestrian_killed,
persons_injured,
persons_killed,
hasvehicle_bicycle,
hasvehicle_scooter,
hasvehicle_motorcycle,
hasvehicle_busvan,
hasvehicle_car,
hasvehicle_suv,
hasvehicle_truck,
hasvehicle_other_unspecified,
blame_factor,
bike_blame,
--fractional for allocation
(blame_factor * cyclist_injured) as cyclist_injured_allocated,
(blame_factor * cyclist_killed) as cyclist_killed_allocated,
(blame_factor * motorist_injured) as motorist_injured_allocated,
(blame_factor * motorist_killed) as motorist_killed_allocated,
(blame_factor * pedestrian_injured) as pedestrian_injured_allocated,
(blame_factor * pedestrian_killed) as pedestrian_killed_allocated,
(blame_factor * persons_injured) as persons_injured_allocated,
(blame_factor * persons_killed) as persons_killed_allocated,
year_month
FROM
--BASE QUERY gets the core data and also sets a "blame factor" based on involved vehicle types
(SELECT
cartodb_id,
number_of_cyclist_injured as cyclist_injured,
number_of_cyclist_killed as cyclist_killed,
number_of_motorist_injured as motorist_injured,
number_of_motorist_killed as motorist_killed,
number_of_pedestrian_injured as pedestrian_injured,
number_of_pedestrian_killed as pedestrian_killed,
(number_of_pedestrian_injured + number_of_cyclist_injured + number_of_motorist_injured) as persons_injured,
(number_of_pedestrian_killed + number_of_cyclist_killed + number_of_motorist_killed) as persons_killed,
hasvehicle_bicycle,
hasvehicle_scooter,
hasvehicle_motorcycle,
hasvehicle_busvan,
hasvehicle_car,
hasvehicle_suv,
hasvehicle_truck,
--set other to TRUE if nothing else is selected, which catches crashes with no vtype data
CASE
WHEN
(hasvehicle_bicycle::int + hasvehicle_motorcycle::int + hasvehicle_scooter::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
TRUE
ELSE hasvehicle_other
END as hasvehicle_other_unspecified,
-- Determine if this record will "blame" bikes or scooters for injuries or deaths, only in cases with no other motor vehicles
CASE
WHEN
(hasvehicle_bicycle OR hasvehicle_scooter) AND (hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
TRUE
ELSE
FALSE
END as bike_blame,
--determine the number of blameable vehicles involved and then turn to percentage blame. Run as suqquery to it can be used for math
CASE
WHEN
(hasvehicle_bicycle OR hasvehicle_scooter) AND (hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
(1/CAST (NULLIF((hasvehicle_bicycle::int + hasvehicle_scooter::int),0) as FLOAT))
ELSE
(1/CAST (
NULLIF(
(hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int)
+
(CASE
WHEN (hasvehicle_bicycle::int + hasvehicle_motorcycle::int + hasvehicle_scooter::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
1
ELSE
0 END
),0) as FLOAT))
END
as blame_factor,
year || '-' || LPAD(month::text, 2, '0') as year_month
from crashes_all_prod as c
WHERE year = 2019
) as a
) as b) as d
GROUP BY year_month
One critique of this approach is that this is a TYPE apportionment, so we can imagine situations where, say, 4 cars and a truck cause 5 deaths. By the above, CAR gets hit with 2.5 and TRUCK the same. Is that right or should it be CAR 4
and TRUCK 1
?
Decided to see how often this is actually a problem. The query below will select all 2019 Crashes where bike_blame is false (so a motor vehicle was involved) and the number of Motor Vehicle Types is not equal to the raw number of vehicles.
Result: 686 crashes, out of 211,038 in 2019. 0.3% of crashes.
Ran this for the whole dataset as well, 7,638 crashes out of 1,744,626. Similar: 0.4% of crashes. That's less error than we're going to have just from weird data.
SELECT
count(cartodb_id)
FROM
(SELECT
cartodb_id,
count(distinct(vtype_unnest)) as vehiclecount,
typecount
FROM
(select * from
(SELECT
cartodb_id,
unnest(vehicle_type) as vtype_unnest,
(hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int) as typecount,
CASE
WHEN
(hasvehicle_bicycle OR hasvehicle_scooter) AND (hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
TRUE
ELSE
FALSE
END as bike_blame
FROM crashes_all_prod
where year = 2019) as a
-- get only crashes where bikes are not blamed
where bike_blame is false) as b
GROUP BY cartodb_id,typecount) as c
where
--Single vehicle crashes don't matter, and multivehicle crashes of the same type don't matter. So show me records with more than 1 vehicle, more than 1 type and a differential between raw vehicle number of type count
vehiclecount>1 and
typecount > 1 and
vehiclecount != typecount
New table to test some schema changes so we can then test how we might bake the above calcs into the database with repeatable queries: https://chekpeds.carto.com/tables/crashes_all_prod_vtype_tests
Then we can use the table for dynamic test of this allocation across longer time periods and different arrays of vehicle types
We need to create and populate these fields at the crash level using the logic already captured in the above nested query:
cyclist_injured_bybike
cyclist_killed_bybike
motorist_injured_bybike
motorist_killed_bybike
pedestrian_injured_bybike
pedestrian_killed_bybike
persons_injured_bybike
persons_killed_bybike
cyclist_injured_byscooter
cyclist_killed_byscooter
motorist_injured_byscooter
motorist_killed_byscooter
pedestrian_injured_byscooter
pedestrian_killed_byscooter
persons_injured_byscooter
persons_killed_byscooter
cyclist_injured_bymotorcycle
cyclist_killed_bymotorcycle
motorist_injured_bymotorcycle
motorist_killed_bymotorcycle
pedestrian_injured_bymotorcycle
pedestrian_killed_bymotorcycle
persons_injured_bymotorcycle
persons_killed_bymotorcycle
cyclist_injured_bybusvan
cyclist_killed_bybusvan
motorist_injured_bybusvan
motorist_killed_bybusvan
pedestrian_injured_bybusvan
pedestrian_killed_bybusvan
persons_injured_bybusvan
persons_killed_bybusvan
cyclist_injured_bycar
cyclist_killed_bycar
motorist_injured_bycar
motorist_killed_bycar
pedestrian_injured_bycar
pedestrian_killed_bycar
persons_injured_bycar
persons_killed_bycar
cyclist_injured_bysuv
cyclist_killed_bysuv
motorist_injured_bysuv
motorist_killed_bysuv
pedestrian_injured_bysuv
pedestrian_killed_bysuv
persons_injured_bysuv
persons_killed_bysuv
cyclist_injured_bytruck
cyclist_killed_bytruck
motorist_injured_bytruck
motorist_killed_bytruck
pedestrian_injured_bytruck
pedestrian_killed_bytruck
persons_injured_bytruck
persons_killed_bytruck
cyclist_injured_byother
cyclist_killed_byother
motorist_injured_byother
motorist_killed_byother
pedestrian_injured_byother
pedestrian_killed_byother
persons_injured_byother
Then we could run the monthly visualization summaries against that. Need to make sure other
is clearly other_unspecified
Ran this in CARTO GUI to add the fields
ALTER TABLE crashes_all_prod_vtype_tests
ADD COLUMN cyclist_injured_bybike decimal,
ADD COLUMN cyclist_killed_bybike decimal,
ADD COLUMN motorist_injured_bybike decimal,
ADD COLUMN motorist_killed_bybike decimal,
ADD COLUMN pedestrian_injured_bybike decimal,
ADD COLUMN pedestrian_killed_bybike decimal,
ADD COLUMN persons_injured_bybike decimal,
ADD COLUMN persons_killed_bybike decimal,
ADD COLUMN cyclist_injured_byscooter decimal,
ADD COLUMN cyclist_killed_byscooter decimal,
ADD COLUMN motorist_injured_byscooter decimal,
ADD COLUMN motorist_killed_byscooter decimal,
ADD COLUMN pedestrian_injured_byscooter decimal,
ADD COLUMN pedestrian_killed_byscooter decimal,
ADD COLUMN persons_injured_byscooter decimal,
ADD COLUMN persons_killed_byscooter decimal,
ADD COLUMN cyclist_injured_bymotorcycle decimal,
ADD COLUMN cyclist_killed_bymotorcycle decimal,
ADD COLUMN motorist_injured_bymotorcycle decimal,
ADD COLUMN motorist_killed_bymotorcycle decimal,
ADD COLUMN pedestrian_injured_bymotorcycle decimal,
ADD COLUMN pedestrian_killed_bymotorcycle decimal,
ADD COLUMN persons_injured_bymotorcycle decimal,
ADD COLUMN persons_killed_bymotorcycle decimal,
ADD COLUMN cyclist_injured_bybusvan decimal,
ADD COLUMN cyclist_killed_bybusvan decimal,
ADD COLUMN motorist_injured_bybusvan decimal,
ADD COLUMN motorist_killed_bybusvan decimal,
ADD COLUMN pedestrian_injured_bybusvan decimal,
ADD COLUMN pedestrian_killed_bybusvan decimal,
ADD COLUMN persons_injured_bybusvan decimal,
ADD COLUMN persons_killed_bybusvan decimal,
ADD COLUMN cyclist_injured_bycar decimal,
ADD COLUMN cyclist_killed_bycar decimal,
ADD COLUMN motorist_injured_bycar decimal,
ADD COLUMN motorist_killed_bycar decimal,
ADD COLUMN pedestrian_injured_bycar decimal,
ADD COLUMN pedestrian_killed_bycar decimal,
ADD COLUMN persons_injured_bycar decimal,
ADD COLUMN persons_killed_bycar decimal,
ADD COLUMN cyclist_injured_bysuv decimal,
ADD COLUMN cyclist_killed_bysuv decimal,
ADD COLUMN motorist_injured_bysuv decimal,
ADD COLUMN motorist_killed_bysuv decimal,
ADD COLUMN pedestrian_injured_bysuv decimal,
ADD COLUMN pedestrian_killed_bysuv decimal,
ADD COLUMN persons_injured_bysuv decimal,
ADD COLUMN persons_killed_bysuv decimal,
ADD COLUMN cyclist_injured_bytruck decimal,
ADD COLUMN cyclist_killed_bytruck decimal,
ADD COLUMN motorist_injured_bytruck decimal,
ADD COLUMN motorist_killed_bytruck decimal,
ADD COLUMN pedestrian_injured_bytruck decimal,
ADD COLUMN pedestrian_killed_bytruck decimal,
ADD COLUMN persons_injured_bytruck decimal,
ADD COLUMN persons_killed_bytruck decimal,
ADD COLUMN cyclist_injured_byother decimal,
ADD COLUMN cyclist_killed_byother decimal,
ADD COLUMN motorist_injured_byother decimal,
ADD COLUMN motorist_killed_byother decimal,
ADD COLUMN pedestrian_injured_byother decimal,
ADD COLUMN pedestrian_killed_byother decimal,
ADD COLUMN persons_injured_byother decimal,
ADD COLUMN persons_killed_byother decimal;
Next add blame_factor
and bike_blame
and also hasvehicle_other_unspecified
so we can catch unspecified allocations without constantly querying them out with conidtionals.
ALTER TABLE crashes_all_prod_vtype_tests
ADD COLUMN hasvehicle_other_unspecified boolean,
ADD COLUMN blame_factor decimal,
ADD COLUMN bike_blame boolean;
Setting their values for the whole dataset at once fails. So I did it for Jan 2019 through today:
UPDATE crashes_all_prod_vtype_tests
--set other to TRUE if nothing else is selected, which catches crashes with no vtype data
SET hasvehicle_other_unspecified =
CASE
WHEN
(hasvehicle_bicycle::int + hasvehicle_motorcycle::int + hasvehicle_scooter::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
TRUE
ELSE hasvehicle_other
END,
-- Determine if this record will "blame" bikes or scooters for injuries or deaths, only in cases with no other motor vehicles
bike_blame =
CASE
WHEN
(hasvehicle_bicycle OR hasvehicle_scooter) AND (hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
TRUE
ELSE
FALSE
END,
--determine the number of blameable vehicles involved and then turn to percentage blame. Run as suqquery to it can be used for math
blame_factor =
CASE
WHEN
(hasvehicle_bicycle OR hasvehicle_scooter) AND (hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
(1/CAST (NULLIF((hasvehicle_bicycle::int + hasvehicle_scooter::int),0) as FLOAT))
ELSE
(1/CAST (
NULLIF(
(hasvehicle_motorcycle::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int)
+
(CASE
WHEN (hasvehicle_bicycle::int + hasvehicle_motorcycle::int + hasvehicle_scooter::int + hasvehicle_busvan::int + hasvehicle_car::int + hasvehicle_suv::int + hasvehicle_truck::int + hasvehicle_other::int = 0)
THEN
1
ELSE
0 END
),0) as FLOAT))
END
WHERE
date_val between '2019-01-01 00:00:00'::timestamp and '2020-03-05 23:59:00'::timestamp;
We'd need to batch this to update the whole dataset
This should set all the allocations but getting a syntax error on SET
:
--This Query multiplies out all the blame allocation for each assigned vehicle type
UPDATE crashes_all_prod_vtype_tests
--now all the different fields, for every mode
SET
--hasvehicle_bicycle
cyclist_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_scooter
cyclist_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_motorcycle
cyclist_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_busvan
cyclist_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_car
cyclist_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_suv
cyclist_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_truck
cyclist_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_other_unspecified
cyclist_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN persons_killed_allocated ELSE 0 END
From
(SELECT
--these are totals, unallocated
hasvehicle_bicycle,
hasvehicle_scooter,
hasvehicle_motorcycle,
hasvehicle_busvan,
hasvehicle_car,
hasvehicle_suv,
hasvehicle_truck,
hasvehicle_other_unspecified,
blame_factor,
bike_blame,
date_val,
--fractional for allocation
(blame_factor * number_of_cyclist_injured) as cyclist_injured_allocated,
(blame_factor * number_of_cyclist_killed) as cyclist_killed_allocated,
(blame_factor * number_of_motorist_injured) as motorist_injured_allocated,
(blame_factor * number_of_motorist_killed) as motorist_killed_allocated,
(blame_factor * number_of_pedestrian_injured) as pedestrian_injured_allocated,
(blame_factor * number_of_pedestrian_killed) as pedestrian_killed_allocated,
(blame_factor * (number_of_pedestrian_injured + number_of_cyclist_injured + number_of_motorist_injured) ) as persons_injured_allocated,
(blame_factor * (number_of_pedestrian_killed + number_of_cyclist_killed + number_of_motorist_killed)) as persons_killed_allocated,
--hasvehicle_bike
cyclist_injured_bybike,
cyclist_killed_bybike,
motorist_injured_bybike,
motorist_killed_bybike,
pedestrian_injured_bybike,
pedestrian_killed_bybike,
persons_injured_bybike,
persons_killed_bybike,
--hasvehicle_scooter
cyclist_injured_byscooter,
cyclist_killed_byscooter,
motorist_injured_byscooter,
motorist_killed_byscooter,
pedestrian_injured_byscooter,
pedestrian_killed_byscooter,
persons_injured_byscooter,
persons_killed_byscooter,
--hasvehicle_motorcycle
cyclist_injured_bymotorcycle,
cyclist_killed_bymotorcycle,
motorist_injured_bymotorcycle,
motorist_killed_bymotorcycle,
pedestrian_injured_bymotorcycle,
pedestrian_killed_bymotorcycle,
persons_injured_bymotorcycle,
persons_killed_bymotorcycle,
--hasvehicle_busvan
cyclist_injured_bybusvan,
cyclist_killed_bybusvan,
motorist_injured_bybusvan,
motorist_killed_bybusvan,
pedestrian_injured_bybusvan,
pedestrian_killed_bybusvan,
persons_injured_bybusvan,
persons_killed_bybusvan,
--hasvehicle_car
cyclist_injured_bycar,
cyclist_killed_bycar,
motorist_injured_bycar,
motorist_killed_bycar,
pedestrian_injured_bycar,
pedestrian_killed_bycar,
persons_injured_bycar,
persons_killed_bycar,
--hasvehicle_suv
cyclist_injured_bysuv,
cyclist_killed_bysuv,
motorist_injured_bysuv,
motorist_killed_bysuv,
pedestrian_injured_bysuv,
pedestrian_killed_bysuv,
persons_injured_bysuv,
persons_killed_bysuv,
--hasvehicle_truck
cyclist_injured_bytruck,
cyclist_killed_bytruck,
motorist_injured_bytruck,
motorist_killed_bytruck,
pedestrian_injured_bytruck,
pedestrian_killed_bytruck,
persons_injured_bytruck,
persons_killed_bytruck,
--hasvehicle_other_unspecified
cyclist_injured_byother,
cyclist_killed_byother,
motorist_injured_byother,
motorist_killed_byother,
pedestrian_injured_byother,
pedestrian_killed_byother,
persons_injured_byother,
persons_killed_byother
FROM
crashes_all_prod_vtype_tests
WHERE
date_val between '2019-01-01 00:00:00'::timestamp and '2020-03-05 23:59:00'::timestamp
) as a
OK, not sure why that's not working. For now, going to bake the subquery into the data. Not sure we'd want to do that on production.
ALTER TABLE crashes_all_prod_vtype_tests
ADD COLUMN cyclist_injured_allocated DECIMAL,
ADD COLUMN cyclist_killed_allocated DECIMAL,
ADD COLUMN motorist_injured_allocated DECIMAL,
ADD COLUMN motorist_killed_allocated DECIMAL,
ADD COLUMN pedestrian_injured_allocated DECIMAL,
ADD COLUMN pedestrian_killed_allocated DECIMAL,
ADD COLUMN persons_injured_allocated DECIMAL,
ADD COLUMN persons_killed_allocated DECIMAL;
and then populate those with the previous subquery
All records from Jan 2019 to now fails as over API limits:
UPDATE crashes_all_prod_vtype_tests
SET
cyclist_injured_allocated = (blame_factor * number_of_cyclist_injured),
cyclist_killed_allocated = (blame_factor * number_of_cyclist_killed),
motorist_injured_allocated = (blame_factor * number_of_motorist_injured),
motorist_killed_allocated = (blame_factor * number_of_motorist_killed),
pedestrian_injured_allocated = (blame_factor * number_of_pedestrian_injured),
pedestrian_killed_allocated = (blame_factor * number_of_pedestrian_killed),
persons_injured_allocated = (blame_factor * (number_of_pedestrian_injured + number_of_cyclist_injured + number_of_motorist_injured) ),
persons_killed_allocated = (blame_factor * (number_of_pedestrian_killed + number_of_cyclist_killed + number_of_motorist_killed))
WHERE
date_val between '2019-01-01 00:00:00'::timestamp and '2020-03-05 23:59:00'::timestamp
So for now, ran Jan-Mar 2020, then three 4 month increments to get the full time period. That worked.
date_val between '2020-01-01 00:00:00'::timestamp and '2020-03-05 23:59:00'::timestamp
date_val between '2019-09-01 00:00:00'::timestamp and '2019-12-31 23:59:00'::timestamp
date_val between '2019-05-01 00:00:00'::timestamp and '2019-8-31 23:59:00'::timestamp
date_val between '2019-01-01 00:00:00'::timestamp and '2019-4-30 23:59:00'::timestamp
Oy, and it looks like I could have done the subquery approach but many you can't put a comment line between the UPDATE
and SET
? Weird.
At any rate, this works without a comment there but fails with is
--This Query multiplies out all the blame allocation for each assigned vehicle type
UPDATE crashes_all_prod_vtype_tests
SET
--hasvehicle_bicycle
cyclist_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bybike = CASE WHEN (bike_blame is TRUE AND hasvehicle_bicycle is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_scooter
cyclist_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_byscooter = CASE WHEN (bike_blame is TRUE AND hasvehicle_scooter is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_motorcycle
cyclist_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bymotorcycle = CASE WHEN (hasvehicle_motorcycle is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_busvan
cyclist_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bybusvan = CASE WHEN (hasvehicle_busvan is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_car
cyclist_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bycar = CASE WHEN (hasvehicle_car is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_suv
cyclist_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bysuv = CASE WHEN (hasvehicle_suv is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_truck
cyclist_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_bytruck = CASE WHEN (hasvehicle_truck is TRUE) THEN persons_killed_allocated ELSE 0 END,
--hasvehicle_other_unspecified
cyclist_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN cyclist_injured_allocated ELSE 0 END,
cyclist_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN cyclist_killed_allocated ELSE 0 END,
motorist_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN motorist_injured_allocated ELSE 0 END,
motorist_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN motorist_killed_allocated ELSE 0 END,
pedestrian_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN pedestrian_injured_allocated ELSE 0 END,
pedestrian_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN pedestrian_killed_allocated ELSE 0 END,
persons_injured_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN persons_injured_allocated ELSE 0 END,
persons_killed_byother = CASE WHEN (hasvehicle_other_unspecified is TRUE) THEN persons_killed_allocated ELSE 0 END
WHERE
date_val between '2019-01-01 00:00:00'::timestamp and '2019-4-30 23:59:00'::timestamp
Ran for same time buckets.
And then our working summary query we can finally use to dynamically test per mode calcs and see if this stuff adds up:
SELECT
SUM(number_of_cyclist_injured) as cyclist_injured,
SUM(number_of_cyclist_killed) as cyclist_killed,
SUM(number_of_motorist_injured) as motorist_injured,
SUM(number_of_motorist_killed) as motorist_killed,
SUM(number_of_pedestrian_injured) as pedestrian_injured,
SUM(number_of_pedestrian_killed) as pedestrian_killed,
SUM((number_of_pedestrian_injured + number_of_cyclist_injured + number_of_motorist_injured)) as persons_injured,
SUM((number_of_pedestrian_killed + number_of_cyclist_killed + number_of_motorist_killed)) as persons_killed,
SUM(cyclist_injured_bybike) as cyclist_injured_bybike,
SUM(cyclist_killed_bybike) as cyclist_killed_bybike,
SUM(motorist_injured_bybike) as motorist_injured_bybike,
SUM(motorist_killed_bybike) as motorist_killed_bybike,
SUM(pedestrian_injured_bybike) as pedestrian_injured_bybike,
SUM(pedestrian_killed_bybike) as pedestrian_killed_bybike,
SUM(persons_injured_bybike) as persons_injured_bybike,
SUM(persons_killed_bybike) as persons_killed_bybike,
SUM(cyclist_injured_byscooter) as cyclist_injured_byscooter,
SUM(cyclist_killed_byscooter) as cyclist_killed_byscooter,
SUM(motorist_injured_byscooter) as motorist_injured_byscooter,
SUM(motorist_killed_byscooter) as motorist_killed_byscooter,
SUM(pedestrian_injured_byscooter) as pedestrian_injured_byscooter,
SUM(pedestrian_killed_byscooter) as pedestrian_killed_byscooter,
SUM(persons_injured_byscooter) as persons_injured_byscooter,
SUM(persons_killed_byscooter) as persons_killed_byscooter,
SUM(cyclist_injured_bymotorcycle) as cyclist_injured_bymotorcycle,
SUM(cyclist_killed_bymotorcycle) as cyclist_killed_bymotorcycle,
SUM(motorist_injured_bymotorcycle) as motorist_injured_bymotorcycle,
SUM(motorist_killed_bymotorcycle) as motorist_killed_bymotorcycle,
SUM(pedestrian_injured_bymotorcycle) as pedestrian_injured_bymotorcycle,
SUM(pedestrian_killed_bymotorcycle) as pedestrian_killed_bymotorcycle,
SUM(persons_injured_bymotorcycle) as persons_injured_bymotorcycle,
SUM(persons_killed_bymotorcycle) as persons_killed_bymotorcycle,
SUM(cyclist_injured_bybusvan) as cyclist_injured_bybusvan,
SUM(cyclist_killed_bybusvan) as cyclist_killed_bybusvan,
SUM(motorist_injured_bybusvan) as motorist_injured_bybusvan,
SUM(motorist_killed_bybusvan) as motorist_killed_bybusvan,
SUM(pedestrian_injured_bybusvan) as pedestrian_injured_bybusvan,
SUM(pedestrian_killed_bybusvan) as pedestrian_killed_bybusvan,
SUM(persons_injured_bybusvan) as persons_injured_bybusvan,
SUM(persons_killed_bybusvan) as persons_killed_bybusvan,
SUM(cyclist_injured_bycar) as cyclist_injured_bycar,
SUM(cyclist_killed_bycar) as cyclist_killed_bycar,
SUM(motorist_injured_bycar) as motorist_injured_bycar,
SUM(motorist_killed_bycar) as motorist_killed_bycar,
SUM(pedestrian_injured_bycar) as pedestrian_injured_bycar,
SUM(pedestrian_killed_bycar) as pedestrian_killed_bycar,
SUM(persons_injured_bycar) as persons_injured_bycar,
SUM(persons_killed_bycar) as persons_killed_bycar,
SUM(cyclist_injured_bysuv) as cyclist_injured_bysuv,
SUM(cyclist_killed_bysuv) as cyclist_killed_bysuv,
SUM(motorist_injured_bysuv) as motorist_injured_bysuv,
SUM(motorist_killed_bysuv) as motorist_killed_bysuv,
SUM(pedestrian_injured_bysuv) as pedestrian_injured_bysuv,
SUM(pedestrian_killed_bysuv) as pedestrian_killed_bysuv,
SUM(persons_injured_bysuv) as persons_injured_bysuv,
SUM(persons_killed_bysuv) as persons_killed_bysuv,
SUM(cyclist_injured_bytruck) as cyclist_injured_bytruck,
SUM(cyclist_killed_bytruck) as cyclist_killed_bytruck,
SUM(motorist_injured_bytruck) as motorist_injured_bytruck,
SUM(motorist_killed_bytruck) as motorist_killed_bytruck,
SUM(pedestrian_injured_bytruck) as pedestrian_injured_bytruck,
SUM(pedestrian_killed_bytruck) as pedestrian_killed_bytruck,
SUM(persons_injured_bytruck) as persons_injured_bytruck,
SUM(persons_killed_bytruck) as persons_killed_bytruck,
SUM(cyclist_injured_byother) as cyclist_injured_byother,
SUM(cyclist_killed_byother) as cyclist_killed_byother,
SUM(motorist_injured_byother) as motorist_injured_byother,
SUM(motorist_killed_byother) as motorist_killed_byother,
SUM(pedestrian_injured_byother) as pedestrian_injured_byother,
SUM(pedestrian_killed_byother) as pedestrian_killed_byother,
SUM(persons_injured_byother) as persons_injured_byother,
SUM(persons_killed_byother) as persons_killed_byother,
year || '-' || LPAD(month::text, 2, '0') as year_month
FROM crashes_all_prod_vtype_tests
Where year >=2019
GROUP BY year_month
Now live for testing in this gSheet: https://docs.google.com/spreadsheets/d/16ucfdJI0MANCSGCHuVaHzPlUMlFTsvolkz8aaHrrEFw/edit#gid=350030631
Even for a mode like busvan
that is uncommon compared to car
the single vtype query for hasvehicle_busvan
combined with these allocations produces results that make sense -- other vehicles are involved but busvan
is the most involved:
@gregallensworth after you tackle #25 for the overall batch queries, we're ready to start rolling in the Apportionment queries into the ETL as well. This issue should contain all the info you'll need, but very likely not in the format you will need it, since it is a sequence of everything I have done to get to the final queries. But I'm likely to botch the translation if I try to do it at the level of SQL.
This comment lays out the high level goal, then we can discuss live as needed:=. I think this'll take some brain time to absorb and then see if my approach needs to be modified.
One-time task: Add fields to hold the eventual apportioned data. I did that in the test table with the query in this comment,. https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/24#issuecomment-595573633
One one-time task and one dynamic task: Set whether to "blame" bicycles or not and also calculate the blame factor for all blamable vehicles. I did that in this comment, https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/24#issuecomment-595618232. Note that I was on the fence about whether to add blame_factor
and bike_blame
as permanent residents in the table vs a subquery. They are disposable once we have the fields made in step 1 populated, but I was concerned about query performance so it seemed like maybe baking them in was better. Might be different in the batch API.
Calculate the allocations. Again, in this comment https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/24#issuecomment-595625235, I added some permanent fields to hold the allocations and then used those to finally populate the fields made in step 1. I would defer to you on what's better when it comes to nested subqueries that are slower but don't clog the production database with fields we don't use the actual app, or extra columns in the DB to avoid subqueries.
Commit bdb3831
I have integrated the 3 UPDATE queries into the ETL script.
The three queries use "WHERE somefield IS NULL" clauses, meaning they will only update where the number is not yet calculated. This means that although today's running time for 2 million records was about 5 minutes for each of the 3 queries, future runs should take a few seconds.
One example query and output:
make_carto_sql_api_request("""
SELECT
MAX(persons_injured_bytruck) AS it,
MAX(persons_injured_bysuv) AS is,
MAX(persons_injured_bycar) AS ic,
MAX(persons_injured_bybusvan) AS ib,
MAX(persons_injured_byscooter) AS isc,
MAX(persons_injured_bybike) AS ibk,
MAX(persons_injured_bymotorcycle) AS imc
from crashes_all_prod
""")
[
{"it":12,"is":17,"ic":27,"ib":24,"isc":2,"ibk":11,"imc":5}
]
And a screenshot from CARTO's web UI :
Deployed.
Over in #22 we assigned booleans for the 8 canonical Vehicle Types.
Another part of that task, which is properly a separate issue, is this:
Portioning out the injured and the killed
22 is be sufficient to allow for basic filtering of crashes by involved vehicles -- "Show me crashes involving an SUV or CAR or TRUCK" like the push button Crash Type filters we have on the current application.
There's a further wrinkle in that we'll want to also know what proportion of injuries or fatalities were attributable to each vehicle type, but without double or triple-counting. If we simply add up all fatalities, say, where
CAR
andSUV
are involved, a crash of an SUV and car with a single fatality could create1+1
, when what we want to do is say that they were each equally responsible for a single fatality, so0.5+0.5
. With the further challenge that aBICYCLE
is the cause of a fatality only when no other vehicles are involved.So I think what we would need to do here is:
crashmapper_vehicletypes
involved>1
, check for bike, ifbike=true
, then subtract 1 from total vehicle typesBut then it gets even more gnarly, in that we also need to retain whether the injured or killed were walking, biking, or driving.
Though this makes for a large schema, seems like a clear option would be to create a set of 48 (!) fields that directly allocate to the 8 canonical types:
e-bike-scoot
boolean?e-bike-scoot_pedInjury
e-bike-scoot_bikeInjury
e-bike-scoot_motoristInjury
e-bike-scoot_pedFatality
e-bike-scoot_bikeFatality
e-bike-scoot_motoristFatality
... and so on for all 8
crashmapper_vehicletypes
Then we could always query the same fields for the same mode-related injury/death data, and rows without that data simply wouldn't contribute to the total.
Visualizing that data will be a challenge, but first we have to get the data (and I do have a plan for visualization within current application structure).