XtremeCurling / nextbus2pg

Save nextbus real-time vehicle location data in a postgres database
MIT License

Ignoring vehicle locations with unknown dirTags #3

Open XtremeCurling opened 6 years ago

XtremeCurling commented 6 years ago

Currently, the code skips over vehicle locations that have an unknown dirTag (NextBus jargon for what this repo calls service.tag). I didn't consider that some dirTags might be missing from the routeConfig endpoint but nevertheless show up consistently in the vehicleLocations endpoint.

After examining nohup.out, it turns out this is the case for services on a few routes across the 6 agencies I've been archiving data for. A CSV linked at the bottom of this description lists each unknown dirTag and the number of times it was skipped during the initial run (roughly mid-March until today, April 23). These counts were calculated by running sort nohup.out | uniq -c | sort -rn on the nohup.out file for each of the 6 agencies (really 7, since lametro is separate from lametro-rail in NextBus).
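For anyone who wants to reproduce the tally without the shell pipeline, here is a rough Python equivalent of sort | uniq -c | sort -rn. It assumes each skipped vehicle location produced one line in nohup.out and that identical lines correspond to the same unknown dirTag; the function name count_lines is hypothetical and not part of this repo.

```python
from collections import Counter


def count_lines(lines):
    """Return (line, count) pairs ordered from most to least frequent.

    Mirrors `sort | uniq -c | sort -rn`, except that ties keep the order
    in which the lines were first seen.
    """
    # Strip trailing newlines so "x\n" and "x" count as the same line.
    counts = Counter(line.rstrip("\n") for line in lines)
    return counts.most_common()


if __name__ == "__main__":
    # Assumption: nohup.out sits in the current directory.
    with open("nohup.out", encoding="utf-8") as f:
        for line, count in count_lines(f):
            print(f"{count:7d} {line}")
```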

To fix this, I need to change the code so that when a vehicle location has an unknown dirTag, it creates a new service for that dirTag and the vehicle's route instead of ignoring the vehicle.
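A minimal sketch of that get-or-create behaviour is below. It uses an in-memory dict rather than the actual postgres tables, and every name in it (Service, get_or_create_service, the inferred flag) is hypothetical, chosen only to illustrate the idea. The real fix would need to insert a row into whatever table stores services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    route_tag: str
    tag: str        # the NextBus dirTag
    inferred: bool  # True if created from a vehicle location, not routeConfig


def get_or_create_service(services, route_tag, dir_tag):
    """Look up the service for (route_tag, dir_tag), creating it if unknown.

    `services` is a dict keyed by (route_tag, dir_tag); it stands in for the
    database table in this sketch.
    """
    key = (route_tag, dir_tag)
    service = services.get(key)
    if service is None:
        # Unknown dirTag: record a new service instead of dropping the vehicle.
        service = Service(route_tag, dir_tag, inferred=True)
        services[key] = service
    return service
```

Flagging such services as inferred is one optional design choice; it would make it possible to tell later which services never appeared in routeConfig.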

Summing the counts from the CSV gives 29,711 ignored vehicle locations out of ~86 million total, so the bug affects about 0.035% of location updates, or roughly 1 in 3,000, across these 6 agencies. That is enough to bias analyses for the most-affected agencies, though not enough to render them useless. It could, however, be a much higher-incidence bug for other, untested agencies.
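The arithmetic behind those figures can be checked with a few lines of Python. The total of 86 million is the approximate figure quoted above, so the results are approximate too.

```python
ignored = 29_711      # ignored vehicle locations, summed from the CSV
total = 86_000_000    # approximate total location updates

share = ignored / total
print(f"share of updates ignored: {share:.4%}")
print(f"roughly 1 in {round(1 / share)}")
```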

vehicle_missing_dirtag_counts.txt