data-for-change / anyway

ANYWAY - Car accidents map
http://www.anyway.co.il
MIT License

[Bug] Improve location accuracy for newsflashes with Street resolution #2669

Open tkalir opened 3 weeks ago

tkalir commented 3 weeks ago

Current flow for newsflashes with street resolution:

  1. Textual analysis of the newsflash extracts a location string and determines that it has street resolution.
  2. The location string is sent to the Google Maps API, and we receive coordinates.
  3. We search the db for the accident with the closest coordinates and use its location details for the newsflash.
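
Step 3 of the flow above can be sketched as a nearest-neighbour search over accident coordinates. This is only an illustration: the `Accident` record, its field names, and the use of the haversine distance are assumptions, not the actual ANYWAY schema or query.

```python
import math
from typing import NamedTuple, Sequence


class Accident(NamedTuple):
    # Hypothetical record; the real ANYWAY accident table has different fields.
    street: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest_accident(lat: float, lon: float,
                     accidents: Sequence[Accident]) -> Accident:
    """Return the accident whose coordinates are nearest to (lat, lon)."""
    return min(accidents, key=lambda a: haversine_km(lat, lon, a.lat, a.lon))
```

In a real deployment the search would more likely be a spatial query in the database than a linear scan in Python.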

The new algorithm (under construction) currently searches the db for the 5 closest accidents with unique street names. These 5 street names are sent to GPT together with the location string, to find which of these streets is mentioned in the newsflash.
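
A minimal sketch of the candidate-selection step, assuming the accidents have already been sorted by distance from the geocoded point. The function names and the prompt wording are hypothetical; the actual prompt used in the experiment is not shown in this issue, and the call to GPT itself is left out.

```python
from typing import Iterable, List


def closest_unique_streets(streets_by_distance: Iterable[str],
                           k: int = 5) -> List[str]:
    """Take the first k distinct, non-empty street names, nearest first."""
    unique: List[str] = []
    for street in streets_by_distance:
        if street and street not in unique:
            unique.append(street)
            if len(unique) == k:
                break
    return unique


def build_prompt(location_text: str, candidates: List[str]) -> str:
    """Build an illustrative prompt asking the model to pick one candidate."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(candidates, 1))
    return (
        "Which of the following streets is mentioned in this location?\n"
        f"Location: {location_text}\n"
        f"Candidates:\n{numbered}\n"
        "Answer with the exact street name from the list, or NONE."
    )
```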

After running the algorithm for 50 newsflashes, 9 newsflashes had an improved street value and 11 had issues:

  a. 2 had GPT returning answers that did not fit the instructions.
  b. 2 had their street among the 20 closest accidents, but not among the closest 5.
  c. 2 were incorrectly identified as street resolution.
  d. 2 needed a synonym: the newsflash mentioned דרך חיפה (Derech Haifa, "Haifa Road"), which in data.gov is a synonym for רחוב 1000 ("Street 1000"). In our database this street is named 1000.
  e. 2 had streets that did not already have accidents in the db, so step 3 did not work.
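
One possible way to handle issues a and b, shown only as a sketch: reject any model answer that is not exactly one of the candidates, and retry with a wider candidate set (for example 20 instead of 5) before giving up. `ask_model` is a hypothetical callable standing in for the GPT request; its signature is an assumption.

```python
from typing import Callable, List, Optional, Sequence


def unique_in_order(streets: Sequence[str], k: int) -> List[str]:
    """First k distinct, non-empty street names, preserving order."""
    unique: List[str] = []
    for street in streets:
        if street and street not in unique:
            unique.append(street)
            if len(unique) == k:
                break
    return unique


def pick_street(location_text: str,
                streets_by_distance: Sequence[str],
                ask_model: Callable[[str, List[str]], str],
                ks: Sequence[int] = (5, 20)) -> Optional[str]:
    """Ask the model with 5 candidates, then 20; accept only listed names."""
    for k in ks:
        candidates = unique_in_order(streets_by_distance, k)
        answer = ask_model(location_text, candidates).strip()
        if answer in candidates:  # guards against off-instruction answers
            return answer
    return None  # caller keeps the existing closest-accident result
```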

Issues a and b seem easily solvable. Issue c is outside the scope of this specific issue; it may be relevant for a future issue, and we may want to mark these newsflashes somehow for future handling. For d, we need to determine how prevalent the problem is. Talking with @atalyaalon, we raised two ways to address it:

  1. Store the synonym data from data.gov in our database and fetch it when the basic algorithm does not work.
  2. Fetch the Google Maps data for the 5 closest accidents and use the CBS street of the accident that has the same Google Maps street as the newsflash.

Option 2 may be easier to implement and simpler than using the synonym info. On the other hand, the synonym info may provide actual confirmation that it is correct for the newsflash to say "דרך חיפה" while the CBS street is "1000".
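
Option 2 could look roughly like the sketch below. It assumes each nearby accident has already been paired with the street name Google Maps returns for its coordinates; the pair layout, the function names, and the whitespace and case normalization are illustrative assumptions, and the Google Maps request itself is not shown.

```python
from typing import Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so trivial differences match."""
    return " ".join(name.split()).casefold()


def cbs_street_via_google(
    newsflash_google_street: str,
    nearby: Iterable[Tuple[str, Optional[str]]],
) -> Optional[str]:
    """Return the CBS street of the first nearby accident whose Google Maps
    street equals the newsflash's Google Maps street, or None if none match.

    `nearby` holds (cbs_street, google_street) pairs, nearest first.
    """
    target = normalize(newsflash_google_street)
    for cbs_street, google_street in nearby:
        if google_street and normalize(google_street) == target:
            return cbs_street
    return None


# Example from this issue: the newsflash says "דרך חיפה", the CBS name is "1000".
nearby_accidents = [("הרצל", "הרצל"), ("1000", "דרך חיפה")]
print(cbs_street_via_google("דרך  חיפה", nearby_accidents))
```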

I will check the performance for option 2 and see how it compares with the GPT algorithm. I will also run these algorithms for more newsflashes and see which issues are more prevalent.