Closed atalyaalon closed 3 months ago
Hi @atalyaalon , Regarding cities appearing in accidents without city name that appear in new cities table:
cbs_cities
has more entries (1482 rows) then current CBS data (1273 cities, same number of cities in streets and cities tables).static/data/cities.csv
for the name according to the symbol. I will change it to search in the cities DB table instead.cbs_cities
table, overriding the data from static/data/cities.csv
file.main.py create-tables update-cities-table
appears to be unused. Should I remove it?Hi @atalyaalon , Regarding cities appearing in accidents without city name that appear in new cities table:
Cities appearing in accidents without city name that appear in new cities table:
- 3400 חברון
- 3815 תל ציון
- 2826 שער שומרון These accidents will get the city names when reloaded with this fix completed.
Great!
- The other entries without city names are of codes that does not appear in current CBS cities data.
cbs_cities
has more entries (1482 rows) then current CBS data (1273 cities, same number of cities in streets and cities tables).
OK, that's what we've got. CBS approved we can use that table so that's ok.
- When reading accidents we search in
static/data/cities.csv
for the name according to the symbol. I will change it to search in the cities DB table instead.
OK sounds good
- The data from CBS site will be populated to the current
cbs_cities
table, overriding the data fromstatic/data/cities.csv
file.
OK that's what we want
- The cli command
main.py create-tables update-cities-table
appears to be unused. Should I remove it? We can remove if it's not used anywhere (unless you want to use this command and adjust it to your needs in Airflow)
In addition, I suggest addressing the the following issue: Background: AccidentMarker table ("markers") is used in ANYWAY map nowadays, and data of streets names and yishuves names is inserted using current CBS data (meaning if street name is changing we won't see it there) I think that's fine by us and we don't want to change the way data is inserted into this table nowadays.
However, we do want to update tables we use in anyway Media API, markers_hebrew
involved_markers_hebrew
in db_views.py
file I suggest that tables markers_hebrew
involved_markers_hebrew
will consist of an additional join of street names fields (street1_hebrew, street2_hebrew) and instead of taking them from AccidentMarker table and having data that's not updated, I suggest joining them with streets table and cities table to get the most updated names. (It's SQL code, I can assist and review if needed, but from what I can see I assume these are 2 simple joins, just make sure the symbols of streets and cities are indexed in both tables)
Hi @atalyaalon , It look like the tests DB is built from files in s3, and does not contain data that exists in the production DB. If correct, we need to decide what do we want to do with data from data.gov. What do you think?
Hi @atalyaalon , Not sure whether I already wrote to you about it. Our tests are based on a db that is loaded from files. If we remove the cities.csv, then we do not have cities in the test db (nor streets). Should we download cities and streets from data.gov during the test db preparation stage?
Hi @atalyaalon
regarding item #5, תעשיון דלתון
. Our file had an error, two items with same Hebrew name. That itemt does not exist in the current list in data.gov. I think we do not need to do anything about it.
Hi @atalyaalon , Thee are ~16000 accidents in 154 different cities (Redash query) that the city does not appear in the current data.gov cities table. I wonder whether there is a correlation table between a removed city and the current city.
Today we're using cities from a static csv file path:
static/cbs/cities.csv
AND static/cbs/cities.xlsx We want to create a regular update of citiesThe list in data.gov is here
cbs_cities
.main.py
andlocalization.py
) Make sure column names are matched accordingly if there is a mismatch between static cities.csv and data.gov table. Note that we should only use the relevant column of symbol number, yishuv name in hebrew, yishuv name in English, I think that other columns are not relevant to us (static/cbs/cities.csv
probably contains many columns we're not using)