Closed wavingtowaves closed 1 year ago
Hi all 👋 here's an update on the progress so far.
I've got a recently committed Jupyter notebook focused on a spatial model for Florianópolis. I've worked through over half of the tutorial, and in the process opened a couple of issues to help solve some data problems I ran into.
🚧 Blocker @bitsandbricks I hit another blocker when working through tutorial information and could use your help.
When I try to run the line
es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')
I get an error
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Not sure why this is, since the code is identical to the tutorial and when I run
ba.total_bounds
for florianopolis, I get an output of
array([-48.613 , -27.847 , -48.3585929, -27.379 ])
Which seems similar enough to what is output in the tutorial just without as many decimal places. Any thoughts on why this might be happening?
I'll take a look!
Rob, can you share the notebook/script you are running to use as a reproducible example?
Thanks so much @bitsandbricks!
The notebook is linked above and here. I've also set it up with pipenv files, so you can clone it locally and try running the notebook on your machine.
Not sure if you've worked with GitHub codespaces much, but you could try running a codespace like I show below and see if everything in the notebook will run through the codespace. Let me know if this works well for you to test. 👍🏻
Ohh it was right in front of my eyes! Sorry Rob. I'll keep you posted
Alright, good news: it worked for me, running your notebook on a codespace.
Bad news is, I didn't do anything besides clicking on the cells so I'm not sure what caused your problem!
Lazy guess: maybe the OSM backend was having a bad day when you tried to download data via urbanPy?
Same thing on my end, hope this is not a common error.
Also, as a side note, the new version of urbanpy (in the master branch for the moment) has a new, more flexible function to download data from overpass. I put together a quick example of how to query education facilities inside Florianópolis in this colab notebook.
Thanks to help from @bitsandbricks and @Claudio9701 I was able to make good progress on the model for florianopolis.
I created a count of the educational facilities in florianopolis (but see my question below)
Also I created a map of the educational facilities in florianopolis. Next up is calculating the walking distances to educational facilities.
1) We have many different types of points of interest for educational facilities. Which should we include?
For now I've selected: school, kindergarten, language school, and library. What do you think?
2) Also in terms of age groups. We can download:
children: children (age 0-5)
youth: youth (15-24)
What would be appropriate?
3) Does metric='haversine' make the most sense for our goal here?
CC: @bitsandbricks @Juliavieiradeandradedias
I think haversine distance (which considers earth curvature) is a good measure for considerable distances, for example in cities with a big area or when working at the country level. I usually use this one for Lima and for countries with large distances and irregular/incomplete road networks.
Other options for distance calculation are euclidean (the most naive) and cityblock (see image below). These two work really well for small distances and cities with a regular road network.
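To make the comparison concrete, here is a minimal sketch in plain Python. It is not urbanPy's implementation: the function names and the Earth radius constant are assumptions for illustration, and the two points are simply opposite corners of the Florianópolis bounding box quoted earlier in the thread.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean_deg(lat1, lon1, lat2, lon2):
    """Straight-line distance in raw degrees (ignores curvature)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


def cityblock_deg(lat1, lon1, lat2, lon2):
    """Manhattan distance in raw degrees (sum of the two axis offsets)."""
    return abs(lat2 - lat1) + abs(lon2 - lon1)


# Opposite corners of ba.total_bounds: (minx, miny, maxx, maxy) = (lon, lat, lon, lat)
sw = (-27.847, -48.613)
ne = (-27.379, -48.3585929)

print("haversine (km):  ", haversine_km(*sw, *ne))
print("euclidean (deg): ", euclidean_deg(*sw, *ne))
print("cityblock (deg): ", cityblock_deg(*sw, *ne))
```

Only the haversine result is in kilometres. The other two stay in degrees unless the data is projected first, which is one more reason to prefer haversine when working directly with lat/lon coordinates.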
I appreciate that explanation 🙌
Even though florianopolis is a smaller area, I still think haversine will work 👍
Thanks to your help in #7, @Claudio9701, I was able to get the docker container spun up 🎉
❓ Follow-up questions:
1) For some reason, when running
es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')
I get the error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Full error message below. What do you think might be happening here? Something's happening with the JSON decoder, but I'm not sure how to resolve it.
2) Do I need to update the below code
distance, duration = up.routing.osrm_route(origin=point1, destination=point2)
that you sent over or does it already know that origin is centroid (point1) and destination (point2) is school?
3) With the docker container up and running, I'm getting this error:
Error: No such object: osrm_routing_server_south-america_brazil_sul_foot
Maybe this is due to some upstream issue with the JSON portion mentioned in my point 1 above.
JSONDecodeError Traceback (most recent call last)
File /opt/homebrew/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
970 try:
--> 971 return complexjson.loads(self.text, **kwargs)
972 except JSONDecodeError as e:
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of ``s`` (a ``str`` instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
...
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
CC: @Juliavieiradeandradedias, @csmlo
Looks like the request to the overpass api is not receiving a correct response. Could you try to reach the overpass api with requests or curl to see if it is something with the network?
Yes, it needs to be updated. The steps are:
These notebooks can help:
https://github.com/EL-BID/urbanpy/blob/master/notebooks/Creating%20an%20interactive%20webapp.ipynb
https://github.com/Claudio9701/urbanpy-brazil-demo/blob/master/Pop_Access_UrbanPy_Demo_BR.ipynb
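For the Overpass check suggested above, a rough sketch using only the standard library could look like the following. The endpoint URL is an assumption (the public overpass-api.de instance, which may not be the one urbanPy is configured to call), and the helper names are hypothetical. The point is to look at the status code and raw body before anything tries to parse JSON, since an empty or HTML body produces exactly "Expecting value: line 1 column 1 (char 0)".

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the public Overpass instance; urbanPy may use a different URL.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def diagnose_body(status, body):
    """Describe a response the way a JSON parser would experience it."""
    if status != 200:
        return f"HTTP {status}: server did not return data"
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return "HTTP 200 but body is not JSON: " + body[:80]
    return "ok"


def check_overpass(url=OVERPASS_URL, timeout=30):
    """Send a tiny Overpass QL query and report what came back."""
    query = "[out:json];node(1);out;"
    data = urllib.parse.urlencode({"data": query}).encode()
    try:
        with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return diagnose_body(resp.status, body)
    except urllib.error.HTTPError as err:  # e.g. 429 or 504 when the server is busy
        return diagnose_body(err.code, "")
    except urllib.error.URLError as err:  # DNS, proxy, or connectivity problems
        return f"network error: {err.reason}"


# Run manually when debugging:
# print(check_overpass())
```

If this reports a 429 or 504, the server is rate limiting or overloaded, and waiting a bit before retrying is probably the only fix.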
Bravo Rob!
Back to your initial questions:
- We have many different types of points of interest for educational facilities. Which should we include?
Based on the OSM project definitions for their keys and values (here, I always have it around cause I keep forgetting the details :D) we want "school": "School and grounds - primary, middle and secondary schools"
This is a data layer that can definitely be replaced by an "official" list depending on specific needs (e.g. only primary schools), but the OSM one will be fine for preliminary results.
- Also in terms of age groups [...] What would be appropriate?
In the same spirit, until we are asked for a specific range, we can go for the population in the compulsory schooling range (ages 6 to 14 in Brazil). Eyeballing the population pyramid I'd say that's a little bit under 7% of the entire population. Of course, this already vague number will differ from place to place, especially when contrasting rural vs urban areas, but should be fine for a starting point. We can document the rationale and carry on!
@Claudio9701 thanks so much for the pairing session today 🌟
A few quick updates:
1) es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')
gave me an error a few more times then it ran fine. Good to know that this will work sometimes and be buggy sometimes. At least I know to expect this 👍
2) When I run the code below to start the local server, is the process supposed to end? We were getting an error right away this morning. I'll paste fuller error message below:
Error: No such object: osrm_routing_server_south-america_brazil_sul_foot
latest: Pulling from osrm/osrm-backend
Digest: sha256:af5d4a83fb90086a43b1ae2ca22872e6768766ad5fcbb07a29ff90ec644ee409
Status: Image is up to date for osrm/osrm-backend:latest
docker.io/osrm/osrm-backend:latest
/bin/sh: line 4: wget: command not found
docker: Error response from daemon: Conflict. The container name "/osrm_extract" is already in use by container "6aae95e70200660f1c8d7a5d3f609caeba7cbe42ef3a7257361f9bd152e1c2da". You have to remove (or rename) that container to be able to reuse that name.
docker: Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:5000 -> 0.0.0.0:0: listen tcp 0.0.0.0:5000: bind: address already in use.
time="2023-02-09T13:02:43-08:00" level=error msg="error waiting for container: context canceled"
Error: No such container: osrm_extract
docker: Error response from daemon: Conflict. The container name "/osrm_routing_server_south-america_brazil_sul_foot" is already in use by container "2ce75c74142e442dd48ae9fca40443482527d419f2424e49a3e694cdd1576e07". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
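On the intermittent JSONDecodeError from point 1: since the call seems to succeed after a few tries, one option is to wrap it in a small retry helper with a growing pause between attempts. This is only a sketch. The `retry` helper and its parameters are hypothetical (not part of urbanPy), and it assumes the error raised is a `ValueError` subclass, which should hold for the `JSONDecodeError` shown in the traceback.

```python
import time


def retry(func, attempts=4, base_delay=2.0, exceptions=(ValueError,), sleep=time.sleep):
    """Call func() until it succeeds, doubling the pause after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with the call from the tutorial (needs urbanpy and the `ba` GeoDataFrame):
# es = retry(lambda: up.download.overpass_pois(bounds=ba.total_bounds, facilities='education'))
```

The pauses also give a busy Overpass server some breathing room instead of hitting it again immediately.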
I figured out what was going on with the port error we were running into w/ docker @Claudio9701.
We did need to run lsof -i:5000 to see what programs were already using this port. I did some investigating and found out that it's related to AirPlay on Macs 😓 Glad we got this cleared up.
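A quick way to run the same check from inside the notebook, before starting the OSRM container, is to try connecting to the port. This is a minimal sketch with a hypothetical helper name; it only tells you that something is listening, not which program it is (lsof is still the tool for that).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if port_in_use(5000):
    print("Port 5000 is busy; free it before starting the OSRM server.")
else:
    print("Port 5000 looks free.")
```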
I'm able to connect to the OSRM server using the original code in the tutorial 🎉 I do need to have docker open and running on my machine for it to work:
up.routing.start_osrm_server('sul', 'south-america_brazil', 'foot')
❓ for @Claudio9701, what's the typical run time for the function below?
Currently, I have h3_resolution set to 8, this is for the city of Florianópolis. Is this too high? Right now the function below has run for 30 min. I could also try spinning up a GitHub codespace and seeing if it runs faster on one of our servers. Let me know what you think.
distance_duration = hex_flor.apply(
    lambda row: up.routing.osrm_route(
        origin=row.geometry.centroid,
        destination=schools.iloc[row['closest_school']]['geometry'],
    ),
    result_type='expand',
    axis=1,
)
CC: @bitsandbricks on above issue for visibility.
Hello Rob, great news you could spin up the docker container 👏🏼🙌🏽🚀!
The next version of urbanpy needs to give the user the ability to choose which port to run the osrm server on.
Regarding the other question, resolution 8 should be good for a small city like Florianópolis. I usually run this function with tqdm so I can have an idea of how much time it will take.
from tqdm.notebook import tqdm
tqdm.pandas()
df.progress_apply(...)
If this is taking too much time, I've also used pandarallel to speed up the calculation.
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=True)
df.parallel_apply(...)
This also has a progress bar that gives you a hint of how much time the processing will take. Both are installed using pip.
If it is still taking too much time you could filter out hexagons without population or with population below a certain threshold. But in my experience this is almost never necessary.
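If you want a rough number before committing to the full run, another option is to time a small sample and extrapolate. The helper below is a hypothetical sketch, not something urbanpy provides, and it assumes each row takes roughly the same time to route.

```python
import time


def estimate_total_seconds(func, items, sample_size=20, clock=time.perf_counter):
    """Time func on the first few items and scale up to the full list."""
    sample = items[:sample_size]
    if not sample:
        return 0.0
    start = clock()
    for item in sample:
        func(item)
    elapsed = clock() - start
    return elapsed / len(sample) * len(items)


# Usage idea (needs the objects from the notebook):
# rows = [row for _, row in hex_flor.iterrows()]
# print(estimate_total_seconds(route_one_row, rows) / 60, "minutes")
```

Here `route_one_row` stands in for whatever function wraps the `up.routing.osrm_route` call for a single hexagon.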
Hope you find this useful!
Bump
👋🏻 Just so I'm extra clear on which step/how to do this estimation for school-age children based on the population pyramid: does this look right to you, @bitsandbricks and @Claudio9701?
pop_flor = up.geom.filter_population(full_pop_brazil_southeast, flor)
pop_flor['population'] = pop_flor['population'].parallel_apply(lambda x: x*0.07)
pop_flor.head()
It does to me!
I am going to close this issue for our Florianópolis model since we have a notebook that does this analysis in our repo.
I'll create a new issue for re-running this model with INEP's databases
We will adapt the existing urbanpy tutorial for the city of Florianópolis as a first step to creating predictions on the much larger state of Pará