EL-BID / IADB-education-1

Repo for GitHub's Skills-based volunteering project with the IADB

Adapt urbanpy tutorial for schools in Florianópolis #5

Closed wavingtowaves closed 1 year ago

wavingtowaves commented 1 year ago

We will adapt the existing urbanpy tutorial for the city of Florianópolis as a first step toward creating predictions for the much larger state of Pará.

wavingtowaves commented 1 year ago

Update 2022-01-19

Hi all 👋 here's an update on the progress so far.

I've got a recently committed Jupyter notebook focused on a spatial model for Florianópolis. I've worked through over half of the tutorial, and in the process opened a couple of issues to help solve some data problems I ran into.

🚧 Blocker @bitsandbricks I hit another blocker when working through tutorial information and could use your help.

When I try to run the line

es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')

I get an error

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Not sure why this is, since the code is identical to the tutorial and when I run

ba.total_bounds

for florianopolis, I get an output of

array([-48.613    , -27.847    , -48.3585929, -27.379    ])

Which seems similar enough to what is output in the tutorial just without as many decimal places. Any thoughts on why this might be happening?

bitsandbricks commented 1 year ago

I'll take a look!

Rob, can you share the notebook/script you are running to use as a reproducible example?

wavingtowaves commented 1 year ago

Thanks so much @bitsandbricks!

The notebook is linked above and here. I've also set it up with pipenv files, so you can clone the repo locally and try running the notebook on your machine.

Not sure if you've worked with GitHub codespaces much, but you could try running a codespace like I show below and see if everything in the notebook will run through the codespace. Let me know if this works well for you to test. 👍🏻

Screenshot 2023-01-19 at 3 56 11 PM

bitsandbricks commented 1 year ago

Ohh it was right in front of my eyes! Sorry Rob. I'll keep you posted

bitsandbricks commented 1 year ago

Alright, good news: it worked for me, running your notebook on a codespace.

image

Bad news is, I didn't do anything besides clicking on the cells so I'm not sure what caused your problem!

Lazy guess: maybe the OSM backend was having a bad day when you tried to download data via urbanPy?
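If the backend really is just flaky, one way to cope is to retry the download with a growing pause between attempts. This is only a minimal sketch: `retry` and its parameters are hypothetical helpers, not part of urbanpy, and it assumes the failure surfaces as a `ValueError` (the `JSONDecodeError` classes in `json` and `requests` are `ValueError` subclasses).

```python
import time


def retry(func, attempts=4, base_delay=1.0, exceptions=(ValueError,), sleep=time.sleep):
    """Call func() until it succeeds, doubling the pause after each failure.

    Hypothetical helper, not part of urbanpy. JSONDecodeError is a
    ValueError subclass, so the default catches the error in this thread.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Offline demo: a stand-in that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("Expecting value: line 1 column 1 (char 0)")
    return "ok"


delays = []  # record the pauses instead of actually sleeping
result = retry(flaky, sleep=delays.append)

# In the notebook this would wrap the real call, e.g.:
# es = retry(lambda: up.download.overpass_pois(bounds=ba.total_bounds, facilities='education'))
```

Passing `sleep` in as a parameter keeps the demo instant; in a notebook the default `time.sleep` gives the server a moment to recover between attempts.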

Claudio9701 commented 1 year ago

Same thing on my end, it ran fine. I hope this is not a common error.

urbanpy-florianopolis-education

Also, as a side note, the new version of urbanpy (in the master branch for the moment) has a new, more flexible function to download data from Overpass. I put together a quick example of how to query education facilities inside Florianópolis in this colab notebook.

wavingtowaves commented 1 year ago

✅ Updates

Thanks to help from @bitsandbricks and @Claudio9701 I was able to make good progress on the model for florianopolis.

I created a count of the educational facilities in florianopolis (but see my question below)

Image

Also I created a map of the educational facilities in florianopolis. Next up is calculating the walking distances to educational facilities.

❓ Questions

1) We have many different types of points of interest for educational facilities. Which should we include?

For now I've selected: school, kindergarten, language school, and library. What do you think?

2) Also in terms of age groups. We can download:

What would be appropriate?

CC: @bitsandbricks @Juliavieiradeandradedias

Claudio9701 commented 1 year ago

I think haversine distance (which accounts for the Earth's curvature) is a good measure for considerable distances, say in cities with a large area or when working at the country level. I usually use this one for Lima and for countries with large distances and irregular/incomplete road networks.

Other options for distance calculation are euclidean (the most naive) and cityblock (see image below). These two work really well for small distances and cities with a regular road network.

image
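To make the comparison concrete, here is a small sketch of the three measures using only the standard library. The function names are made up for illustration and are not urbanpy functions. The euclidean and cityblock versions first convert degree offsets to kilometres with an equirectangular approximation, which is my assumption about how one would apply them to lon/lat points. The two sample points are the corners of the Florianópolis bounding box printed earlier in the thread.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_offsets_km(lon1, lat1, lon2, lat2):
    """East-west and north-south offsets in km (equirectangular approximation)."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_KM
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_KM
    return dx, dy


def euclidean_km(lon1, lat1, lon2, lat2):
    """Straight-line distance on the flattened plane."""
    dx, dy = planar_offsets_km(lon1, lat1, lon2, lat2)
    return math.hypot(dx, dy)


def cityblock_km(lon1, lat1, lon2, lat2):
    """Sum of the east-west and north-south legs (grid-like streets)."""
    dx, dy = planar_offsets_km(lon1, lat1, lon2, lat2)
    return abs(dx) + abs(dy)


# Corners of the bounding box from ba.total_bounds: (minx, miny) and (maxx, maxy).
sw = (-48.613, -27.847)
ne = (-48.3585929, -27.379)

for name, fn in [("haversine", haversine_km), ("euclidean", euclidean_km), ("cityblock", cityblock_km)]:
    print(f"{name}: {fn(*sw, *ne):.2f} km")
```

At city scale the haversine and euclidean results are nearly identical, while cityblock is always at least as long as the straight line, so the choice mostly matters for how closely the measure should mimic travel along a street grid.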

wavingtowaves commented 1 year ago

I appreciate that explanation 🙌

Even though florianopolis is a smaller area, I still think haversine will work 👍

wavingtowaves commented 1 year ago

Thanks to your help in #7, @Claudio9701, I was able to get the docker container spun up 🎉

❓ Follow-up questions:

1) For some reason, when running

es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')

I get the error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Full error message below. What do you think might be happening here? Something's happening with the JSON decoder, but I'm not sure how to resolve it.

2) Do I need to update the code below that you sent over, or does it already know that the origin (point1) is the centroid and the destination (point2) is the school?

distance, duration = up.routing.osrm_route(origin=point1, destination=point2)

3) With the docker container up and running, I'm getting this error:

Error: No such object: osrm_routing_server_south-america_brazil_sul_foot

Maybe this is due to some upstream issue with the JSON portion mentioned in my point 1 above.


JSONDecodeError                           Traceback (most recent call last)
File /opt/homebrew/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    970 try:
--> 971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
...
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

CC: @Juliavieiradeandradedias, @csmlo

Claudio9701 commented 1 year ago

  1. Looks like the request to the Overpass API is not receiving a valid response. Could you try to reach the Overpass API with requests or curl to see if it is something with the network?

  2. Yes, it needs to be updated. The steps are:

These notebooks can help:

https://github.com/EL-BID/urbanpy/blob/master/notebooks/Creating%20an%20interactive%20webapp.ipynb

https://github.com/Claudio9701/urbanpy-brazil-demo/blob/master/Pop_Access_UrbanPy_Demo_BR.ipynb
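For point 1 above, a quick check along these lines might help. It is a sketch under assumptions: the endpoint URL is the public Overpass instance (urbanpy may use a different one), the helper names are made up, and it uses the standard library's `urllib` instead of requests so it runs without extra installs. Python's `json` module raises "Expecting value: line 1 column 1 (char 0)" whenever the body is empty or starts with something that is not JSON, such as an HTML error page, so looking at the raw body usually narrows things down.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the public Overpass endpoint; urbanpy may point somewhere else.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def describe_body(status, body):
    """Summarise whether a response body parses as JSON (hypothetical helper)."""
    if not body.strip():
        return f"HTTP {status}: empty body"
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return f"HTTP {status}: non-JSON body starting with {body.strip()[:60]!r}"
    return f"HTTP {status}: valid JSON"


def check_overpass(url=OVERPASS_URL, timeout=30):
    """Send a tiny Overpass QL query and report what came back."""
    payload = urllib.parse.urlencode({"data": "[out:json];node(1);out;"}).encode()
    try:
        with urllib.request.urlopen(url, data=payload, timeout=timeout) as resp:
            return describe_body(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # Error pages (rate limits, timeouts) often arrive as HTML or plain text.
        return describe_body(err.code, err.read().decode("utf-8", "replace"))
    except OSError as err:
        return f"network error: {err}"


# Run this from the same machine and network as the notebook:
# print(check_overpass())
```

An "empty body" or "non-JSON body" result would point at the server side (overload or rate limiting), while a "network error" would point at the local connection.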

wavingtowaves commented 1 year ago

@Claudio9701 thanks so much for the pairing session today 🌟

A few quick updates:

1) es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education') gave me an error a few more times, then it ran fine. Good to know that this will work sometimes and be buggy other times. At least I know to expect this 👍

2) When I run the code below to start the local server, is the process supposed to end? We were getting an error right away this morning. I'll paste fuller error message below:

Image


Error: No such object: osrm_routing_server_south-america_brazil_sul_foot
latest: Pulling from osrm/osrm-backend
Digest: sha256:af5d4a83fb90086a43b1ae2ca22872e6768766ad5fcbb07a29ff90ec644ee409
Status: Image is up to date for osrm/osrm-backend:latest
docker.io/osrm/osrm-backend:latest
/bin/sh: line 4: wget: command not found
docker: Error response from daemon: Conflict. The container name "/osrm_extract" is already in use by container "6aae95e70200660f1c8d7a5d3f609caeba7cbe42ef3a7257361f9bd152e1c2da". You have to remove (or rename) that container to be able to reuse that name.

docker: Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:5000 -> 0.0.0.0:0: listen tcp 0.0.0.0:5000: bind: address already in use.
time="2023-02-09T13:02:43-08:00" level=error msg="error waiting for container: context canceled"

Error: No such container: osrm_extract
docker: Error response from daemon: Conflict. The container name "/osrm_routing_server_south-america_brazil_sul_foot" is already in use by container "2ce75c74142e442dd48ae9fca40443482527d419f2424e49a3e694cdd1576e07". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.

wavingtowaves commented 1 year ago

Quick update!

I figured out what was going on with the port error we were running into w/ docker @Claudio9701.

We did need to run lsof -i:5000 to see what programs were already using this port. I did some investigating and found out that it's related to airplay on macs 😓 Glad we got this cleared up.

Screenshot 2023-02-14 at 2 34 53 PM
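For anyone who wants to check the port from inside the notebook rather than with lsof, a rough equivalent with the standard library could look like the sketch below. `port_in_use` is a hypothetical helper, not an urbanpy function, and it only reports whether something accepts connections, not which program it is (lsof is still needed for that).

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


# Offline demo: occupy a free port ourselves, then probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
listener.listen(1)
demo_port = listener.getsockname()[1]
busy = port_in_use(demo_port)
listener.close()
free_again = not port_in_use(demo_port)

# Before starting the routing server, the check from this thread would be:
# print(port_in_use(5000))
```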

I'm able to connect to the OSRM server using the original code in the tutorial 🎉 I do need to have docker open and running on my machine for it to work:

up.routing.start_osrm_server('sul', 'south-america_brazil', 'foot')

❓ for @Claudio9701: what's the typical run time for the function below?

Currently, I have h3_resolution set to 8 for the city of Florianópolis. Is this too high? Right now the function below has run for 30 min. I could also try spinning up a GitHub codespace to see if it runs faster on one of our servers. Let me know what you think.

distance_duration = hex_flor.apply(
    lambda row: up.routing.osrm_route(
        origin=row.geometry.centroid,
        destination=schools.iloc[row['closest_school']]['geometry'],
    ),
    result_type='expand',
    axis=1,
)

csmlo commented 1 year ago

CC: @bitsandbricks on above issue for visibility.

Claudio9701 commented 1 year ago

Hello Rob, great news you could spin up the docker container 👏🏼🙌🏽🚀!

The next version of urbanpy needs to give the user the ability to choose which port to run the OSRM server on.

Regarding the other question, resolution 8 should be good for a small city like Florianópolis. I usually run this function with tqdm so I can have an idea of how much time it will take.

from tqdm.notebook import tqdm
tqdm.pandas()

df.progress_apply(...)

If this is taking too much time, I've also used pandarallel to speed up the calculation.

from pandarallel import pandarallel

pandarallel.initialize(progress_bar=True)

df.parallel_apply(...)

This also has a progress bar that gives you a hint of how much time the processing will take. Both are installed using pip.

If it is still taking too much time, you could filter out hexagons without population or with population below a certain threshold. But in my experience this is almost never necessary.

Hope you find this useful!

bitsandbricks commented 1 year ago

Bravo Rob!

Back to your initial questions:

- We have many different types of points of interest for educational facilities. Which should we include?

Based on the OSM project definitions for their keys and values (here, I always have it around cause I keep forgetting the details :D) we want "school": "School and grounds - primary, middle and secondary schools"

This is a data layer that can definitely be replaced by an "official" list depending on specific needs (e.g. only primary schools), but the OSM one will be fine for preliminary results.

- Also in terms of age groups [...] What would be appropriate?

In the same spirit, until we are asked for a specific range, we can go for the population in compulsory schooling range (ages 6 to 14 in Brazil). Eyeballing the population pyramid I'd say that's a little bit under 7% of the entire population. Of course, this already vague number will differ from place to place, especially when contrasting rural vs urban areas, but should be fine for a starting point. We can document the rationale and carry on!

Bump

wavingtowaves commented 1 year ago

👋🏻 Just so I'm extra clear on which step/how to do this estimation for school-age children based on the population pyramid: does this look right to you, @bitsandbricks and @Claudio9701?

pop_flor = up.geom.filter_population(full_pop_brazil_southeast, flor)
pop_flor['population'] = pop_flor['population'].parallel_apply(lambda x: x*0.07)
pop_flor.head()

bitsandbricks commented 1 year ago

It does to me!
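A small variation that might be worth considering, sketched here with a toy table instead of the real population layer: plain column multiplication in pandas is already vectorized, so the parallel apply is not needed for this step, and writing the estimate to a new column keeps the original totals and avoids scaling twice if the cell is re-run. The names `estimate_school_age` and `school_age_population` are made up for illustration; the 0.07 share is the rough figure from this thread.

```python
import pandas as pd

SCHOOL_AGE_SHARE = 0.07  # rough share for ages 6 to 14, as eyeballed in this thread


def estimate_school_age(pop, share=SCHOOL_AGE_SHARE, column="population"):
    """Return a copy of pop with an added school-age estimate column.

    Hypothetical helper: the input frame is left untouched.
    """
    out = pop.copy()
    out["school_age_population"] = out[column] * share
    return out


# Toy stand-in for the filtered population table (pop_flor in the notebook).
demo = pd.DataFrame({"population": [100.0, 250.0, 0.0]})
result = estimate_school_age(demo)
print(result)
```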

wavingtowaves commented 1 year ago

I am going to close this issue for our Florianópolis model since we have a notebook that does this analysis in our repo.

I'll create a new issue for re-running this model with INEP's databases