EL-BID / IADB-education-1

Repo for GitHub's Skills-based volunteering project with the IADB

Adapt urbanpy tutorial for schools in Pará #6

Closed wavingtowaves closed 1 year ago

wavingtowaves commented 1 year ago

We will update the notebook used in the urbanpy tutorials to make predictions for the locations of potential schools within the state of Pará.

wavingtowaves commented 1 year ago

Update 2023-02-16

I've worked on a couple of PRs so far that start to spin up the notebook and model for calculating distance and travel times to schools in Para.

https://github.com/github/IADB-education/pull/18

This work is currently blocked 🚧 because I'm unable to get the travel time function to run. While it's expected that the function will take some time to run, it should be under an hour, but I'm timing out at 75 min. So I have to determine which parts of the model or code can be adjusted to get everything to run.

wavingtowaves commented 1 year ago

Update 2023-02-26

Hi all, 👋🏻 had a chance to start up work on the Para model 🎉. The work is in a new branch on this repo and PR https://github.com/github/IADB-education/pull/22. Some questions for @bitsandbricks and @Claudio9701

❓ Questions

Here's the map I got for Para by just plotting data from the northeast. The lat/long doesn't seem quite right. image

CC: @arnav-gulati, @csmlo, @Juliavieiradeandradedias

Claudio9701 commented 1 year ago

Hi Rob, here are my answers:

Based on the image below, we're going to have to splice together population maps for both the northwest and northeast of Brazil. Maybe this is not too difficult if we just need to append both dataframes.

Yes, pd.concat([pop_1, pop_2], ignore_index=True) should do the work.
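
A minimal sketch of that append, assuming both regional population tables share the same columns. The column names and sample values below are made up for illustration; only the `pd.concat` call comes from the answer above.

```python
import pandas as pd

# Hypothetical stand-ins for the two regional population tables;
# the real ones would come from the population download step.
pop_1 = pd.DataFrame({
    "latitude": [-1.45, -2.10],
    "longitude": [-48.50, -54.70],
    "population_2020": [120.0, 35.5],
})
pop_2 = pd.DataFrame({
    "latitude": [-5.37, -6.07],
    "longitude": [-49.12, -50.00],
    "population_2020": [80.0, 0.0],
})

# ignore_index=True discards both original indexes and builds a fresh
# 0..n-1 index, so rows from the two tables cannot share a label.
pop_all = pd.concat([pop_1, pop_2], ignore_index=True)
print(pop_all)
```

If the two regional files overlap along their shared border, duplicated rows may need to be dropped afterwards.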

How about the OSRM data? From our in-person session, I remember we were able to determine the single subregion file we need to download. Can you remind me where that is located? It's different from https://download.geofabrik.de/south-america/brazil/sul-latest.osm.pbf

You can find all available data on the Geofabrik downloads website. Para is located in the north subregion of Brazil.
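
As a rough sketch, the subregion download URL can be built from its parts. The pattern is taken from the Sul URL quoted above; the helper name `geofabrik_pbf_url` is hypothetical, and the `norte` slug for the north subregion is an assumption that should be checked against the Geofabrik downloads page.

```python
GEOFABRIK_BASE = "https://download.geofabrik.de"


def geofabrik_pbf_url(continent: str, country: str, subregion: str) -> str:
    """Build the download URL for a Geofabrik subregion extract."""
    return f"{GEOFABRIK_BASE}/{continent}/{country}/{subregion}-latest.osm.pbf"


# Reproduces the Sul URL quoted in the question.
sul_url = geofabrik_pbf_url("south-america", "brazil", "sul")

# "norte" is an assumed slug for the north subregion; verify it on the site.
norte_url = geofabrik_pbf_url("south-america", "brazil", "norte")
print(norte_url)
```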

@bitsandbricks Should we multiply the full population again by 7% or change the proportion?

The 7% estimate was for the entire country so I would say yes. But if we find an estimate at the state level that could be better.

Here's the map I got for Para by just plotting data from the northeast. The lat/long doesn't seem quite right.

Since Para is in the north of Brazil, it is at the same latitude as Quito, Ecuador (Middle of the World City 🤯). Latitudes look fine to me.

bitsandbricks commented 1 year ago

Agreed on all counts :)

brittanyellich commented 1 year ago

Hey friends! I am working on validating the Florianopolis model in a codespace and trying to translate that to Para. When running the urbanpy.routing.osrm_route command (line 33 here), I'm coming across the below Connection Refused error. Is this something you have seen before and may have thoughts on how to resolve?

Since it's in a codespace, port issues are very possible/likely. However I made sure to forward port 5000 and it didn't seem to help.

ConnectionRefusedError                    Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
    173 try:
--> 174     conn = connection.create_connection(
    175         (self._dns_host, self.port), self.timeout, **extra_kw
    176     )
    178 except SocketTimeout:

File ~/.local/lib/python3.10/site-packages/urllib3/util/connection.py:95, in create_connection(address, timeout, source_address, socket_options)
     94 if err is not None:
---> 95     raise err
     97 raise socket.error("getaddrinfo returns an empty list")

File ~/.local/lib/python3.10/site-packages/urllib3/util/connection.py:85, in create_connection(address, timeout, source_address, socket_options)
     84     sock.bind(source_address)
---> 85 sock.connect(sa)
     86 return sock

ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
...
--> 565     raise ConnectionError(e, request=request)
    567 except ClosedPoolError as e:
    568     raise ConnectionError(e, request=request)

ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /route/v1/profile/-48.523880805378425,-27.474198009739204;-48.5190973,-27.508109?overview=false (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))

wavingtowaves commented 1 year ago

👋🏻 Hi @brittanyellich, maybe @Claudio9701 can remind me whether this is the same port issue I ran into. If so, I will add the fix to the Florianopolis notebook so we have a record of what to do at this step.

If it's the same error that I got, it's because AirPlay also uses port 5000 by default. I resolved it by following the steps outlined here, but on macOS Ventura you have to go to the AirDrop settings instead.

Screenshot 2023-03-29 at 7 46 32 AM

brittanyellich commented 1 year ago

Ah, okay, that would make a lot of sense! It does look like I'm unable to map to port 5000 from the codespace. I wonder if there's a way to change which port is used in a config somewhere? Thank you for the tip! 😄

wavingtowaves commented 1 year ago

I think @Claudio9701 is working on this for the next update to urbanpy, after we ran into this issue when I was first spinning up the model 🎉

Claudio9701 commented 1 year ago

Hello @brittanyellich & @robcrystalornelas 🙋🏽! Great to see more people working with UrbanPy ❤️


TL;DR: The codespace config had an error, which is already fixed. Rerun it and everything should be OK! 🚀


From the initial @brittanyellich comment, I see she is having this problem inside codespaces. To be able to run the urbanpy.routing.osrm_route function, we first need a running OSRM server. To get one, we have to run the urbanpy.routing.start_osrm_server function. This function knows which data to download from Geofabrik using the country and continent arguments. Since Brazil is a really big country, the Geofabrik data is divided into subregions, which weren't taken into account in the first version of UrbanPy.

A temporary solution is to download the subregion data directly from the Geofabrik repository. This is explained here in more detail.

After testing inside codespaces I found a bug in the devcontainer.json configuration file: the "postCreateCommand" was trying to run "script.sh" when it should run "setup.sh".

Screenshot 2023-03-29 at 22 14 38

I have fixed it in this commit. I have also dropped some libraries because they are already installed as urbanpy dependencies 😉. The "setup.sh" bash script creates the OSRM data folder and downloads the Brazil Sul subregion data into it. Since the data was not being downloaded, the start_osrm_server function could not find it and failed to start.

Now everything is working as expected:

Screenshot 2023-03-29 at 22 10 51

Sorry for the long output. This is already fixed in the urbanpy master branch, but I need to find some time to release the PyPI update ...

Screenshot 2023-03-29 at 22 11 33

When the start_osrm_server function runs successfully the last lines of output should look like this:

...
Server was started succesfully
[info] http 1.1 compression handled by zlib version 1.2.8
[info] Listening on: 0.0.0.0:5000
[info] running and waiting for requests

Then, the osrm_route function runs as expected:

Screenshot 2023-03-29 at 22 12 23

I applied the same temporary solution to the para notebook in this commit and everything worked as expected as well:

Screenshot 2023-03-29 at 23 22 06 Screenshot 2023-03-29 at 23 22 23

After a long output ...

Screenshot 2023-03-29 at 23 22 32 Screenshot 2023-03-29 at 23 22 43

Finally, the solution @robcrystalornelas mentioned, turning off AirPlay because it occupies port 5000 (the default port on which UrbanPy starts the OSRM server), works when we are running our code locally.
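
For anyone hitting the same error, here is a small sketch of a check to run before calling osrm_route. It uses only the Python standard library and is not part of UrbanPy; the helper name `port_is_listening` is made up. It only tells you whether anything accepts connections on the port, so a True result can still mean another service (such as AirPlay) rather than OSRM.

```python
import socket


def port_is_listening(host: str = "localhost", port: int = 5000,
                      timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening) and timeouts.
        return False


if port_is_listening():
    print("Port 5000 is in use (the OSRM server, or another service such as AirPlay).")
else:
    print("Nothing is listening on port 5000; the OSRM server is not running.")
```

A False result matches the "Connection refused" traceback above: the request failed because no server had started.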

I hope this comment is helpful, please let me know if you have any other questions 🤗

brittanyellich commented 1 year ago

Thank you @Claudio9701! It is working great in a codespace now 🎉

wavingtowaves commented 1 year ago

👋🏻 When running the Para model in a codespace, I'm getting an error that is OSError: [Errno 28] No space left on device

This is specific to when I run

## multiply every row in pop_para by 0.07
pop_para['population_2020'] = pop_para['population_2020'].parallel_apply(lambda x: x*0.07)
pop_para.head()

When I switch to regular apply instead of parallel_apply, this function runs. Why is that? It's not a super memory intensive type task so just wondering what might be happening?

wavingtowaves commented 1 year ago

Progress update

Major 💖 to @brittanyellich and @arnav-gulati for recent work on this to help create a travel time and distance model for Para!

Updates:

Higher resolution image of Para w/ hex size set to 7

image

Map of Para youth population

We multiplied our full adult population maps by 0.07 to produce these state-scale population data. The factor roughly approximates the percentage of school-age children in Brazil, so our population numbers are not overinflated.

image

Map of schools in Para

Limited to just buildings categorized as either school or kindergarten

Travel time to schools in Para

In this map, cooler colors indicate longer travel times to schools.

newplot

Claudio9701 commented 1 year ago

👋🏻 When running the Para model in a codespace, I'm getting an error that is OSError: [Errno 28] No space left on device

This is specific to when I run

## multiply every row in pop_para by 0.07
pop_para['population_2020'] = pop_para['population_2020'].parallel_apply(lambda x: x*0.07)
pop_para.head()

When I switch to regular apply instead of parallel_apply, this function runs. Why is that? It's not a super memory intensive type task so just wondering what might be happening?

I'm not sure why so much device space is being used; it could be an error in parallel_apply. In any case, I don't think parallel_apply or apply is needed for this operation, because arithmetic like this is already optimized in pandas.

I recommend using parallel_apply only for custom functions that pandas does not already optimize/vectorize.
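
For reference, a minimal sketch of the vectorized version of the 7% scaling. The sample values are made up and `SCHOOL_AGE_SHARE` is just an illustrative constant name; only the `pop_para` and `population_2020` names come from the snippet above.

```python
import pandas as pd

# Hypothetical sample standing in for the real pop_para table.
pop_para = pd.DataFrame({"population_2020": [100.0, 250.0, 0.0]})

SCHOOL_AGE_SHARE = 0.07  # the country-level 7% estimate discussed earlier

# Vectorized: pandas multiplies the whole column at once, with no
# Python-level lambda per row and no worker processes.
pop_para["population_2020"] = pop_para["population_2020"] * SCHOOL_AGE_SHARE
print(pop_para.head())
```

This gives the same values as the apply/parallel_apply call but skips the per-row function calls entirely.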

brittanyellich commented 1 year ago

Thank you! I forgot to update this issue, but I did get around it by running parallel_apply before combining the two population arrays for Para. I also found that increasing the number of pandarallel workers helped.

Claudio9701 commented 1 year ago

Also, since Para is a big area, it could be useful to filter out the hexagons with missing or zero population.

wavingtowaves commented 1 year ago

@Claudio9701 We do filter out hexagons with zero population using the following code:

hex_para_plot = hex_para.query("population_2020 > 0").reset_index(drop=True)
# Reset index is needed to avoid an error with plotly choropleth_map

But this happens right before we make the final choropleth map to show a heatmap of travel times to schools. Are you suggesting that this filtering happen prior to this, maybe before we do route calculations?

Claudio9701 commented 1 year ago

Yes, it probably will help, because this reduces the number of queries made to the OSRM server.
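
A rough sketch of moving the filter ahead of the routing step. The sample table and the `hex_para_routable` name are hypothetical; the query itself is the one already used in the notebook.

```python
import pandas as pd

# Hypothetical hexagon table; the real hex_para has geometry and more columns.
hex_para = pd.DataFrame({
    "hex": ["a", "b", "c", "d"],
    "population_2020": [12.0, 0.0, None, 3.5],
})

# NaN > 0 evaluates to False, so this drops both zero and missing populations.
hex_para_routable = hex_para.query("population_2020 > 0").reset_index(drop=True)

# Only these rows would be passed on to the routing step,
# so fewer requests reach the OSRM server.
print(f"{len(hex_para_routable)} of {len(hex_para)} hexagons kept for routing")
```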

wavingtowaves commented 1 year ago

Closing this issue as we've figured out how to adapt the tutorial for Para 🎉