wavingtowaves closed this issue 1 year ago
I've worked on a couple of PRs so far that start to spin up the notebook and model for calculating distance and travel times to schools in Para.
https://github.com/github/IADB-education/pull/18
This work is currently blocked because I'm unable to get the travel time function to run. The function is expected to take a while, but it should finish in under an hour, and mine times out at 75 minutes. So I need to determine which parts of the model or code can be adjusted to get everything to run.
Hi all, I had a chance to start work on the Para model. The work is in a new branch on this repo and PR https://github.com/github/IADB-education/pull/22. Some questions for @bitsandbricks and @Claudio9701:
https://download.geofabrik.de/south-america/brazil/sul-latest.osm.pbf
Here's the map I got for Para by just plotting data from the northeast. The lat/long doesn't seem quite right.
CC: @arnav-gulati, @csmlo, @Juliavieiradeandradedias
Hi Rob, here are my answers:
Based on the image below, we're going to have to splice together population maps for both the northwest and northeast of Brazil. Maybe this won't be too difficult if we just need to append both dataframes.
Yes, pd.concat([pop_1, pop_2], ignore_index=True) should do the job.
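A minimal sketch of that concatenation, assuming both regional population tables share the same columns. The column names and values below are made-up placeholders, not the real population data.

```python
import pandas as pd

# Hypothetical stand-ins for the two regional population tables;
# the real ones would be loaded from the downloaded population files.
pop_1 = pd.DataFrame({"latitude": [-1.45, -1.50], "longitude": [-48.50, -48.47],
                      "population_2020": [12.3, 8.1]})
pop_2 = pd.DataFrame({"latitude": [-3.20], "longitude": [-52.21],
                      "population_2020": [5.4]})

# Stack the rows; ignore_index=True builds a fresh 0..n-1 index instead of
# keeping the overlapping indices (0, 1 and 0) from the two inputs.
pop = pd.concat([pop_1, pop_2], ignore_index=True)
print(pop.index.tolist())  # [0, 1, 2]
```

If the two source maps happen to overlap along their shared border (an assumption, not checked in this thread), something like pop.drop_duplicates(subset=["latitude", "longitude"]) would remove repeated points.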
What about the OSRM data? From our in-person meeting, I remember we were able to determine the single subregion file we need to download. Can you remind me where it is located? It's different from https://download.geofabrik.de/south-america/brazil/sul-latest.osm.pbf
You can find all available data on the Geofabrik downloads website. Para is located in the North subregion of Brazil.
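The Geofabrik extract URLs follow a predictable pattern, visible in the sul link above. A small sketch, assuming Brazil's five sub-region slugs on Geofabrik are centro-oeste, nordeste, norte, sudeste and sul (Pará being in norte); geofabrik_pbf_url is a hypothetical helper, not an urbanpy function.

```python
from typing import Optional

GEOFABRIK_BASE = "https://download.geofabrik.de"
# Assumed sub-region slugs for Brazil on Geofabrik.
BRAZIL_SUBREGIONS = {"centro-oeste", "nordeste", "norte", "sudeste", "sul"}


def geofabrik_pbf_url(continent: str, country: str, subregion: Optional[str] = None) -> str:
    """Build the URL of a Geofabrik '<name>-latest.osm.pbf' extract."""
    if subregion is None:
        # Whole-country extract, e.g. .../south-america/brazil-latest.osm.pbf
        return f"{GEOFABRIK_BASE}/{continent}/{country}-latest.osm.pbf"
    if country == "brazil" and subregion not in BRAZIL_SUBREGIONS:
        raise ValueError(f"unknown Brazil sub-region: {subregion!r}")
    # Sub-region extract, e.g. .../south-america/brazil/norte-latest.osm.pbf
    return f"{GEOFABRIK_BASE}/{continent}/{country}/{subregion}-latest.osm.pbf"


print(geofabrik_pbf_url("south-america", "brazil", "norte"))
```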
@bitsandbricks Should we multiply the full population again by 7% or change the proportion?
The 7% estimate was for the entire country, so I would say yes. But if we find a state-level estimate, that could be better.
Here's the map I got for Para by just plotting data from the northeast. The lat/long doesn't seem quite right.
Since Para is in the north of Brazil, it lies at roughly the same latitude as Quito, Ecuador (the Middle of the World City). The latitudes look fine to me.
Agreed on all counts :)
Hey friends! I am working on validating the Florianopolis model in a codespace and trying to translate it to Para. When running the urbanpy.routing.osrm_route command (line 33 here), I'm getting the Connection Refused error below. Is this something you have seen before, and do you have any thoughts on how to resolve it?
Since it's in a codespace, port issues are quite likely. However, I made sure to forward port 5000, and that didn't seem to help.
ConnectionRefusedError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
173 try:
--> 174 conn = connection.create_connection(
175 (self._dns_host, self.port), self.timeout, **extra_kw
176 )
178 except SocketTimeout:
File ~/.local/lib/python3.10/site-packages/urllib3/util/connection.py:95, in create_connection(address, timeout, source_address, socket_options)
94 if err is not None:
---> 95 raise err
97 raise socket.error("getaddrinfo returns an empty list")
File ~/.local/lib/python3.10/site-packages/urllib3/util/connection.py:85, in create_connection(address, timeout, source_address, socket_options)
84 sock.bind(source_address)
---> 85 sock.connect(sa)
86 return sock
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
NewConnectionError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
...
--> 565 raise ConnectionError(e, request=request)
567 except ClosedPoolError as e:
568 raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /route/v1/profile/-48.523880805378425,-27.474198009739204;-48.5190973,-27.508109?overview=false (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
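A quick way to tell whether anything is listening on the OSRM port from inside the codespace is a plain TCP connection attempt. This is a minimal standard-library sketch; port_is_open is a hypothetical helper, not part of urbanpy. If it returns False when run inside the codespace, the refusal is local (no server is listening on that port there), so forwarding port 5000 to the outside would not change anything.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises ConnectionRefusedError (a subclass of OSError)
        # when nothing is listening, i.e. the [Errno 111] seen in the traceback.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(port_is_open("localhost", 5000))
```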
Hi @brittanyellich, maybe @Claudio9701 can remind me whether this was the same port issue I ran into. If so, I will add the fix to the Florianopolis notebook so we have a record of what to do at this step.
If it's the same error I got, it's because AirPlay also uses port 5000 by default. I resolved it by following the steps outlined here, but on macOS Ventura you have to go to the AirDrop settings.
Ah, okay, that would make a lot of sense! It does look like I'm unable to map to port 5000 from the codespace. I wonder if there's a way to change which port is used in a config somewhere? Thank you for the tip!
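If the default port is taken, one option is to run the OSRM server on another port (osrm-routed accepts a --port option) and send requests there. This thread doesn't confirm whether urbanpy's start_osrm_server or osrm_route expose a port setting, so the sketch below only builds a raw OSRM /route/v1 request URL with a configurable host and port. osrm_route_url and the 5001 port are hypothetical; the URL shape mirrors the one in the ConnectionError above, with coordinates as longitude,latitude pairs joined by ";".

```python
from urllib.parse import urlencode


def osrm_route_url(origin, destination, host="localhost", port=5000,
                   profile="profile", overview="false"):
    """Build an OSRM /route/v1 request URL for one origin-destination pair.

    origin and destination are (longitude, latitude) tuples: OSRM expects
    lon,lat order, as in the URL from the ConnectionError traceback.
    """
    coords = ";".join(f"{lon},{lat}" for lon, lat in (origin, destination))
    query = urlencode({"overview": overview})
    return f"http://{host}:{port}/route/v1/{profile}/{coords}?{query}"


# Hypothetical: an OSRM server started on port 5001 instead of the default 5000.
print(osrm_route_url((-48.52, -27.47), (-48.51, -27.50), port=5001))
```

The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen.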
I think @Claudio9701 is working on this for the next urbanpy update, after we ran into this issue when I was first spinning up the model.
Hello @brittanyellich & @robcrystalornelas! Great to see more people working with UrbanPy!
TL;DR: The codespace config had an error, which is already fixed. Rerun it and everything should be OK!
From @brittanyellich's initial comment, I see she is having this problem inside codespaces. To run the urbanpy.routing.osrm_route function, we first need a running OSRM server. To start one, we have to run the urbanpy.routing.start_osrm_server function. This function determines which data to download from Geofabrik using the country and continent arguments. Since Brazil is a very large country, Geofabrik divides its data into sub-regions, which weren't taken into account in the first version of UrbanPy.
A temporary solution is to download the subregion data directly from the Geofabrik repository. This is explained in more detail here.
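A minimal Python sketch of that manual download, assuming the extract only needs to land in a local folder. download_pbf and the osrm_data folder name are hypothetical; check where your urbanpy version's start_osrm_server expects to find the file.

```python
import urllib.request
from pathlib import Path


def download_pbf(url: str, data_dir: str = "osrm_data") -> Path:
    """Download an .osm.pbf extract into data_dir unless it is already there."""
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the data folder if missing
    target = target_dir / url.rsplit("/", 1)[-1]   # keep the original file name
    if not target.exists():
        # Skipping existing files avoids re-downloading a large extract on rerun.
        urllib.request.urlretrieve(url, str(target))
    return target


# Para is in Geofabrik's North ("norte") sub-region, e.g.:
# download_pbf("https://download.geofabrik.de/south-america/brazil/norte-latest.osm.pbf")
```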
After testing inside codespaces, I found a bug in the devcontainer.json configuration file. In the "postCreateCommand" we were trying to run "script.sh" when it should be "setup.sh".
I have fixed it in this commit. I also dropped some libraries because they are already installed as urbanpy dependencies. The "setup.sh" bash script creates the OSRM data folder and downloads the Brazil Sul sub-region data into it. Because that data was never downloaded, the start_osrm_server function failed: it couldn't find the data file.
Now everything is working as expected:
Sorry for the long output; this is already fixed in the urbanpy master branch, but I need to find some time before releasing the PyPI update...
When the start_osrm_server function runs successfully, the last lines of output should look like this:
...
Server was started succesfully
[info] http 1.1 compression handled by zlib version 1.2.8
[info] Listening on: 0.0.0.0:5000
[info] running and waiting for requests
Then, the osrm_route function runs as expected:
I applied the same temporary solution to the Para notebook in this commit, and everything worked as expected there as well:
After a long output ...
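For reference, a successful OSRM /route/v1 response is JSON with a "code" of "Ok" and a "routes" list whose entries carry "distance" (meters) and "duration" (seconds). The sketch below pulls those two values out of a raw response. It illustrates the OSRM HTTP API only, not how urbanpy's osrm_route is implemented internally; route_distance_duration is a hypothetical helper and the sample payload is made up.

```python
import json
from typing import Tuple


def route_distance_duration(response_text: str) -> Tuple[float, float]:
    """Return (distance in meters, duration in seconds) of the first route."""
    data = json.loads(response_text)
    if data.get("code") != "Ok":
        # OSRM reports failures (e.g. "NoRoute") through the "code" field.
        raise ValueError(f"OSRM returned {data.get('code')}: {data.get('message')}")
    route = data["routes"][0]
    return route["distance"], route["duration"]


sample = '{"code": "Ok", "routes": [{"distance": 4123.5, "duration": 512.3}], "waypoints": []}'
print(route_distance_duration(sample))  # (4123.5, 512.3)
```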
Finally, the solution @robcrystalornelas mentioned, turning off AirPlay because it takes port 5000 (the default port on which UrbanPy starts the OSRM server), works when we run our code locally.
I hope this comment is helpful; please let me know if you have any other questions.
Thank you @Claudio9701! It is working great in a codespace now.
When running the Para model in a codespace, I'm getting the error OSError: [Errno 28] No space left on device
This is specific to when I run:
## multiply every row in pop_para by 0.07
pop_para['population_2020'] = pop_para['population_2020'].parallel_apply(lambda x: x*0.07)
pop_para.head()
When I switch to regular apply instead of parallel_apply, this function runs. Why is that? It's not a particularly memory-intensive task, so I'm just wondering what might be happening.
Major kudos to @brittanyellich and @arnav-gulati for their recent work on this, which helped create a travel time and distance model for Para!
Updates:
We multiplied our full adult population maps by 0.07, a rough approximation of the share of school-age children in Brazil, to produce state-scale population data so that our population numbers are not overinflated.
Limited to just buildings categorized as either school or kindergarten.
In this map, cooler colors indicate longer travel times to schools.
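Restricting destinations to those two categories can be sketched as a simple isin filter. This assumes the school locations come from OpenStreetMap, where schools and kindergartens are tagged amenity=school and amenity=kindergarten, and that the table stores that tag in a column named amenity. The notebook's actual column name may differ, and the rows are toy data.

```python
import pandas as pd

# Hypothetical OSM points of interest with their "amenity" tag.
pois = pd.DataFrame({
    "name": ["Escola A", "Creche B", "Hospital C", "Escola D"],
    "amenity": ["school", "kindergarten", "hospital", "school"],
})

# Keep only the destination types used for the travel-time model.
schools = pois[pois["amenity"].isin(["school", "kindergarten"])].reset_index(drop=True)
print(schools["name"].tolist())  # ['Escola A', 'Creche B', 'Escola D']
```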
I'm not sure why so much device space is being used; it could be an issue with parallel_apply. Regardless, I don't think parallel_apply or apply is necessary for this operation, since arithmetic on a column is already vectorized in pandas.
I recommend using them only for custom functions that pandas doesn't already optimize/vectorize.
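A minimal sketch of the vectorized version, using toy data and a hypothetical SCHOOL_AGE_SHARE constant for the 0.07 factor. Multiplying the column directly runs as one array operation, with no per-row Python calls and no worker processes. A possible (unconfirmed) explanation for the Errno 28 error: pandarallel can transfer data to its workers through a memory-backed filesystem (/dev/shm on Linux), which is often small inside containers such as codespaces; pandarallel.initialize(use_memory_fs=False) makes it use pipes instead.

```python
import pandas as pd

SCHOOL_AGE_SHARE = 0.07  # hypothetical name for the country-level 7% estimate

# Toy stand-in for pop_para; the real table comes from the population maps.
pop_para = pd.DataFrame({"population_2020": [100.0, 250.0, 0.0]})

# Vectorized: a single multiplication over the whole column.
pop_para["population_2020"] = pop_para["population_2020"] * SCHOOL_AGE_SHARE
print(pop_para.head())
```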
Thank you! I forgot to update this issue, but I did get around it by running parallel_apply before combining the two population arrays for Para. I also found that increasing the number of pandarallel workers helped.
Also, since Para is quite a big area, it could be useful to filter out hexagons with missing or zero population.
@Claudio9701 We do filter out hexagons with zero population using the following code:
hex_para_plot = hex_para.query("population_2020 > 0").reset_index(drop=True)
# Reset index is needed to avoid an error with plotly choropleth_map
But this happens right before we make the final choropleth map to show a heatmap of travel times to schools. Are you suggesting that this filtering happen prior to this, maybe before we do route calculations?
Yes, it probably will help, because it reduces the number of queries sent to the OSRM server.
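A minimal sketch of moving that filter ahead of the route calculations, with toy data; hex_to_route is a hypothetical name, and the routing step would then loop over hex_to_route only. Because NaN > 0 evaluates to False, the same query also drops hexagons whose population is missing.

```python
import pandas as pd

# Toy hexagon table; NaN stands for a hexagon with no population estimate.
hex_para = pd.DataFrame({
    "hex_id": ["a", "b", "c", "d"],
    "population_2020": [12.0, 0.0, float("nan"), 3.5],
})

# Drop empty or missing hexagons *before* routing so no OSRM queries are spent on them.
hex_to_route = hex_para.query("population_2020 > 0").reset_index(drop=True)
print(len(hex_para), "->", len(hex_to_route))  # 4 -> 2
```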
Closing this issue, as we've figured out how to adapt the tutorial for Para.
We will update the notebook used in the urbanpy tutorials to make predictions for the locations of potential schools within the state of Pará.