dlab-berkeley / Python-Geospatial-Fundamentals-Legacy

D-Lab's 6-hour introduction to working with geospatial data in Python. Learn how to import, visualize, and analyze geospatial data using GeoPandas in Python.

Cleaning geospatial data #4

Open brooksjessup opened 3 years ago

brooksjessup commented 3 years ago

It could be helpful to include more instruction on how to clean geospatial data. The data used in the notebooks has been carefully curated and is very clean. However, raw data is often full of errors, missing values, etc.

As far as I can tell, the notebooks currently have two places where the learner encounters, and learns to handle, common issues with geospatial data: non-matching CRS codes in Notebook 3 and typos in the bike blvds dataset in Notebook 5.

I wonder whether it would be useful to also show an example of missing or incorrect geospatial data, such as a BART station that is clearly out of place because of wrong coordinates. How does one identify such errors and then correct them?

This is probably not essential, but after running into some messy geo data myself, I thought it might be something to add to the workshop in the future.
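To make the question above concrete, here is a minimal, library-free sketch of one way to flag missing or out-of-place points and to repair one common mistake (swapped longitude and latitude). Everything in it is an assumption for illustration: the station names and coordinates are hypothetical, the bounding box is only a rough Bay Area extent, and `check_point` is not taken from the workshop notebooks.

```python
import math

# Rough bounding box for the San Francisco Bay Area (an assumption for this
# sketch): (min_lon, min_lat, max_lon, max_lat) in WGS84 degrees.
BAY_AREA_BBOX = (-123.2, 36.8, -121.2, 38.9)

# Hypothetical station records; the coordinates are illustrative only.
stations = [
    {"name": "Station A", "lon": -122.27, "lat": 37.87},
    {"name": "Station B", "lon": 37.80, "lat": -122.27},  # lon/lat swapped
    {"name": "Station C", "lon": None, "lat": 37.78},     # missing value
    {"name": "Station D", "lon": -12.24, "lat": 37.80},   # dropped digit
]


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def in_bbox(lon, lat, bbox):
    """True if (lon, lat) falls inside the bounding box, edges included."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def check_point(record, bbox=BAY_AREA_BBOX):
    """Return (status, lon, lat) for one record.

    status is "ok", "missing", "swapped" (the returned coordinates are the
    corrected order), or "out_of_bounds" (needs manual review).
    """
    lon, lat = record["lon"], record["lat"]
    if is_missing(lon) or is_missing(lat):
        return ("missing", lon, lat)
    if in_bbox(lon, lat, bbox):
        return ("ok", lon, lat)
    # A point that only fits the box after swapping was probably entered
    # as (lat, lon) instead of (lon, lat).
    if in_bbox(lat, lon, bbox):
        return ("swapped", lat, lon)
    return ("out_of_bounds", lon, lat)


report = {rec["name"]: check_point(rec) for rec in stations}
for name, (status, lon, lat) in report.items():
    print(f"{name}: {status} ({lon}, {lat})")
```

Only the swapped case is corrected automatically here; missing and out-of-bounds records are just flagged, since fixing them usually means going back to the data source. With a GeoDataFrame, the same bounding-box idea could likely be expressed with the `.cx` coordinate indexer or a comparison against `total_bounds`.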

EastBayEv commented 3 years ago

Demonstrate/discuss geospatial data cleaning (as a learning objective)?

hikari-murayama commented 3 years ago

That's a great point! The current version of our workshop, scheduled as a 2-day affair, doesn't give us much leeway to add more, but I think it might be a nice idea to work on it as an optional notebook for now.