OSMPH / Tabang-AI

Coordinating AI-assisted mapping in the Philippines
Creative Commons Attribution 4.0 International

HRSL vector v1 #6

Closed maning closed 5 years ago

maning commented 5 years ago

Resolves https://github.com/OSMPH/Tabang-AI/issues/2

This PR adds the HRSL-derived vector data for prioritizing mapping projects with AI. The data has the following files:

Description of each file is in the README.md

ph_hrsl_v1

@seav @govvin Can one of you download and inspect the data locally before I merge? 🙇‍♀

seav commented 5 years ago

Some comments:

seav commented 5 years ago

Nice to have: If the MultiPolygon inner rings have a small area (FSVO small), it might be better to just remove them to simplify the shapes further.
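A minimal sketch of the inner-ring suggestion above, assuming the rings are GeoJSON coordinate arrays in a projected CRS (so areas are in square meters). The function names and the `min_area` threshold are hypothetical; the thread does not say what value of "small" would be used.

```python
def ring_area(ring):
    """Absolute planar area of a closed ring via the shoelace formula."""
    total = 0.0
    for p, q in zip(ring, ring[1:]):
        total += p[0] * q[1] - q[0] * p[1]
    return abs(total) / 2.0


def drop_small_holes(polygon, min_area):
    """Keep the outer ring and only the inner rings with area >= min_area.

    `polygon` is a GeoJSON Polygon coordinate array: [outer, hole1, hole2, ...].
    """
    outer, holes = polygon[0], polygon[1:]
    return [outer] + [h for h in holes if ring_area(h) >= min_area]


def drop_small_holes_multi(multipolygon, min_area):
    """Apply drop_small_holes to every part of a MultiPolygon coordinate array."""
    return [drop_small_holes(part, min_area) for part in multipolygon]
```

With geographic (lon/lat) coordinates the shoelace result is in square degrees, so the data would need to be reprojected first or the threshold chosen accordingly.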

govvin commented 5 years ago

Thank you, @maning .

Please consider the following:

[image]

[image] The polygon features look ideal for settlements, but we could be missing roads that connect these settlements when we use this to create tasks (lines in red simulate roads not within feature outlines).

govvin commented 5 years ago

Before we publish a version of this, how about we run a test over the same area covered by Ompong last year and compare all features added within the period (Sep-Oct) the task was running and compare that coverage against this one?

maning commented 5 years ago

I updated the data; see the OP.

Changes include the following, based on your comments:

The coordinates of the GeoJSON have 9 decimal places. This is too much precision. 5 should be enough.

👍 The GeoJSON now uses 5 decimal places of precision.
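For reference, trimming coordinate precision can be sketched in a few lines of standard-library Python. This is not necessarily how the file in this PR was produced; the function names are hypothetical, and the sketch assumes every feature has a geometry with a `coordinates` member (no null geometries or GeometryCollections).

```python
def round_coords(coords, ndigits=5):
    """Recursively round every number in a nested GeoJSON coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(c, ndigits) for c in coords]


def round_feature_collection(fc, ndigits=5):
    """Return a copy of a FeatureCollection with rounded geometry coordinates."""
    out = dict(fc)
    out["features"] = []
    for feat in fc["features"]:
        geom = dict(feat["geometry"])
        geom["coordinates"] = round_coords(geom["coordinates"], ndigits)
        out["features"].append({**feat, "geometry": geom})
    return out
```

Five decimal places of a degree is roughly 1.1 m at the equator, which is finer than the 100 m buffers used here.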

The GeoJSON file contains 1600+ MultiPolygon features with complex shapes. This is very slow to load in tools/applications such as QGIS. Maybe it would be better to split the GeoJSON into smaller pieces, maybe grouped by ADM2 (province) level.

👍 I converted to single parts, but it is still one big GeoJSON. It's much faster to load now than before. Also, to reduce file size, I removed the Pcode attribute and added a separate CSV file. Users can join these attributes if they need them for analysis.
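The two steps described above (multipart to single parts, then joining attributes back from a CSV) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the join column is passed in as `key` because the thread does not say which column the GeoJSON and the CSV share.

```python
import csv
import io


def explode_multipolygons(features):
    """Split each MultiPolygon feature into one Polygon feature per part."""
    out = []
    for feat in features:
        geom = feat["geometry"]
        if geom["type"] != "MultiPolygon":
            out.append(feat)
            continue
        for part in geom["coordinates"]:
            out.append({
                "type": "Feature",
                # Each part gets its own copy of the parent's properties.
                "properties": dict(feat.get("properties") or {}),
                "geometry": {"type": "Polygon", "coordinates": part},
            })
    return out


def join_csv_attributes(features, csv_text, key):
    """Attach CSV columns to features whose properties[key] matches a CSV row."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    for feat in features:
        match = rows.get(str(feat["properties"].get(key)))
        if match:
            extra = {k: v for k, v in match.items() if k != key}
            feat["properties"].update(extra)
    return features
```

Values read from the CSV arrive as strings, so numeric columns would need converting before analysis.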

Can we optimize the file size further by simplifying vertices?

👍 I simplified the hrsl_ph_buffer100m_v1 by removing vertices within 10m.
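One possible reading of "removing vertices within 10m" is a distance-based thinning, sketched below. The tool and algorithm actually used for this PR are not stated in the thread, so treat this as an assumption; `thin_ring` is a hypothetical name, and the ring is assumed to be closed and in a projected CRS with units of meters.

```python
import math


def thin_ring(ring, tolerance=10.0):
    """Drop vertices closer than `tolerance` to the previously kept vertex."""
    kept = [ring[0]]
    for pt in ring[1:-1]:
        if math.dist(pt, kept[-1]) >= tolerance:
            kept.append(pt)
    kept.append(ring[-1])  # keep the closing vertex so the ring stays closed
    # A valid ring needs at least 4 positions; fall back to the original if too few remain.
    return kept if len(kept) >= 4 else ring
```

This simple pass can still leave the last kept vertex closer than the tolerance to the closing vertex; a GIS simplification tool would handle such cases and preserve topology more carefully.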

The polygon features look ideal for settlements, but we could be missing roads that connect these settlements when we use this to create tasks (lines in red simulate roads not within feature outlines).

👍 To resolve this, I created a new GeoJSON with a 500m grid (hrsl_ph_500m_grid_v1) intersected with hrsl_ph_buffer100m_v1; this should cover most cases. I also think this is the ideal vector we can use for preparing the tasks.
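To illustrate why a grid picks up more of the space between settlements, here is a rough sketch that lists the 500 m cells overlapping a polygon's bounding box. It is a simplification, not the method used for hrsl_ph_500m_grid_v1: it tests bounding-box overlap rather than true polygon intersection, assumes projected coordinates in meters, and uses hypothetical function names.

```python
import math


def cells_for_bbox(minx, miny, maxx, maxy, size=500.0):
    """Return (col, row) indices of all size x size cells overlapping a bounding box."""
    c0, c1 = math.floor(minx / size), math.floor(maxx / size)
    r0, r1 = math.floor(miny / size), math.floor(maxy / size)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def cell_polygon(col, row, size=500.0):
    """GeoJSON Polygon coordinate array for the grid cell at (col, row)."""
    x, y = col * size, row * size
    return [[[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]]
```

Because each selected cell is kept whole, the resulting coverage extends beyond the buffer outline, which is what lets it catch roads running between settlements.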

Before we publish a version of this, how about we run a test over the same area covered by Ompong last year and compare all features added within the period (Sep-Oct) the task was running and compare that coverage against this one?

👍 Yes, I did a quick evaluation; see the next comment. 👎 Not in the Ompong area though. :)

maning commented 5 years ago

To quickly evaluate whether the derived vectors cover most settlement areas in a given area, I compared the vector data and OSM coverage on Banton Island. I chose Banton Island because I used the island to test whether HRSL is appropriate data for prioritizing remote mapping (see my OSM diary).

Details of the comparison.

An OSM feature was counted as within a polygon if it is at least partially inside it.

| | hrsl_ph_buffer100m_v1 | hrsl_ph_500m_grid_v1 |
|---|---|---|
| Map | [image: hrsl] | [image: grid] |
| Building | within polygon: 1486; outside polygon: 376 | within polygon: 1746; outside polygon: 116 |
| Roads | within polygon: 85; outside polygon: 10 | within polygon: 91; outside polygon: 4 |
| "Detection rate" (in %) | Building: 79.8; Road: 89.4 | Building: 93.3; Road: 95.8 |
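The "detection rate" in the comparison above appears to be the share of features counted within the polygons, i.e. within / (within + outside). A small sketch that recomputes it from the listed counts follows; the function name and dictionary layout are illustrative. Recomputing this way gives about 93.8 for buildings on the grid (the comparison lists 93.3) and about 89.5 for roads on the buffer (listed as 89.4, which looks like truncation rather than rounding). The thread does not make clear whether the 93.3 figure or one of the counts contains the typo.

```python
def detection_rate(within, outside):
    """Percentage of features that fall at least partially within the polygons."""
    return 100.0 * within / (within + outside)


# (within, outside) counts copied from the comparison above.
counts = {
    ("hrsl_ph_buffer100m_v1", "Building"): (1486, 376),
    ("hrsl_ph_buffer100m_v1", "Road"): (85, 10),
    ("hrsl_ph_500m_grid_v1", "Building"): (1746, 116),
    ("hrsl_ph_500m_grid_v1", "Road"): (91, 4),
}

for (layer, kind), (within, outside) in counts.items():
    print(f"{layer:24s} {kind:9s} {detection_rate(within, outside):.1f}")
```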

Summary

maning commented 5 years ago

@govvin @seav Please give this a review again. Thanks!

govvin commented 5 years ago

Let's specify the license for the data. Also, double-check the license of the HRSL dataset.

It is very challenging to achieve 100% coverage within the context of emergency mapping, and there's always the risk of not mapping isolated settlements - and even with the current approach there's no assurance of that. Helping contributors focus on smaller areas is extremely helpful. Emergency response resources are limited, and DRR managers will often focus initially on where impact is greatest.

Once released, it might be a good idea to conduct a public workshop (e.g. potential TM project managers, DRR people, etc) and record it on video.

maning commented 5 years ago

Let's specify the license for the data. Also, double-check the license of the HRSL dataset.

Good point! HRSL is CC BY 4.0, while the PH admin boundaries are "Humanitarian use only". What's the appropriate license for derived data coming from these two?

@seav

maning commented 5 years ago

Let's specify the license for the data. Also, double-check the license of the HRSL dataset.

Let's resolve this issue with a project-wide license, tracking here: https://github.com/OSMPH/Tabang-AI/issues/8