developmentseed / tecnico-energy-app

https://dashboard-ds-peach.vercel.app/

Open Questions for Fixed Municipal Data (v5) #35

Closed heidimok closed 2 months ago

heidimok commented 3 months ago

Note: This is a Slack discussion (from @alukach) I'm bringing into GitHub so we can track ongoing municipal data questions and updates as comments here for better communication.

Context

We received the latest municipal data (v5) from Ricardo: Municipal Data v5 (Google Sheets).

Some outstanding issues with the data:

The following rows in the metrics sheet are missing corresponding geometries in the municipal spatial data: 3101, 3103, 3102, 4801, 4401, 4601, 4502, 4301, 4901, 3110, 3104, 4501, 3108, 4603, 3106, 3107, 3201, 4701, 4302, 4802, 3109, 4602, 3105

Additionally, the Study sheet has a few issues. Some field names needed to be updated (some of these requirement changes are new and not Tecnico's fault in any way), and the fields being used to join Geometries to Metrics are incorrect. Anthony was able to get the ingestion working by making the following changes to the Study sheet:

Name*   Municipal Data
Description*    An example study
- Image 
+ Image Src 
Details 
Scale*  Municipality
- key_field_name    codigo
+ Metrics Key Field nombre
- name_field_name   nombre
+ Geom Key Field    NAME_2
- highlight true/false
+ Highlight TRUE

A corrected sheet can be found here: https://docs.google.com/spreadsheets/d/1Zwoohp8Zq8D4LgLY1aIrB5LTLRvxmV9dr8FAghphCV0/edit#gid=1485987722

Data Issues

Here is the output from the seed operation (i.e. ingesting the geometries and metrics). Joining on the NAME_2 field of the geometry is somewhat problematic because there are duplicate geometries that share that field in the geospatial data. If a duplicate geometry is found, we log it, ignore it, and continue. If a geometry is found without a corresponding metric, we likewise log it, ignore it, and continue.

build: Running seed command `ts-node --compiler-options {"module":"CommonJS"} prisma/seed.ts` ...
build: 2024-04-12T17:14:13.190Z | municipal-study.geojson.gz | Ignoring.
build: 2024-04-12T17:14:15.741Z | municipal-study.xlsx | deleted existing study record: municipal-study
build: 2024-04-12T17:14:15.741Z | municipal-study.xlsx | ingesting study metadata: municipal-study...
build: 2024-04-12T17:14:16.855Z | municipal-study.xlsx | created study record: municipal-study
build: 2024-04-12T17:14:16.855Z | municipal-study.xlsx | ingesting metrics for municipal-study...
build: 2024-04-12T17:14:17.932Z | municipal-study.xlsx | ingested 306 metrics records
build: 2024-04-12T17:14:17.932Z | municipal-study.xlsx | ingesting scenarios for municipal-study...
build: 2024-04-12T17:14:18.128Z | municipal-study.xlsx | ingested 13 scenario metadata records
build: 2024-04-12T17:14:18.128Z | municipal-study.xlsx | ingesting themes from for municipal-study...
build: 2024-04-12T17:14:18.327Z | municipal-study.xlsx | ingested 1 themes from metrics metadata
build: 2024-04-12T17:14:18.327Z | municipal-study.xlsx | ingesting metrics metadata for municipal-study...
build: 2024-04-12T17:14:18.539Z | municipal-study.xlsx | ingested 107 metrics metadata records
build: 2024-04-12T17:14:18.540Z | municipal-study.xlsx | ingesting theme_scenario records for municipal-study...
build: 2024-04-12T17:14:18.732Z | municipal-study.xlsx | ingested 13 theme_scenario records
build: 2024-04-12T17:14:18.732Z | municipal-study.xlsx | processing geometries for municipal-study...
build: 2024-04-12T17:14:18.905Z | municipal-study.xlsx | ingesting 308 geometries, joining the geometry "NAME_2" field to the metrics "nombre" field...
build: 2024-04-12T17:14:30.994Z | municipal-study.xlsx | ignoring geometry with key "Praia da Vitória", no related metrics in metrics table (closest metric that we could find was "Vila da Praia da Vitória")
build: 2024-04-12T17:15:07.091Z | municipal-study.xlsx | failed to insert geometry with key "Lagoa", already exists in the geometries table
build: 2024-04-12T17:15:28.807Z | municipal-study.xlsx | failed to insert geometry with key "Calheta", already exists in the geometries table
build: 2024-04-12T17:16:16.006Z | municipal-study.xlsx | ingested 305 geometries
build: 2024-04-12T17:16:16.006Z | municipal-study.xlsx | verifying that all metrics have corresponding geometries...
build: 2024-04-12T17:16:16.193Z | municipal-study.xlsx | ignoring 1 metrics due to missing corresponding geometries (would fail but STRICT_MODE=false). Missing geometries: Vila da Praia da Vitória
build: 2024-04-12T17:16:16.193Z | municipal-study.xlsx | deriving pre-aggregated scenario metrics for municipal-study...
build: 2024-04-12T17:16:19.412Z | municipal-study.xlsx | derived 4284 pre-aggregated scenario metrics
build: 2024-04-12T17:16:19.412Z | municipal-study.xlsx | deriving pre-aggregated scenario metrics totals for municipal-study...
build: 2024-04-12T17:16:23.205Z | municipal-study.xlsx | derived 14 pre-aggregated scenario metrics totals
build: 
build: 🌱  The seed command has been executed.
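A minimal sketch of the log-and-continue handling that produces output like the above. This is not the actual `prisma/seed.ts` code, which is not shown in this thread; the `Geometry` and `Metric` shapes and the `joinGeometries` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes; the real seed script's types are not shown in this thread.
interface Geometry {
  key: string; // value of the geometry join field, e.g. NAME_2
}

interface Metric {
  key: string; // value of the metrics join field, e.g. nombre
}

interface JoinResult {
  inserted: Geometry[];
  log: string[];
}

// Keep the first geometry per key, skipping (and logging) geometries that
// have no matching metric or whose key was already inserted.
function joinGeometries(geometries: Geometry[], metrics: Metric[]): JoinResult {
  const metricKeys = new Set(metrics.map((m) => m.key));
  const seen = new Set<string>();
  const inserted: Geometry[] = [];
  const log: string[] = [];

  for (const geom of geometries) {
    if (!metricKeys.has(geom.key)) {
      log.push(`ignoring geometry with key "${geom.key}", no related metrics`);
      continue;
    }
    if (seen.has(geom.key)) {
      log.push(`failed to insert geometry with key "${geom.key}", already exists`);
      continue;
    }
    seen.add(geom.key);
    inserted.push(geom);
  }

  return { inserted, log };
}

const result = joinGeometries(
  [{ key: "Lagoa" }, { key: "Lagoa" }, { key: "Praia da Vitória" }],
  [{ key: "Lagoa" }, { key: "Vila da Praia da Vitória" }],
);
console.log(result.inserted.length, result.log);
```

Note that with this approach the second "Lagoa" geometry is silently lost, which is why a unique join field matters.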

Reading the above logs, we have a geometry with NAME_2=Praia da Vitória that has no metric. We also have a metric with nombre=Vila da Praia da Vitória with no geometry. It seems reasonable that the metric row should be updated to trim Vila da from the name. As per https://github.com/developmentseed/tecnico-energy-app/issues/26, we drop any metrics that don’t have geometries.

The core issue is that we’re joining the geometry data with the metrics data on a non-ideal field.
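To make mismatches like "Praia da Vitória" vs. "Vila da Praia da Vitória" easy to spot before ingesting, both sides of the join can be compared up front. The sketch below is only an illustration; `unmatchedKeys` is a hypothetical helper, not part of the repository.

```typescript
// Hypothetical helper: report keys that appear on only one side of the join.
function unmatchedKeys(geometryKeys: string[], metricKeys: string[]) {
  const geomSet = new Set(geometryKeys);
  const metricSet = new Set(metricKeys);
  return {
    // geometries that would be ignored during ingestion
    geometriesWithoutMetrics: geometryKeys.filter((k) => !metricSet.has(k)),
    // metrics that would be dropped for lack of a geometry
    metricsWithoutGeometries: metricKeys.filter((k) => !geomSet.has(k)),
  };
}

const report = unmatchedKeys(
  ["Praia da Vitória", "Lagoa"],
  ["Vila da Praia da Vitória", "Lagoa"],
);
console.log(report);
```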

Questions

Understand from Ricardo where the codigo value on the spreadsheet comes from as we don’t see that field in the spatial data (municipalities_gadm41_PRT_2.zip).

We have two separate sets of geometries with duplicate NAME_2 properties: Lagoa and Calheta. For Lagoa, there appears to be a town in the Azores and a city on the south coast. For Calheta, there appears to be a region on the southeastern half of São Jorge island and another on the southern half of the western side of Madeira island. So these are definitely separate communities.

See the screenshots of one of the Calheta geometry's properties (image, image (1), image (2)): the metrics data shows codigo=3101 for Calheta, but the geometry has no such property.
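A quick way to check whether a candidate join field is unique across the spatial data is to count its values. The sketch below assumes GeoJSON-like features with a `properties` object; `findDuplicateKeys` is a hypothetical helper and the sample features are made up for illustration.

```typescript
// Minimal GeoJSON-like feature shape (assumption; geometry omitted for brevity).
interface Feature {
  properties: Record<string, unknown>;
}

// Return every value of `field` that appears on more than one feature.
function findDuplicateKeys(features: Feature[], field: string): string[] {
  const counts = new Map<string, number>();
  for (const feature of features) {
    const key = String(feature.properties[field]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const duplicates: string[] = [];
  counts.forEach((count, key) => {
    if (count > 1) duplicates.push(key);
  });
  return duplicates;
}

// Illustrative sample only, not the real municipal dataset.
const sample: Feature[] = [
  { properties: { NAME_2: "Lagoa" } },
  { properties: { NAME_2: "Calheta" } },
  { properties: { NAME_2: "Lagoa" } },
  { properties: { NAME_2: "Calheta" } },
  { properties: { NAME_2: "Lisboa" } },
];
const duplicateNames = findDuplicateKeys(sample, "NAME_2");
console.log(duplicateNames);
```

An empty result for a given field would suggest it is safe to join on.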

Other Notes

@alukach highlighted data that we added in the spreadsheet in yellow and any columns/keys that we needed to rename in fuchsia (screenshots: image (3), image (4)).

heidimok commented 3 months ago

Hi @RicardoGomesIST, this was one of the open questions for the municipal data. We were wondering where the codigo value on the spreadsheet comes from, as we don’t see that field in the spatial data (municipalities_gadm41_PRT_2.zip). Would you be able to comment on it? Feel free to ask any questions here in a comment and the team can clarify.

cc @alukach - please also comment if this open question is no longer relevant as I know things can change fast as we continue to make progress.

RicardoGomesIST commented 3 months ago

Hi Heidi, the code should come from the Portuguese National Statistics, I believe. Why? Do you wish to use it for the data assignment? In fact, there are some municipalities in Portugal with the same name (on the mainland and the islands). I will ask Mariana to join us, as she developed this dataset. Her GitHub username is marianajanuario97. Can you please add her?

heidimok commented 3 months ago

Thanks @RicardoGomesIST!

@alukach given the info, would you be able to provide more guidance here on what the ask is in terms of adjusting the data in any way?

Also, I can add Mariana!

RicardoGomesIST commented 3 months ago

Hi. I corrected the municipalities' shapefile by adding data in the "CC_2" column. I also shared the "Municipal Data v6" Google Sheets file with @yellowcap, with the "Código" column corrected: it held data in number format and is now converted to text, so the code "101" is now "0101", for instance. The "Código" and "CC_2" columns can now be joined. Hope it helps!

Here is the shapefile updated:

municipalities_gadm41_PRT_2.zip
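If any codes still arrive as numbers on the ingestion side, they can be normalized to zero-padded text before joining "Código" to "CC_2". This is a sketch only: `normalizeCode` is a hypothetical helper, and the width of 4 is inferred from the "101" to "0101" example above.

```typescript
// Hypothetical helper: convert a numeric or text code to zero-padded text.
// The default width of 4 is an assumption based on the "0101" example.
function normalizeCode(code: number | string, width = 4): string {
  return String(code).trim().padStart(width, "0");
}

console.log(normalizeCode(101)); // "0101"
console.log(normalizeCode("3101")); // "3101"
```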

yellowcap commented 2 months ago

Made a copy of this into our drive. https://docs.google.com/spreadsheets/d/1Q-qWsO0N3bxbAQJu27_Zny79bGINhMvLeZY0lQdkNvQ/edit