datagouv / edigeo-parser

Blazing fast parser for EDIGEO files
MIT License

Parser issue for some features, in particular communes #22

Open ThomasG77 opened 2 years ago

ThomasG77 commented 2 years ago

Some communes are not parsed correctly (at least the commune polygon). We confirmed this by comparing with another parser, GDAL with its EDIGEO driver.

Steps to reproduce the issue are in this gist: https://gist.github.com/ThomasG77/f75f50356d50b9e428dc01c076f6574a

Only 49 communes are affected, but we have seen that other layer types, such as TLINE, are affected too.

We probably need to combine the approaches of the current parser and the GDAL EDIGEO driver, since the current parser was written to work around some GDAL limitations: https://blog.geo.data.gouv.fr/cadastre-millesime-janvier-2018-nouveautes-perspectives-a657d471a178

Look also at https://github.com/DoFabien/edigeoToGeojson

ThomasG77 commented 2 years ago

Solved for communes with some GDAL post-processing (https://twitter.com/datagouvfr/status/1521067883022979072) instead of touching the parser.

The same issue affects sections and is currently unsolved.

It affects the DVF application, because a section comes out empty: https://www.data.gouv.fr/fr/datasets/demandes-de-valeurs-foncieres-geolocalisees/#discussion-627b860b8ac61b099fc46a86 As a result, the "Section Cadastrale" dropdown list does not offer the section (it is not available), and the display does not show the section either.

We found this by comparing the current GeoJSON output https://cadastre.data.gouv.fr/data/etalab-cadastre/2022-04-01/geojson/communes/80/80695/ with the output from https://cadastre.data.gouv.fr/data/dgfip-pci-vecteur/2022-04-01/edigeo/feuilles/80/80695/ after running GDAL on the THF file within edigeo-806950000D01.tar.bz2 with the command ogr2ogr -f GeoJSON section-d.geojson -dialect SQLite -sql "SELECT * FROM SECTION_id" -lco RFC7946=YES E0000D01.THF

Recipe

wget https://cadastre.data.gouv.fr/data/dgfip-pci-vecteur/2022-04-01/edigeo/feuilles/80/80695/edigeo-806950000D01.tar.bz2
unp edigeo-806950000D01.tar.bz2
ogr2ogr -f GeoJSON section-d.geojson -dialect SQLite -sql "SELECT * FROM SECTION_id" -lco RFC7946=YES E0000D01.THF
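To repeat the recipe on another sheet, the URL and the ogr2ogr arguments can be generated rather than typed. This is only a sketch: the helper names are hypothetical, the URL layout is inferred from the single example above, and taking the first two characters of the commune code as the department is an assumption that may not hold everywhere. The ogr2ogr flags are copied verbatim from the recipe.

```python
BASE = "https://cadastre.data.gouv.fr/data/dgfip-pci-vecteur"


def sheet_archive_url(sheet_id, millesime="2022-04-01"):
    """Build the archive URL for one EDIGEO sheet (layout assumed from the example)."""
    commune = sheet_id[:5]   # e.g. "80695"
    dept = commune[:2]       # assumption: two-character department folder
    return f"{BASE}/{millesime}/edigeo/feuilles/{dept}/{commune}/edigeo-{sheet_id}.tar.bz2"


def ogr2ogr_args(thf_path, out_path, layer="SECTION_id"):
    """Return the ogr2ogr argument list used in the recipe, for a given THF file."""
    return [
        "ogr2ogr", "-f", "GeoJSON", out_path,
        "-dialect", "SQLite",
        "-sql", f"SELECT * FROM {layer}",
        "-lco", "RFC7946=YES",
        thf_path,
    ]


if __name__ == "__main__":
    print(sheet_archive_url("806950000D01"))
    print(" ".join(ogr2ogr_args("E0000D01.THF", "section-d.geojson")))
```

The argument list can be passed to `subprocess.run` once the archive has been downloaded and unpacked.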

We can find the affected sections by parsing the output logs of edigeo-parser with paste <(cut -c1-5 nohup.out) <(cut -c14- nohup.out) |grep SECTION | sort | uniq

We received feedback about Saint-Just-Luzac (INSEE 17351), which shows the same issue.

nohup.out is the file produced by running the following processing in the background: https://github.com/etalab/cadastre#extraction-des-donn%C3%A9es-du-pci-vecteur-et-production-des-fichiers-communaux
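The same filtering can be sketched in Python for environments without bash process substitution. It assumes every log line has the shape seen in the test cases of this thread (a 12-character sheet identifier, a colon, then the message); the function name is illustrative and not part of edigeo-parser. Results are sorted by code point, which may differ slightly from a locale-aware `sort`.

```python
def unique_section_errors(lines):
    """Mimic: paste <(cut -c1-5 f) <(cut -c14- f) | grep SECTION | sort | uniq."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # cut -c1-5 keeps the commune code; cut -c14- keeps what follows
        # the 12-character sheet identifier and its ':' separator.
        row = line[:5] + "\t" + line[13:]
        if "SECTION" in row:
            seen.add(row)
    return sorted(seen)


if __name__ == "__main__":
    sample = [
        "571510002201:Objet_649933(SECTION) => geometry ignored (has-exterior-holes, has-self-intersection)",
        "571510002202:Objet_649933(SECTION) => geometry ignored (has-exterior-holes, has-self-intersection)",
        "010100000A01:Objet_2512020(TLINE) => geometry ignored (Too many linked arcs to build a single LineString)",
    ]
    for row in unique_section_errors(sample):
        print(row)
```

Dropping the sheet part of the identifier is what lets the two 57151 lines collapse into a single entry per commune and object.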

ThomasG77 commented 2 years ago

For fixing sections

ThomasG77 commented 1 year ago

To fix the issues, I am working through them by type of error. At the moment, there are 6 types of errors for sections:

  1. has-crossing-holes
  2. has-exterior-holes
  3. has-self-intersection
  4. ring-has-duplicate-vertices
  5. (The input polygon may not have duplicate vertices (except for the first and last vertex of each ring)). This seems to be a side effect of a turf check: the message does not come from the core code of the lib but from the turf code
  6. (Unable to build valid polygon coordinates)
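To make categories 4 and 5 concrete, here is a minimal sketch of what a duplicate-vertex check on a ring could look like. It follows the wording of message 5 (the first and last vertex of a ring are allowed to coincide) and is only one plausible reading; it is not the code used by the validator or by turf, and the function name is hypothetical.

```python
def duplicate_vertices(ring):
    """Return positions that occur more than once in a ring.

    The closing vertex (last == first) is ignored, since a closed ring
    repeats its first position by definition.
    """
    body = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    seen = set()
    duplicates = []
    for position in body:
        key = tuple(position)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    pinched = [(0, 0), (1, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
    print(duplicate_vertices(square))
    print(duplicate_vertices(pinched))
```

A ring that revisits a position, like the second one, is the kind of input that would also tend to trigger has-self-intersection, which matches the paired flags seen in the logs.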

For has-exterior-holes, I have solved most cases with the branch https://github.com/etalab/edigeo-parser/tree/fix-section-reading-1 but some cases seem not to be fixed. I then use the cadastre branch https://github.com/etalab/cadastre/tree/update-pkg to check in production whether the fix is effective.

Matching test cases (the numbers refer to the list above)

# 1
08339000ZV01:Objet_567765(SECTION) => geometry ignored (has-crossing-holes)
15231000ZB01:Objet_1442135(SECTION) => geometry ignored (has-crossing-holes)

# 2
571510002201:Objet_649933(SECTION) => geometry ignored (has-exterior-holes, has-self-intersection)
571510002202:Objet_649933(SECTION) => geometry ignored (has-exterior-holes, has-self-intersection)

# 3
52432111ZK01:Objet_675114(SECTION) => geometry ignored (ring-has-duplicate-vertices, has-self-intersection)
571510002201:Objet_649933(SECTION) => geometry ignored (has-exterior-holes, has-self-intersection)

# 4
52432111ZK01:Objet_675114(SECTION) => geometry ignored (ring-has-duplicate-vertices, has-self-intersection)
577320003301:Objet_1479958(SECTION) => geometry ignored (ring-has-duplicate-vertices, has-self-intersection)

# 5
274485100B01:Objet_463563(SECTION) => geometry ignored (The input polygon may not have duplicate vertices (except for the first and last vertex of each ring))

# 6
395110000U01:Objet_1314984(SECTION) => geometry ignored (Unable to build valid polygon coordinates)
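The mapping between log lines and error types above can be rebuilt automatically. The sketch below assumes the line shape visible in these test cases (`<sheet>:<object>(<layer>) => geometry ignored (<reasons>)`); the regular expression and the function name are illustrative, not taken from edigeo-parser. Lowercase hyphenated flags are split on commas, while free-text messages such as the one in case 5 are kept whole.

```python
import re
from collections import defaultdict

# Assumed line shape, derived from the examples in this thread.
LINE_RE = re.compile(
    r"^(?P<sheet>\w{12}):(?P<obj>\w+)\((?P<layer>\w+)\)"
    r" => geometry ignored \((?P<reasons>.*)\)$"
)
# Validator flags look like "has-exterior-holes, has-self-intersection".
FLAGS_RE = re.compile(r"[a-z-]+(, [a-z-]+)*")


def group_by_reason(lines):
    """Map each error reason to the (sheet, object) pairs that reported it."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a "geometry ignored" line
        reasons = match.group("reasons")
        keys = reasons.split(", ") if FLAGS_RE.fullmatch(reasons) else [reasons]
        for key in keys:
            groups[key].append((match.group("sheet"), match.group("obj")))
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "52432111ZK01:Objet_675114(SECTION) => geometry ignored (ring-has-duplicate-vertices, has-self-intersection)",
        "395110000U01:Objet_1314984(SECTION) => geometry ignored (Unable to build valid polygon coordinates)",
    ]
    for reason, objects in group_by_reason(sample).items():
        print(reason, objects)
```

A line carrying two flags is listed under both, which is why the same sheet appears under several numbers in the test cases.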

ThomasG77 commented 1 year ago

Related issues with parcelles (parsing issues, hence not visible and not provided in our etalab-cadastre delivery)

ThomasG77 commented 1 year ago

Overall issues list (including polygons, labels, linestring)

| errors | count | % |
|---|---|---|
| 010100000A01:Objet_2512020(TLINE) => geometry ignored (Too many linked arcs to build a single LineString) | 2429550 | 91.9608666210689 |
| 06088000OL01:Objet_126251(SUBDFISC) => geometry ignored (Unable to build valid polygon coordinates) | 107689 | 4.07613499024769 |
| Impossible de relier la subdivision fiscale à sa parcelle (unable to link the fiscal subdivision to its parcel) | 84167 | 3.18580406284929 |
| ring-has-duplicate-vertices | 15030 | 0.568900341756566 |
| has-exterior-holes | 2303 | 0.087170824156046 |
| Impossible de relier parcelle et numéro de voie (unable to link parcel and street number) | 1204 | 0.0455725889204861 |
| Failed to deintersect polygon: significant secondary polygon | 977 | 0.0369804147635506 |
| The input polygon may not have duplicate vertices | 424 | 0.0160488186896064 |
| has-crossing-holes | 282 | 0.0106739784680873 |
| deintersectPolygon: unexpected error | 180 | 0.00681317774558762 |
| Too many linked faces to build a single Polygon | 62 | 0.00234676122348018 |
| arc.left.endsWith is not a function | 45 | 0.0017032944363969 |
| found non-noded intersection between | 10 | 0.000378509874754868 |
| JSTS union has failed: retrying with mapshaper | 8 | 0.000302807899803894 |
| Missing required files in EDIGÉO bundle | 8 | 0.000302807899803894 |
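The percentage column is consistent with each count divided by the sum of all listed counts. A quick way to check this, or to recompute the shares after a new run, is sketched below; the counts are copied from the list above, and the variable names are illustrative.

```python
# Error counts copied from the list above (message -> occurrences).
counts = {
    "Too many linked arcs to build a single LineString": 2429550,
    "Unable to build valid polygon coordinates": 107689,
    "Impossible de relier la subdivision fiscale à sa parcelle": 84167,
    "ring-has-duplicate-vertices": 15030,
    "has-exterior-holes": 2303,
    "Impossible de relier parcelle et numéro de voie": 1204,
    "Failed to deintersect polygon: significant secondary polygon": 977,
    "The input polygon may not have duplicate vertices": 424,
    "has-crossing-holes": 282,
    "deintersectPolygon: unexpected error": 180,
    "Too many linked faces to build a single Polygon": 62,
    "arc.left.endsWith is not a function": 45,
    "found non-noded intersection between": 10,
    "JSTS union has failed: retrying with mapshaper": 8,
    "Missing required files in EDIGÉO bundle": 8,
}

total = sum(counts.values())
# Share of each error type, as a percentage of all reported errors.
shares = {message: 100 * n / total for message, n in counts.items()}

if __name__ == "__main__":
    print("total errors:", total)
    for message, share in shares.items():
        print(f"{share:10.5f} %  {message}")
```

Seen this way, the TLINE message alone accounts for roughly 92% of all reported errors, so the polygon-related categories discussed in this issue are a small fraction of the log volume.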

ThomasG77 commented 8 months ago

New example of the problem: parcel ZB 170 overlapping ZB 170 in commune 14191 (screenshot: Sélection_990)