OSM-es / CatAtom2Osm

Tool to convert INSPIRE data sets from the Spanish Cadastre ATOM Services to OSM files.
BSD 2-Clause "Simplified" License

Encoding error in the empty cadastral parcel check #100

Closed javiersanp closed 2 years ago

javiersanp commented 2 years ago
2022-02-21 14:21:07,191 - INFO - Municipio: 'Ruidera'
2022-02-21 14:21:07,191 - INFO - Comienza el procesado de '13100'
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/CatAtom2Osm/catatom2osm/__main__.py", line 195, in <module>
    run()
  File "/opt/CatAtom2Osm/catatom2osm/__main__.py", line 188, in run
    process(options)
  File "/opt/CatAtom2Osm/catatom2osm/__main__.py", line 74, in process
    CatAtom2Osm.create_and_run(a_path, o)
  File "/opt/CatAtom2Osm/catatom2osm/app.py", line 86, in create_and_run
    app.run()
  File "/opt/CatAtom2Osm/catatom2osm/app.py", line 113, in run
    self.get_parcel()
  File "/opt/CatAtom2Osm/catatom2osm/app.py", line 183, in get_parcel
    parcel_gml = self.cat.read("cadastralparcel")
  File "/opt/CatAtom2Osm/catatom2osm/catatom.py", line 205, in read
    if self.is_empty(gml_path, zip_path):
  File "/opt/CatAtom2Osm/catatom2osm/catatom.py", line 138, in is_empty
    parser.feed(text)
  File "src/lxml/parser.pxi", line 1242, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 1364, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 592, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 6
lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xEE 0x41 0x22 0x3E, line 6, column 56
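
The reported bytes hint at what went wrong: 0xEE opens a three-byte UTF-8 sequence, yet the next byte, 0x41, is plain ASCII "A" rather than a continuation byte, so the input cannot be valid UTF-8. Below is a minimal sketch of a decoding fallback, assuming (this is not confirmed in the thread) that the file was written in a single-byte encoding such as Latin-1. `decode_with_fallback` is a hypothetical helper, not part of CatAtom2Osm.

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Decode bytes with the first encoding that accepts them.

    The candidate list is an assumption; the real source encoding
    of the Cadastre files is not stated in the report.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# The four bytes quoted in the XMLSyntaxError message.
reported = bytes([0xEE, 0x41, 0x22, 0x3E])
text = decode_with_fallback(reported)
```

Latin-1 accepts any byte sequence, so it should stay last in the list; placed first, it would silently mis-decode valid UTF-8.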
javiersanp commented 2 years ago

Other cadastral parcel files with bad encoding:

2022-02-21 14:44:01,152 - INFO - Municipio: 'Villacarrillo'
2022-02-21 14:44:01,152 - INFO - Comienza el procesado de '23095'
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 707849, column 48
2022-02-21 14:44:01,773 - INFO - Leídos 19060 características en '23095/A.ES.SDGC.CP.23095.cadastralparcel.gml'
Agregar:   0%|                                                                                         | 0/19060 [00:00<?, ?it/s]
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 707849, column 48
2022-02-21 14:44:02,342 - ERROR - No hay parcelas                                                                                

2022-02-21 14:44:02,917 - INFO - Municipio: 'Camponaraya'
2022-02-21 14:44:02,917 - INFO - Comienza el procesado de '24036'
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 747792, column 48
2022-02-21 14:44:03,542 - INFO - Leídos 20094 características en '24036/A.ES.SDGC.CP.24036.cadastralparcel.gml'
Agregar:   0%|                                                                                         | 0/20094 [00:00<?, ?it/s]
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 747792, column 48
2022-02-21 14:44:04,067 - ERROR - No hay parcelas                                                                                

2022-02-21 14:45:11,271 - INFO - Municipio: 'Ponferrada'
2022-02-21 14:45:11,271 - INFO - Comienza el procesado de '24118'
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 2932916, column 48
javiersanp commented 2 years ago

It also happens with addresses:

2022-02-22 19:14:44,035 - INFO - Municipio: 'Erustes'
2022-02-22 19:14:44,035 - INFO - Comienza el procesado de '45060'
2022-02-22 19:14:44,090 - INFO - Leídos 740 características en '45060/A.ES.SDGC.CP.45060.cadastralparcel.gml'
2022-02-22 19:14:44,237 - INFO - Leídos 268 características en '45060/A.ES.SDGC.BU.45060.building.gml'                           
2022-02-22 19:14:44,240 - INFO - Leídos 9 características en '45060/A.ES.SDGC.BU.45060.otherconstruction.gml'
2022-02-22 19:14:44,876 - INFO - Leídos 866 características en '45060/A.ES.SDGC.BU.45060.buildingpart.gml'                       
2022-02-22 19:14:45,020 - INFO - Leídos 36 características en '45060/A.ES.SDGC.CP.45060.cadastralzoning.gml'                     
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 14210, column 27                               
2022-02-22 19:14:46,140 - INFO - Leídos 354 características en '45060/A.ES.SDGC.AD.45060.gml'
Agregar:   0%|                                                                                           | 0/354 [00:00<?, ?it/s]
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 14210, column 27
2022-02-22 19:14:46,152 - INFO - No hay direcciones                                                                              
2022-02-22 19:14:46,901 - INFO - Generado 'parcel.shp'
2022-02-22 19:14:46,920 - INFO - Generado 'building.shp'
2022-02-22 19:14:46,929 - INFO - Generado 'address.osm': 0 nodos, 0 vías, 0 relaciones
2022-02-22 19:14:46,930 - INFO - Generado '45060/highway_names.csv'. Por favor, compruébelo y ejecute de nuevo

2022-02-22 19:04:38,577 - INFO - Municipio: 'Vegas de Matute'
2022-02-22 19:04:38,577 - INFO - Comienza el procesado de '40261'
2022-02-22 19:04:38,788 - INFO - Leídos 5020 características en '40261/A.ES.SDGC.CP.40261.cadastralparcel.gml'
2022-02-22 19:04:39,692 - INFO - Leídos 745 características en '40261/A.ES.SDGC.BU.40261.building.gml'                           
2022-02-22 19:04:39,696 - INFO - Leídos 19 características en '40261/A.ES.SDGC.BU.40261.otherconstruction.gml'
2022-02-22 19:04:45,176 - INFO - Leídos 1843 características en '40261/A.ES.SDGC.BU.40261.buildingpart.gml'                      
2022-02-22 19:04:45,493 - INFO - Leídos 129 características en '40261/A.ES.SDGC.CP.40261.cadastralzoning.gml'                    
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 89630, column 38                               
2022-02-22 19:04:48,206 - INFO - Leídos 2332 características en '40261/A.ES.SDGC.AD.40261.gml'
Agregar:   0%|                                                                                          | 0/2332 [00:00<?, ?it/s]
ERROR 1: XML parsing of GML file failed : not well-formed (invalid token) at line 89630, column 38
2022-02-22 19:04:48,263 - INFO - No hay direcciones                                                                              
2022-02-22 19:04:49,005 - INFO - Generado 'parcel.shp'
2022-02-22 19:04:49,042 - INFO - Generado 'building.shp'
2022-02-22 19:04:49,051 - INFO - Generado 'address.osm': 0 nodos, 0 vías, 0 relaciones
2022-02-22 19:04:49,053 - INFO - Generado '40261/highway_names.csv'. Por favor, compruébelo y ejecute de nuevo

Content at the reported position in 45060: <GN:text> CL REAL 2&</GN:text>
Content at the reported position in 40261: <GN:text> UR MONTE VEGAS 1&FASE</GN:text>
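
To confirm what sits at the position reported by a "not well-formed (invalid token)" message, it can help to read the raw bytes around that line and column without decoding them, since the file may not decode cleanly. The sketch below uses a hypothetical helper, `context_at`, and a throwaway file standing in for the real GML. Whether the parser counts columns from 0 or 1 is not checked here, so the window is deliberately wider than a single byte.

```python
import os
import tempfile


def context_at(path, line, column, width=30):
    """Return the raw bytes around a 1-based line and column, or None."""
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            if number == line:
                start = max(column - 1 - width, 0)
                return raw[start:column - 1 + width]
    return None


# Demonstration on a temporary file holding one of the reported values.
with tempfile.NamedTemporaryFile("wb", suffix=".gml", delete=False) as tmp:
    tmp.write(b"<a>\n<GN:text> CL REAL 2&</GN:text>\n</a>\n")
try:
    snippet = context_at(tmp.name, line=2, column=20, width=5)
finally:
    os.remove(tmp.name)
```

Reading in binary mode keeps the function usable on files whose declared encoding is wrong, which is the situation described earlier in this issue.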

javiersanp commented 2 years ago

Escape sequences for the ampersand: &amp; or &#38;.
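
A minimal sketch of how those escape sequences could be applied before parsing, using only the Python standard library. This is not the fix adopted in CatAtom2Osm, which this thread does not show. `escape_bare_ampersands` is a hypothetical helper, and the namespace URI in the sample is a placeholder added only so the snippet parses.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not start a named, decimal, or hexadecimal reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace stray "&" with "&amp;", leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", text)


# Values quoted in the report, wrapped in a root element that declares a
# placeholder namespace (the URI is hypothetical) so the snippet can be parsed.
raw = (
    '<root xmlns:GN="urn:example:gn">'
    "<GN:text> CL REAL 2&</GN:text>"
    "<GN:text> UR MONTE VEGAS 1&FASE</GN:text>"
    "</root>"
)
fixed = escape_bare_ampersands(raw)
root = ET.fromstring(fixed)
texts = [element.text for element in root]
```

After parsing, `texts` holds the street names with a literal "&" restored. The regular expression is a heuristic: it would also rewrite ampersands inside CDATA sections, and it leaves untouched anything shaped like a reference to an undefined entity, so it suits a quick repair better than general use.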