Toblerity / Fiona

Fiona reads and writes geographic data files
https://fiona.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.15k stars 202 forks source link

Field names mixed up writing via "KML" driver #916

Open wheeled opened 4 years ago

wheeled commented 4 years ago

Expected behavior and actual behavior.

Attempting to save a GeoPandas dataframe using:

        with fiona.Env():
            gdf.to_file(kml_path, driver='KML', NameField='name')

produces a KML file in which the value for <name> is taken from another field, and likewise <description>. If the file is written the hard way (using ogr) it is correct. Perhaps there are some kwargs that need to be passed along to correct this? In the example above NameField='name' was an unsuccessful attempt.

Correct KML using ogr:

<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document id="root_doc">
<Schema name="walk" id="walk">
    <SimpleField name="id" type="string"></SimpleField>
    <SimpleField name="date" type="string"></SimpleField>
    <SimpleField name="distance" type="string"></SimpleField>
    <SimpleField name="speed" type="string"></SimpleField>
    <SimpleField name="time" type="string"></SimpleField>
    <SimpleField name="activity_name" type="string"></SimpleField>
</Schema>
<Folder><name>walk</name>
  <Placemark>
    <name>Walked 3.97 km on 22/1/17</name>
    <Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
    <ExtendedData><SchemaData schemaUrl="#walk">
        <SimpleData name="id">1965703340</SimpleData>
        <SimpleData name="date">01/22/2017</SimpleData>
        <SimpleData name="distance">3.96978493824</SimpleData>
        <SimpleData name="speed">1.397811597971831</SimpleData>
        <SimpleData name="time">2840</SimpleData>
        <SimpleData name="activity_name">Walk</SimpleData>
    </SchemaData></ExtendedData>
      <LineString><coordinates>151.165393442,-33.7688339688 ... 151.16551338,-33.7687453898</coordinates></LineString>
  </Placemark> ...

Incorrect KML using Fiona (see Placemark name and description):

<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document id="root_doc">
<Schema name="walk" id="walk">
    <SimpleField name="id" type="string"></SimpleField>
    <SimpleField name="date" type="string"></SimpleField>
    <SimpleField name="distance" type="float"></SimpleField>
    <SimpleField name="speed" type="float"></SimpleField>
    <SimpleField name="time" type="string"></SimpleField>
    <SimpleField name="activity_name" type="string"></SimpleField>
</Schema>
<Folder><name>walk</name>
  <Placemark>
    <name>2840</name>
    <description>01/22/2017</description>
    <Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
    <ExtendedData><SchemaData schemaUrl="#walk">
        <SimpleData name="id">3.96978493824</SimpleData>
        <SimpleData name="date">Walked 3.97 km on 22/1/17</SimpleData>
        <SimpleData name="distance">1.39781159797183</SimpleData>
        <SimpleData name="speed">0</SimpleData>
    </SchemaData></ExtendedData>
      <LineString><coordinates>151.165393442,-33.7688339688 ... 151.16551338,-33.7687453898</coordinates></LineString>
  </Placemark> ...

Steps to reproduce the problem.

I don't think there was anything special about the GeoDataFrame, but can provide a script and sample data if the answer is not trivial.

Operating system

Mac OS X 10.15.4

Fiona and GDAL version and provenance

Fiona 1.8.13.post1 loaded by PyCharm GDAL 2.4.0 loaded by PyCharm

rbuffat commented 4 years ago

A working minimal example would be very helpful.

rbuffat commented 4 years ago

I just noticed the KML driver is not registered in the supported list of drivers: https://github.com/Toblerity/Fiona/blob/1.8.13.post1/fiona/drvsupport.py#L80

This does not necessarily mean that this driver does not work with Fiona. However, the KML data model is not entirely compatible with the GDAL/OGR data model. This is also mentioned in GDAL's documentation: https://gdal.org/drivers/vector/kml.html As Fiona assumes that a full OGR data model can be used, such drivers are typically excluded by Fiona.

It could be that the driver is sensible to the order in which the fields are written (I'm just guessing). The driver has the following creation options:

Creation Options
The following dataset creation options are supported:
NameField: Allows you to specify the field to use for the KML <name> element. Default value : ‘Name’
DescriptionField: Allows you to specify the field to use for the KML <description> element. Default value : ‘Description’
AltitudeMode: Allows you to specify the AltitudeMode to use for KML geometries. This will only affect 3D geometries and must be one of the valid KML options. See the relevant KML reference material for further information.

These options can be passed to fiona.open. E.g.

with fiona.open(path, mode='w', NameField='the name field name',  DescriptionField='the description field name' ... ) as c:
wheeled commented 4 years ago

Here is a simple script that can illustrate the problem. The data file workouts.gpkg is attached (will need to be unzipped).

import fiona
import geopandas as gpd
gdf = gpd.read_file("workouts.gpkg", driver="GPKG", layer="bike_ride")
gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'
with fiona.Env():
    gdf.to_file("bike_ride_fiona.kml", driver='KML', NameField='name')

workouts.gpkg.zip

I had found the KML driver notes you reference above, but the addition of the NameField kwarg didn't make any difference.

My suspicion would be that the OGRFeatureBuilder in ogrext.pyx is for some reason not constructing the feature for the KML driver quite right. If I construct the feature directly in ogr it looks like so (when exported to JSON):

{
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[151.148538533, -33.6983890381], [151.148168283, -33.6982220137], [151.146898883, -33.6979895788], [151.146154005, -33.6978891607]]
    },
    "properties": {
        "Name": "Rode 9.91 km on 9/04/2016", "Description": None, "id": "1419923559", "date": "04/09/2016", "distance": "9.911885322240002", "name": "Rode 9.91 km on 9/04/2016", "speed": "4.644744762061856", "time": "2134", "activity_name": "Bike Ride"
    }
}

Note that ogr has added the Name and Description fields and correctly mapped the name field to Name.

I don't have a C++ environment or I'd be happy to dig deeper. Thanks for your assistance.

rbuffat commented 4 years ago

@wheeled For the above-mentioned reasons, I hope you understand that Fiona does not support KML. However, it is still strange that data can be written without error, but just the order of the fields is wrong.

It looks like the order of the fields in the schema is relevant. The following code works:

import os
from collections import OrderedDict
import fiona
import logging
import sys
from fiona.crs import from_epsg
from shapely.geometry import shape

with fiona.open('workouts.gpkg') as c:
    for f in c:
        print(shape(f['geometry']))

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)

print(fiona.__version__)
print(fiona.get_gdal_release_name())

fiona.drvsupport.supported_drivers['KML'] = 'rw'

fname = 'test.kml'
if os.path.exists(fname):
    os.remove(fname)

schema = {'properties': OrderedDict([
    ('Name', 'str'),
    ('Description', 'str'),
    ('id', 'str'),
    ('distance', 'float'),
    ('speed', 'float'),
    ('time', 'int'),
    ('activity_name', 'str'),
    ('Description', 'str'),
    ('date', 'str'), ]),
    'geometry': 'LineString'}

with fiona.open("test.kml",
                driver='KML',
                layer="walk",
                mode='w',
                schema=schema,
                crs=from_epsg(4326)) as c:
    f = {
        'geometry': {'type': 'LineString',
                     'coordinates': [(151.148538533, -33.6983890381), (151.148168283, -33.6982220137),
                                     (151.146898883, -33.6979895788), (151.146154005, -33.6978891607)]},
        'properties': OrderedDict([
            ('Name', 'Walked 3.97 km on 22/1/17'),
            ('Description', 'test'),
            ('distance', 3.96978493824),
            ('id', '1965703340'),
            ('speed', 1.397811597971831),
            ('time', 2840),
            ('date', '01/22/2017'),
            ('activity_name', 'Walk')
        ])
    }
    c.write(f)

with open(fname) as f:
    print(f.read())

whereas if one of the ExtendData fields is in front of Name and Description in schema, the resulting order is wrong.

rbuffat commented 4 years ago

I found the issue: https://github.com/Toblerity/Fiona/blob/master/fiona/ogrext.pyx#L1119-L1122 Here it is assumed that the ordering of the schema retrieved by gdal is the same as originally specified. However, the KML driver seems to use internally a specific ordering. This code was introduced due to https://github.com/Toblerity/Fiona/issues/105 to handle normalized field names.

rbuffat commented 4 years ago

The real issue is, that the KML driver automatically adds fields: https://github.com/OSGeo/gdal/blob/master/gdal/ogr/ogrsf_frmts/kml/ogrkmllayer.cpp#L116-L120

https://github.com/Toblerity/Fiona/blob/master/fiona/ogrext.pyx#L1119-L1122 assumes that only the fields in the schema provided to Fiona exist. The same issue is also present for the DXF, GPX, GPSTrackMacker and DGN driver.

Tested by adding:

diff --git a/fiona/ogrext.pyx b/fiona/ogrext.pyx
index a0b2a6c..c8163ca 100644
--- a/fiona/ogrext.pyx
+++ b/fiona/ogrext.pyx
@@ -1083,6 +1083,12 @@ cdef class WritingSession(Session):

             encoding = self._get_internal_encoding()

+            _schema = self.get_schema()
+            log.debug("before schema: {}".format(_schema))
+
+            if len(_schema['properties']) > 0:
+                raise ValueError("Driver {} has non empty schema: {}".format(self.collection.driver, _schema))
+
             for key, value in collection.schema['properties'].items():

                 log.debug("Begin creating field: %r value: %r", key, value)
wheeled commented 4 years ago

So https://github.com/Toblerity/Fiona/blob/master/fiona/ogrext.pyx#L1119-L1122 explains how the fields get mixed up. If I simulate the behaviour of the dict(zip()) it produces a mapping that agrees with the result:

coll_schema = ["id", "date", "distance", "name", "speed", "time", "activity_name"]
ogr_schema = ["Name", "Description", "id", "date", "distance", "name", "speed", "time", "activity_name"]
schema_mapping = dict(zip(coll_schema, ogr_schema))
print(schema_mapping)

{'id': 'Name', 'date': 'Description', 'distance': 'id', 'name': 'date', 'speed': 'distance', 'time': 'name', 'activity_name': 'speed'}

[note that this process remaps time to name and then later on the ogr driver remaps name to Name]

I understand that you are not necessarily keen to support KML, but if this issue also affects some other drivers would it be possible to classify it as a bug?

I imagine the fix would be to check that the collection.schema['properties'].keys() and ogr_schema['properties'].keys() were the same length prior to zipping them. If all of the offending ogr drivers add their special fields at the beginning of the schema, this might work as a fix:

        ogr_schema = self.get_schema()['properties'].keys()
        coll_schema = collection.schema['properties'].keys()
        offset = len(ogr_schema) - len(coll_schema)
        self._schema_mapping = dict(zip(
            coll_schema,
            ogr_schema[offset:] ))

This assumes that the driver will look after its own Name and Description fields.

I apologise again that I am not set up to be able to test it.