Closed nicholas-ys-tan closed 2 weeks ago
We should probably also test this for appending to an existing file (not entirely sure how that will work in that case, because then we don't create the layer object with constructed CRS object ?)
Thanks @brendan-ward ,
Have added your changes and test, will just need to do this one tomorrow:
We should probably also test this for appending to an existing file (not entirely sure how that will work in that case, because then we don't create the layer object with constructed CRS object ?)
I am currently investigating this:
We should probably also test this for appending to an existing file (not entirely sure how that will work in that case, because then we don't create the layer object with constructed CRS object ?)
There seems to be an issue unrelated to this ticket - I was testing the appending to a .kml file, and it introduces a Z-dimension. I am not sure if it is known behaviour but I couldn't find an existing ticket about it. Interestingly, it has nothing to do with append
but happens every time a .kml
file is over-written.
points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file.kml'
write_dataframe(
gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)
gdf_read = read_dataframe(output_path, use_arrow=True)
print(gdf_read.geometry)
0 POINT (10 20)
1 POINT (30 40)
2 POINT (50 60)
Name: geometry, dtype: geometry
points_append = [Point(70, 80), Point(90, 100), Point(110, 120)]
gdf_append = gpd.GeoDataFrame(geometry=points_append, crs="EPSG:4326")
write_dataframe(
gdf_append, output_path, layer='tmp_layer', driver="KML", use_arrow=True, append=True
)
gdf_read_appended = read_dataframe(output_path, use_arrow=True)
print(gdf_read_appended.geometry)
0 POINT Z (10 20 0)
1 POINT Z (30 40 0)
2 POINT Z (50 60 0)
3 POINT Z (70 80 0)
4 POINT Z (90 100 0)
5 POINT Z (110 120 0)
Name: geometry, dtype: geometry
This doesn't seem to be related to the append
feature, but to over-writing a kml file.
points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file2.kml'
write_dataframe(
gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)
write_dataframe(
gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)
print(gdf_read.geometry)
0 POINT Z (10 20 0)
1 POINT Z (30 40 0)
2 POINT Z (50 60 0)
Name: geometry, dtype: geometry
On looking into the KML, can confirm it gets restructured
Before:
<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document id="root_doc">
<Folder><name>tmp_layer</name>
<Placemark>
<Point><coordinates>10,20</coordinates></Point>
</Placemark>
<Placemark>
<Point><coordinates>30,40</coordinates></Point>
</Placemark>
<Placemark>
<Point><coordinates>50,60</coordinates></Point>
</Placemark>
</Folder>
</Document></kml>
After over-writing with same data:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document id="root_doc">
<Document id="tmp_layer">
<name>tmp_layer</name>
<Placemark id="tmp_layer.1">
<Point>
<coordinates>
10,20,0
</coordinates>
</Point>
</Placemark>
<Placemark id="tmp_layer.2">
<Point>
<coordinates>
30,40,0
</coordinates>
</Point>
</Placemark>
<Placemark id="tmp_layer.3">
<Point>
<coordinates>
50,60,0
</coordinates>
</Point>
</Placemark>
</Document>
</Document>
</kml>
This happens in the main branch too, but with the reverse co-ordinates.
I assume this is not something to be addressed within this ticket. I haven't yet confirmed if this is an upstream bug but I am currently assuming it is. As a work-around, I've modified the test for the appended KML file to only compare the xy-coordinates by reading with force_2d=True
.
If this isn't already a known issue, I'll do some digging as to why this is happening.
the ubuntu-small CIs don't seem to like to open the produced KML files for appending - will need to look more into it this weekend.
the ubuntu-small CIs don't seem to like to open the produced KML files for appending - will need to look more into it this weekend.
Not sure why this leads to appending not working... but FYI: apparently ubuntu-small doesn't contain the LIBKML driver, so according to de doc kml handling will fallback to the KML driver in that case.
Not sure why this leads to appending nog working... but FYI: apparently ubuntu-small doesn't contain the LIBKML driver, so according to de doc kml handling will fallback to the KML driver in that case.
Oh that is interesting, one thing I did note but neglected to mention in my above text about the Z-dimension is that when I run write_dataframe
with LIBKML, it always writes out the Z-dimension even on first write.
This seems to suggest when there is an existing KML file, LIBKML is used to over-write the KML file, hence why the Z-dimension is introduced, and why it is failing in ubuntu-small.
points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file7.kml'
write_dataframe(
gdf, output_path, layer="tmp_layer", driver="LIBKML", use_arrow=True
)
gdf_read = read_dataframe(output_path, use_arrow=True)
print(gdf_read.geometry)
0 POINT Z (10 20 0)
1 POINT Z (30 40 0)
2 POINT Z (50 60 0)
Name: geometry, dtype: geometry
and outputs the same KML format as the over-written file from my earlier comment:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document id="root_doc">
<Document id="tmp_layer">
<name>tmp_layer</name>
<Placemark id="tmp_layer.1">
<Point>
<coordinates>
10,20,0
</coordinates>
</Point>
</Placemark>
<Placemark id="tmp_layer.2">
<Point>
<coordinates>
30,40,0
</coordinates>
</Point>
</Placemark>
<Placemark id="tmp_layer.3">
<Point>
<coordinates>
50,60,0
</coordinates>
</Point>
</Placemark>
</Document>
</Document>
</kml>
I'll investigate further to confirm tomorrow morning.
It looks like always adding the Z value when using LIBKML
is an artifact of that particular driver and also possibly because Z value is expected by the KML spec? Whereas the KML
driver does not automatically add Z values.
Can you log a separate issue that appending a KML file also adds Z value when using the KML
driver? Then we can look into that separately from this PR. I'm wondering if in the case of append GDAL switches to using the LIBKML
driver if available.
the ubuntu-small CIs don't seem to like to open the produced KML files for appending - will need to look more into it this weekend.
Not sure why this leads to appending not working... but FYI: apparently ubuntu-small doesn't contain the LIBKML driver, so according to de doc kml handling will fallback to the KML driver in that case.
The KML driver might just not support appending? In that case you will have to skip the test (or the second part of the test) when LIBKML driver is not available (you can use something like "LIBKML" not in pyogrio.list_drivers()
)
BTW, the error message we give is a bit confusing. We are adding the suggestion "It might help to specify the correct driver explicitly by prefixing the file path with '
The KML driver might just not support appending? In that case you will have to skip the test (or the second part of the test) when LIBKML driver is not available (you can use something like
"LIBKML" not in pyogrio.list_drivers()
)
Thanks @jorisvandenbossche , I think it's not just appending but also over-writing. It seems to use LIBKML if over-writing an existing file regardless of append=True
which is quite interesting.
Thanks for the pointer, I was looking for how pyogrio is able to check what drivers are available. Will push the fixes shortly.
points = [Point(10, 20), Point(30, 40), Point(50, 60)]
gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
output_path = r'/home/nicholas/dev/data/ogr_test/temporary_kml_file2.kml'
write_dataframe(
gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)
write_dataframe(
gdf, output_path, layer="tmp_layer", driver="KML", use_arrow=True
)
print(gdf_read.geometry)
0 POINT Z (10 20 0)
1 POINT Z (30 40 0)
2 POINT Z (50 60 0)
Name: geometry, dtype: geometry
I think it's not just appending but also over-writing. It seems to use LIBKML if over-writing an existing file regardless of
append=True
which is quite interesting.
Yes, but this test it's the append case that matters (so that's the reason I mentioned it explicitly).
It seems that the KML driver essentially doesn't allow to open a file in Update mode (although I don't directly see anything in the GDAL code to support that), and when you are appending or writing to an existing file, in both cases our code in ogr_open
will call GDALOpenEx
with the GDAL_OF_UPDATE
flag (when reading, we will call that with GDAL_OF_READONLY
flag, that seems to be the main difference). Only in case of overwriting, we will then delete the layer if we could open the file and the layer exists, while in case of append=True
we don't do that.
Yes, but this test it's the append case that matters (so that's the reason I mentioned it explicitly). It seems that the KML driver essentially doesn't allow to open a file in Update mode (although I don't directly see anything in the GDAL code to support that), and when you are appending or writing to an existing file, in both cases our code in
ogr_open
will callGDALOpenEx
with theGDAL_OF_UPDATE
flag (when reading, we will call that withGDAL_OF_READONLY
flag, that seems to be the main difference). Only in case of overwriting, we will then delete the layer if we could open the file and the layer exists, while in case ofappend=True
we don't do that.
Thanks Joris, that all makes sense - I'll include this information in a separate ticket to document.
With this fix in, shall we do a 0.8.1 release? (this seems like an important fix to get out for when geopandas switches to use pyogrio by default)
Closes #420
My cython is atrocious.
Followed what was done in fiona, qgis and gdal documentation to find this is probably what needs to be done. But I am probably doing it too broadly and may have unintended consequences in other filetypes.
~Still need to write tests.~
Need to investigate how this may impact other file types.