Closed Oreilles closed 1 year ago
Thanks for the review, the latest commit fixes the formatting issues!
Thanks! Now we just have to make clippy happy as well. To test locally, you just have to run cargo clippy
or to be sure with all options as in the CI: cargo clippy --workspace --all-features --bins --examples --tests --benches
.
I think it's good now!
lol, seems like @pka and I were reviewing it at the same time...
Thanks for all the patience!! :)
Thanks for your time, it's very pleasing contributing to open source projects with responsive maintainers π
please fix the conflict and lets merge
Solved
At first i made a few minor comments here, but then just created a PR for your PR - please merge if agree -- https://github.com/Oreilles/geozero/pull/1
I merge the two PR!
On an unrelated side, I have a question about the processor functions xy
and coordinate
: since all parameter of coordinate
but x
and y
are of type Option
, would it be possible to only ever use coordinate
and let the writer decide what to do with them ?
My concern is that if you parsed for example a Wkb
blob with XY coordinates, and process it with a WkbWriter
for which you set dims = CoordDimensions::xyzm()
, then the generated WKB would be invalid: its header would indicate the presence of XYZM
values, but since only processor.xy
would be called by the reader (since the input doesn't have ZM), the ZM coordinates bytes wouldn't be written as they should.
Same goes if your processed a Wkb
blob that has XYZM, with processor with dims = CoordDimensions::xy()
. The header would indicate only XY
coordinates, but XYZM
would actually end up included.
Having the readers always call processor.coordinate
and let the processors decide what to do with the unnecessary (or absent) coordinate values would mitigate this issue.
I picked a Wkb
to Wkb
example for the sake of simplicity but that behavior seems to be shared by other readers and writers as well.
On an unrelated side, I have a question about the processor functions
xy
andcoordinate
: since all parameter ofcoordinate
butx
andy
are of typeOption
, would it be possible to only ever usecoordinate
and let the writer decide what to do with them ?
The reason for having two methods was performance. This method is the innermost loop of any geometry operation (except for points) and passing additional optional params costs performance for the most common xy case.
My concern is that if you parsed for example a Wkb blob with XY coordinates, and process it with a WkbWriter for which you set dims = CoordDimensions::xyzm(), then the generated WKB would be invalid
Your concerns are valid, if the reader implementing the GezeroGeometry
trait doesn't look at the requested dimensions.
Code example for dimension handling: https://github.com/georust/geozero/blob/ece8dfe4955c2b161e8dd001431bd38cc85a8e76/geozero/src/geojson/geojson_reader.rs#L238-L251
WkbReader
looks also fine, but there are maybe readers which don't respect GeomProcessor::dimensions
resp. GeomProcessor::multi_dim
, which would be a bug.
In your WKB xy to WKB xyzm example, I don't think that the result is invalid, you just loose the additional dimensions.
In your WKB xy to WKB xyzm example, I don't think that the result is invalid, you just loose the additional dimensions.
If we process a WKB that doesn't have ZM coordinates, it will only call processor.xy
:
And then only X and Y will be written:
When a parser will try to read the WKB emitted, il will try to read four f64
values for each point (the header says so), but only two would have been written.
The reason for having two methods was performance. This method is the innermost loop of any geometry operation (except for points) and passing additional optional params costs performance for the most common xy case.
You might have tested it already (I haven't) but in this case, maybe the branching caused by having to handle four different outcomes outweight the performance gain of only calling the method with the least amout of parameters:
In any case, I'm more concerned about the bug opportunities here than performance π
@Oreilles i think this one is ready to go once conflict is fixed. One thing I forgot to ask about the other PR (which affect this PR as well) -- why do you need to call processor.srid(info.srid)?;
everywhere? Shouldn't the processor be created with a given srid once?
@nyurik calling processor.srid(info.srid) allowed for inferring the EWKT geometry srid from the WKB blob without having to manually specify it. The trait ToWkt
has a to_ewkt
function that have an optional srid argument for when the source Wkb doesn't have one, but you still want one in the output. If the input Wkb has an srid defined, then this parameter can be omited and the one provided by the reader will be used (which is why I had to call processor.srid(info.srid)
. I wrote this so that the srid provided in the to_ewkt
function parameter takes predominance over the one provided by the reader:
Awesome work, thanks!!
This PR adds support for Spatialite and MySQL WKB dialects.
I'd happily discuss the implementation. Quirks of the spatialite dialect that required some refactoring:
0xFE
. In order to know wether thepolygon_end
(or alike) is the last one, we keep track of the collection entity nesting level. If the nesting level is 0 when a geometry ends, it means it is the last one of the blob and the terminating byte0xFE
must be appended. Even thoughMULTI*
geometries cannot be part of aGEOMETRYCOLLECTION
in Spatialite at the moment, this was my easiest go at the problem - and it also makes it more future-proof in case that feature is added at some point.read_header
parameter ofprocess_wkb_geom_n
so that the parent geometry info can be passed down to its children.f32
values, relative to the previous one (with the exception of the first and the last one), which required some refactoring toprocess_coord
to avoid code duplication between the two implementations.Side notes:
WkbWriter
implementGeomProcessor::srid()
?[minx, miny, maxx, maxy]
(like GeoJSON) whereas gpkg stores it as[minx, maxx, miny, maxy]
I also refactored the roundtrip tests so they can be shared by the 5 dialects.