georust / geozero

Zero-Copy reading and writing of geospatial data.
Apache License 2.0
352 stars 37 forks source link

Add support for Spatialite and MySQL WKB dialects #153

Closed Oreilles closed 1 year ago

Oreilles commented 1 year ago

This PR adds support for Spatialite and MySQL WKB dialects.

I'd happily discuss the implementation. Quirks of the spatialite dialect that required some refactoring:

Side notes:

I also refactored the roundtrip tests so they can be shared by the 5 dialects.

Oreilles commented 1 year ago

Thanks for the review, the latest commit fixes the formatting issues!

pka commented 1 year ago

Thanks! Now we just have to make clippy happy as well. To test locally, you just have to run cargo clippy or to be sure with all options as in the CI: cargo clippy --workspace --all-features --bins --examples --tests --benches.

Oreilles commented 1 year ago

I think it's good now!

nyurik commented 1 year ago

lol, seems like @pka and I were reviewing it at the same time...

Oreilles commented 1 year ago

Thanks for all the patience!! :)

Thanks for your time, it's very pleasing contributing to open source projects with responsive maintainers πŸ‘

nyurik commented 1 year ago

please fix the conflict and lets merge

Oreilles commented 1 year ago

Solved

nyurik commented 1 year ago

At first i made a few minor comments here, but then just created a PR for your PR - please merge if agree -- https://github.com/Oreilles/geozero/pull/1

Oreilles commented 1 year ago

I merge the two PR! On an unrelated side, I have a question about the processor functions xy and coordinate: since all parameter of coordinate but x and y are of type Option, would it be possible to only ever use coordinate and let the writer decide what to do with them ? My concern is that if you parsed for example a Wkb blob with XY coordinates, and process it with a WkbWriter for which you set dims = CoordDimensions::xyzm(), then the generated WKB would be invalid: its header would indicate the presence of XYZM values, but since only processor.xy would be called by the reader (since the input doesn't have ZM), the ZM coordinates bytes wouldn't be written as they should. Same goes if your processed a Wkb blob that has XYZM, with processor with dims = CoordDimensions::xy(). The header would indicate only XY coordinates, but XYZM would actually end up included. Having the readers always call processor.coordinate and let the processors decide what to do with the unnecessary (or absent) coordinate values would mitigate this issue. I picked a Wkb to Wkb example for the sake of simplicity but that behavior seems to be shared by other readers and writers as well.

pka commented 1 year ago

On an unrelated side, I have a question about the processor functions xy and coordinate: since all parameter of coordinate but x and y are of type Option, would it be possible to only ever use coordinate and let the writer decide what to do with them ?

The reason for having two methods was performance. This method is the innermost loop of any geometry operation (except for points) and passing additional optional params costs performance for the most common xy case.

My concern is that if you parsed for example a Wkb blob with XY coordinates, and process it with a WkbWriter for which you set dims = CoordDimensions::xyzm(), then the generated WKB would be invalid

Your concerns are valid, if the reader implementing the GezeroGeometry trait doesn't look at the requested dimensions. Code example for dimension handling: https://github.com/georust/geozero/blob/ece8dfe4955c2b161e8dd001431bd38cc85a8e76/geozero/src/geojson/geojson_reader.rs#L238-L251 WkbReader looks also fine, but there are maybe readers which don't respect GeomProcessor::dimensions resp. GeomProcessor::multi_dim, which would be a bug.

In your WKB xy to WKB xyzm example, I don't think that the result is invalid, you just loose the additional dimensions.

Oreilles commented 1 year ago

In your WKB xy to WKB xyzm example, I don't think that the result is invalid, you just loose the additional dimensions.

If we process a WKB that doesn't have ZM coordinates, it will only call processor.xy:

https://github.com/georust/geozero/blob/ece8dfe4955c2b161e8dd001431bd38cc85a8e76/geozero/src/wkb/wkb_reader.rs#L315-L319

And then only X and Y will be written:

https://github.com/georust/geozero/blob/ece8dfe4955c2b161e8dd001431bd38cc85a8e76/geozero/src/wkb/wkb_writer.rs#L163-L169

When a parser will try to read the WKB emitted, il will try to read four f64 values for each point (the header says so), but only two would have been written.

The reason for having two methods was performance. This method is the innermost loop of any geometry operation (except for points) and passing additional optional params costs performance for the most common xy case.

You might have tested it already (I haven't) but in this case, maybe the branching caused by having to handle four different outcomes outweight the performance gain of only calling the method with the least amout of parameters:

In any case, I'm more concerned about the bug opportunities here than performance πŸ˜…

nyurik commented 1 year ago

@Oreilles i think this one is ready to go once conflict is fixed. One thing I forgot to ask about the other PR (which affect this PR as well) -- why do you need to call processor.srid(info.srid)?; everywhere? Shouldn't the processor be created with a given srid once?

Oreilles commented 1 year ago

@nyurik calling processor.srid(info.srid) allowed for inferring the EWKT geometry srid from the WKB blob without having to manually specify it. The trait ToWkt has a to_ewkt function that have an optional srid argument for when the source Wkb doesn't have one, but you still want one in the output. If the input Wkb has an srid defined, then this parameter can be omited and the one provided by the reader will be used (which is why I had to call processor.srid(info.srid). I wrote this so that the srid provided in the to_ewkt function parameter takes predominance over the one provided by the reader:

https://github.com/georust/geozero/blob/5e92027df76e2d0a3e29495dcc51047400cb226c/geozero/src/wkt/wkt_writer.rs#L82-L85

nyurik commented 1 year ago

Awesome work, thanks!!