geocaml / ocaml-geojson

A library for manipulating, creating and parsing GeoJSON
https://geocaml.github.io/ocaml-geojson
Other
38 stars 10 forks source link

WIP: Added bbox field to each GeoJson type #20

Closed streetCoderr closed 2 years ago

streetCoderr commented 2 years ago

Hi @patricoferris, please can you spare time to review this. Thanks as I await your response.

Process

I went through this file and came up with the implementation of the bbox function.

I want to ask though. What does it mean for a point to be the most southwesterly point. For instance, if we have 2 points: a)[-120, -20] and b)[-80, -77], which of them is the most southwesterly point. Now, a is more westerly than b but b is more southerly than a. So how do we decide which is more south-westerly. My approach was to pick the the most westerly coordinate between both and combine with the most southerly coordinate to make up a point; in this case, [-120,-77]. Is this the expected southwesterly point. I feel so though.

patricoferris commented 2 years ago

Great investigation so far @streetCoderr ! I think there's been a little confusion though, we shouldn't have to generate the bounding box, we just need to parse it from the GeoJSON document. We could provided some sanity-checking functions to ensure the bounding box meets the RFC's requirements though, but perhaps in a follow-up PR.

And I think this is where the confusion is coming from. The remark in the RFC:

with all axes of the most southwesterly point followed by all axes of the more northeasterly point.

must be interpreted in the context of a bounding box. Consider this poorly drawn 2D geometry and box.

IMG_0984 2

In the context of some coordinate system, a box (in however many dimensions) has some notion of "bottom-left" and "top-right" points. In 2D it is easier to get a feel for that.

My approach was to pick the the most westerly coordinate between both and combine with the most southerly coordinate to make up a point; in this case, [-120,-77]. Is this the expected southwesterly point. I feel so though.

Yes exactly, if you take your example as a bounding box (in 2D) then the four corners are:

[[-120,-20],[-80,-20],[-80,-77],[-120,-77],[-120,-20]]

At which point the southwesterly and northwesterly points can be selected.


So the way I was thinking this could be implemented was by extending geometry, feature and featurecollection to support a bbox field.

https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson.ml#L268

which should maybe now become a record:

type t = {
  bbox : float array option;
  geometry : Geometry.t option;
  properties : Json.t option;
}

Or something like that. And same idea applied to:

https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson.ml#L210-L217

Does that make sense?

streetCoderr commented 2 years ago

Great investigation so far @streetCoderr ! I think there's been a little confusion though, we shouldn't have to generate the bounding box, we just need to parse it from the GeoJSON document. We could provided some sanity-checking functions to ensure the bounding box meets the RFC's requirements though, but perhaps in a follow-up PR.

And I think this is where the confusion is coming from. The remark in the RFC:

with all axes of the most southwesterly point followed by all axes of the more northeasterly point.

must be interpreted in the context of a bounding box. Consider this poorly drawn 2D geometry and box.

IMG_0984 2

In the context of some coordinate system, a box (in however many dimensions) has some notion of "bottom-left" and "top-right" points. In 2D it is easier to get a feel for that.

My approach was to pick the the most westerly coordinate between both and combine with the most southerly coordinate to make up a point; in this case, [-120,-77]. Is this the expected southwesterly point. I feel so though.

Yes exactly, if you take your example as a bounding box (in 2D) then the four corners are:

[[-120,-20],[-80,-20],[-80,-77],[-120,-77],[-120,-20]]

At which point the southwesterly and northwesterly points can be selected.

So the way I was thinking this could be implemented was by extending geometry, feature and featurecollection to support a bbox field.

https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson.ml#L268

which should maybe now become a record:

type t = {
  bbox : float array option;
  geometry : Geometry.t option;
  properties : Json.t option;
}

Or something like that. And same idea applied to:

https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson.ml#L210-L217

Does that make sense?

Wow. I love the way you explained this. It all make sense now. Thank you so much.

I will work on it now.

streetCoderr commented 2 years ago

Just thought I'd leave this comment for future try-catching

Noted. Thank you :))

streetCoderr commented 2 years ago

Hi @patricoferris , good afternoon :)) It seems like Geometry does not support the bbox field. Only Feature and FeatureCollection does. Am I wrong?

patricoferris commented 2 years ago

I think all three support it because they are all GeoJSON objects. From the RFC:

A GeoJSON object represents a Geometry, Feature, or collection of Features...

  • ...
    • A GeoJSON object MAY have a "bbox" member, the value of which MUST be a bounding box array (see Section 5).

And whenever I'm in doubt of things like this, I often go to the implementation in another language like Rust, Go, TypeScript etc. to see what they did.

patricoferris commented 2 years ago

More concretely, our GeoJSON object is represented as a variant right now so either the type t needs to change:

https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson_intf.ml#L219-L222

Or each of the Feature, Collection and Geometry need to change. I'm in favour of the former I think, let me know your thoughts :))

streetCoderr commented 2 years ago

I think all three support it because they are all GeoJSON objects. From the RFC:

A GeoJSON object represents a Geometry, Feature, or collection of Features...

  • ...
  • A GeoJSON object MAY have a "bbox" member, the value of which MUST be a bounding box array (see Section 5).

And whenever I'm in doubt of things like this, I often go to the implementation in another language like Rust, Go, TypeScript etc. to see what they did.

Oh... Thank you, Sir. Very useful tips on how to go about problems πŸ‘

streetCoderr commented 2 years ago

More concretely, our GeoJSON object is represented as a variant right now so either the type t needs to change:

https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson_intf.ml#L219-L222

Or each of the Feature, Collection and Geometry need to change. I'm in favour of the former I think, let me know your thoughts :))

Maybe it will be better to go through the former approach which you suggested but I think it will be easier to follow the later approach. Changing each of those GeoJson types helps to break down the problem in pieces so that I can focus on one at a time. The biggest adavantage is that once I have a piece(A given type) working, I won't need to touch other parts of the codebase where it was used as it would tally with every other variant of its type. So this approach is more like, get it working at the base level, get it working everywhere. However, based on my observation(which of course might be wrong), if I make use of the former approach, I will have to keep going around the codebase to fix things where they got wrong and most importantly, there won't be a uniformity of types as the various variants of types won't tally with their base types.

I would like to hear more on this from you Sir as this would make me better at deciding what approach is optimal for solving a given problem.

streetCoderr commented 2 years ago

I have made changes to each of the GeoJSON types and they seem to be not so bad. I'm currently working on the Random module as it got broken because of the changes I made. Should I push the changes I have made to the PR while I continue to work on the Random module?

patricoferris commented 2 years ago

Changing each of those GeoJson types helps to break down the problem in pieces so that I can focus on one at a time. The biggest advantage is that once I have a piece(A given type) working, I won't need to touch other parts of the codebase where it was used as it would tally with every other variant of its type. So this approach is more like, get it working at the base level, get it working everywhere.

I'm not sure I completely follow this line of reasoning but perhaps it'll become clearer when you push some code (pushing small self-contained commits is fine). My gut instinct is that there are shared concepts between the three kinds of GeoJSON object and so lifting those out of the kind itself into a shared type might reduce the amount of code. For example a function to extract bbox, is that needed three times for Geometry, Feature and Feature.Collection in your approach?

In my mind I had thought maybe something like the following would work:

type geojson =
  | Feature of Feature.t
  | FeatureCollection of Feature.Collection.t
  | Geometry of Geometry.t

and t = { geojson : geojson; bbox : float array option }

I didn't think too hard about the names. But the changes needed elsewhere are not too significant, for example:

diff --git a/test/geojson/test.ml b/test/geojson/test.ml
index 09ce30c..a512047 100644
--- a/test/geojson/test.ml
+++ b/test/geojson/test.ml
@@ -50,8 +50,8 @@ let _get_all_props s =
   let json = Ezjsonm.value_from_string s in
   let geo = Geojson.of_json json in
   match geo with
-  | Ok (Feature f) -> Option.to_list @@ Geojson.Feature.properties f
-  | Ok (FeatureCollection fc) ->
+  | Ok { geojson = Feature f; _ } -> Option.to_list @@ Geojson.Feature.properties f
+  | Ok { geojson = FeatureCollection fc; _ } ->

Do note that this is an unreleased library with no libraries depending on it so making big changes to the interface is completely fine if it is an improvement. In fact, this is the easiest time to do so :))

streetCoderr commented 2 years ago

I think I should push my changes so that you can see what I did

streetCoderr commented 2 years ago

Changing each of those GeoJson types helps to break down the problem in pieces so that I can focus on one at a time. The biggest advantage is that once I have a piece(A given type) working, I won't need to touch other parts of the codebase where it was used as it would tally with every other variant of its type. So this approach is more like, get it working at the base level, get it working everywhere.

I'm not sure I completely follow this line of reasoning but perhaps it'll become clearer when you push some code (pushing small self-contained commits is fine). My gut instinct is that there are shared concepts between the three kinds of GeoJSON object and so lifting those out of the kind itself into a shared type might reduce the amount of code. For example a function to extract bbox, is that needed three times for Geometry, Feature and Feature.Collection in your approach?

In my mind I had thought maybe something like the following would work:

type geojson =
  | Feature of Feature.t
  | FeatureCollection of Feature.Collection.t
  | Geometry of Geometry.t

and t = { geojson : geojson; bbox : float array option }

I didn't think too hard about the names. But the changes needed elsewhere are not too significant, for example:

diff --git a/test/geojson/test.ml b/test/geojson/test.ml
index 09ce30c..a512047 100644
--- a/test/geojson/test.ml
+++ b/test/geojson/test.ml
@@ -50,8 +50,8 @@ let _get_all_props s =
   let json = Ezjsonm.value_from_string s in
   let geo = Geojson.of_json json in
   match geo with
-  | Ok (Feature f) -> Option.to_list @@ Geojson.Feature.properties f
-  | Ok (FeatureCollection fc) ->
+  | Ok { geojson = Feature f; _ } -> Option.to_list @@ Geojson.Feature.properties f
+  | Ok { geojson = FeatureCollection fc; _ } ->

Do note that this is an unreleased library with no libraries depending on it so making big changes to the interface is completely fine if it is an improvement. In fact, this is the easiest time to do so :))

Your gut instinct is right Sir. The bbox function is above the Geometry module where it can be shared among Geometry, Feature and FeatureCollection.

streetCoderr commented 2 years ago

I await your feedback Sir on the recent changes I made. I'm currently working on the Random module. I'm yet to fully grasp it but I will get there.

streetCoderr commented 2 years ago

Glad you took your time to make things clear and provide a sound review. Thank you. I will work on the suggested changes as soon as possible.

streetCoderr commented 2 years ago

Hi @patricoferris, good afternoon Sir. I'm sorry I could not communicate here yesterday. My system went bad and I had to quickly go to an engineer for repairs. I got it in good shape later in the evening :))

streetCoderr commented 2 years ago

More concretely, our GeoJSON object is represented as a variant right now so either the type t needs to change: https://github.com/geocaml/ocaml-geojson/blob/9eaaa54fa433bf732069e3e4623f9fd29c38b754/src/geojson/geojson_intf.ml#L219-L222

Or each of the Feature, Collection and Geometry need to change. I'm in favour of the former I think, let me know your thoughts :))

Maybe it will be better to go through the former approach which you suggested but I think it will be easier to follow the later approach. Changing each of those GeoJson types helps to break down the problem in pieces so that I can focus on one at a time. The biggest adavantage is that once I have a piece(A given type) working, I won't need to touch other parts of the codebase where it was used as it would tally with every other variant of its type. So this approach is more like, get it working at the base level, get it working everywhere. However, based on my observation(which of course might be wrong), if I make use of the former approach, I will have to keep going around the codebase to fix things where they got wrong and most importantly, there won't be a uniformity of types as the various variants of types won't tally with their base types.

I would like to hear more on this from you Sir as this would make me better at deciding what approach is optimal for solving a given problem.

I would like to start by saying that my line of reasoning here was entirely wrong. The problem got way easier when I took the approach you recommended. I did not take that approach at first because I did not really know how to go about it that way, but on analyzing the problem more keenly and getting my hands dirty, I realized that it was the best way. When I used the method that requires changing each of the Geojson types, the resulting breaks in the codes were so much on almost every end to the point that I almost touched the entire codebase. However, on using the approach you recommended, I needed to just make few changes on the geojson.ml file and then, follow it up on other files where appropriate.

I learned a lot from this issue and it changed the way I think on approaching problems and solving them. Thank you so much for the opportunity to work on this issue :))

streetCoderr commented 2 years ago

Also, please do let me know what I would have done better and point out any bad software engineering practice I exhibited.

streetCoderr commented 2 years ago

Thanks @streetCoderr.

I'm still trying to work out the best approach. The issue is that what would an end user assume Geojson.Point.of_json does? Should it read the bounding-box data? At the minute only the toplevel Geojson.of_json function does. This is probably actually correct but then I don't think we should expose the Json.Conv for all of the geometries because it is confusing as to what they do. Let me know your thoughts.

It would also be good to add some tests for the bounding-box.

That's true Sir. An end user would expect GeoJson.Point.of_Json to also read the bounding-box data because it is part of the Json. What I think is that we should abstract away the base definition from the end user and make them use the top level definition. Like you pointed out, this means I would have to hide Json.Conv from all geometries. of_Json would then be polymorphic even from the end user's standpoint, in the sense that the user won't need to know which GeoJson type they are calling it on, yet, be guaranteed that it would produce the right result.

Please, I would like to know your take on this Sir. If you agree with this, I can hide Json.Conv where they don't matter. Thank you as I await your response.

patricoferris commented 2 years ago

Nicely summarised, I think that sounds good to me. It'll become clearer when you get a chance to push a commit :))

streetCoderr commented 2 years ago

Nicely summarised, I think that sounds good to me. It'll become clearer when you get a chance to push a commit :))

Thank you for the review Sir. I have tried hiding Json.Conv from all geometry types as we discussed. However it resulted to a lot of breaks in the code. Turns out that a huge part of the codebase from different files depend on that.

I'm currently trying to find another way around it. I will be back soon. Thank you

streetCoderr commented 2 years ago

Hi sir @patricoferris, To prevent the code from breaking and maintaining expected behaviour of the entire codebase, I thought of some way. I don't know if it would make sense to you. How about we make it like this. Let's have a function base_of_json for each GeoJson type that will be exposed to the user. What does this do? This function reads the very basic fields of a geojson object from a given json, i.e, it does not read any optional fields. For instance, Geojson.Point.base_of_json will return the basic field of a Point which is its coordinates. Geojson.Feature.base_of_json will return the basic field of a Feature, which is its geometry and properties.

Why are we doing this? What is the use-case? I feel there might be instances where a user might just want to get the coordinates of a given geometry. In that case there is no point reading the bounding-box. The user can therefore, call base_of_json on the given geometry. If the user needs to read all fields from the json, they can call the top level's of_json for that.

I don't know if this makes sense Sir, but I came up with this to prevent the codes from breaking as it was getting so tough to fix. Do let me know your thought sir. Thank you. I will push what I did now.

streetCoderr commented 2 years ago

Please help me review this Sir. Thank you.

streetCoderr commented 2 years ago

Hi @patricoferris, please help me review my code. I want to write the tests but you still haven't approved of this approach yet. I wanted to be sure you have approved of it before I begin writing the test.

Or should I go ahead with the test?

Also, please do point out where I got things wrong. Thank you Sir 😊😊

patricoferris commented 2 years ago

Thanks for the hard work @streetCoderr :))!

I think my comments aren't clear enough so I went ahead and built on top of your work in this commit https://github.com/geocaml/ocaml-geojson/commit/fb140881a6e16390be5070aa018a1f23e74c4bc9 to show what I mean by:

I'm hoping this will help clear things up, feel free to cherry-pick the commit if you agree with the approach. I'm also happy to hear your thoughts and suggestions if you think it isn't right too :)) !

streetCoderr commented 2 years ago

Thanks for the hard work @streetCoderr :))!

I think my comments aren't clear enough so I went ahead and built on top of your work in this commit fb14088 to show what I mean by:

  • Removing Json_conv from the geometry types because I think it will too confusing to explain what the "fundamental" fields are for each one.
  • And turning bbox into a float array option instead of the float array option option that is in this code currently.

I'm hoping this will help clear things up, feel free to cherry-pick the commit if you agree with the approach. I'm also happy to hear your thoughts and suggestions if you think it isn't right too :)) !

Good morning @patricoferris. Thank you for your positive feedback. And yes, this actually cleared things up😊😊.

I also did some extra 'massaging' to fix the errors that came up after making the suggested changes. Please help me check it out and let me know if it's okay. Thank you.

streetCoderr commented 2 years ago

Excellent ! Few little nitpicks and then we can merge I think! Thank you for persisting with this PR :))

Thank you for patiently following up on this PR too😊😊

Also, please, help me look into what I just pushed. I'm done with implementing the suggested changes.

streetCoderr commented 2 years ago

Hi @patricoferris. I abstracted away type t from the user in the most recent commit. Please check it out. Thank you.

streetCoderr commented 2 years ago

Hi @patricoferris. Please help me review. I'm done with the suggested changes

streetCoderr commented 2 years ago

Great! If you could just undo the changes to the index.js file (I will fix that problem later) this is good to go.

Alright, I just did that now. Thanks

streetCoderr commented 2 years ago

Hi @patricoferris; can you help me check why this is failing? It's running totally fine on my system.

Edit: I get now. It's because of the newly merged PRs. Anything I can do about this?

main_run

patricoferris commented 2 years ago

You will need to merge the changes in or rebase this PR onto the main branch. This will require changing the new tests because you've changed the API in this PR.

streetCoderr commented 2 years ago

You will need to merge the changes in or rebase this PR onto the main branch. This will require changing the new tests because you've changed the API in this PR.

I have made the required changes. Please review. Thank youπŸ™ŒπŸ™Œ

streetCoderr commented 2 years ago

Hi @patricoferris. Just looking at this and I'm seeing that it has 30 commits which is a bit too much. What do you think about this? Should I squash the commits so that it won't mess up our main git history? I personally feel this will be nice. Please do let me know your take on this. Thank you.

patricoferris commented 2 years ago

Squashing some of it would be great, or I can squash it all into a single commit. It depends if you have a grouping of commits in mind or not :))

streetCoderr commented 2 years ago

Squashing some of it would be great, or I can squash it all into a single commit. It depends if you have a grouping of commits in mind or not :))

I think I would rather leave it to you to do the squashing then. Thank you😊😊