JuliaGeo / Proj.jl

Julia wrapper for the PROJ cartographic projections library
MIT License
48 stars 9 forks source link

Wrap Proj's geodesics library #77

Closed asinghvi17 closed 11 months ago

asinghvi17 commented 1 year ago

This PR wraps the Proj geod_* API in a Julian API. The idea is that the user should be able to pass objects directly instead of by passing pointers as you have to do now. It also simplifies some constructors.

With the new API (example copied from the docs of geod_position),

# create a geodesic definition
geod = Proj.geod_geodesic(6378137, 1/298.257223563)
# create a line which solves the provided inverse problem
inv_line = Proj.geod_inverseline(geod, 40.64, -73.78, 1.36, 103.99)
# get positions as (lat, lon, az) which is how the C-API orders them
lats = zeros(Float64, 100)
lons = zeros(Float64, 100)

for (ind, relative_arclength) in enumerate(LinRange(0, 1, 100))
   lat, lon, _ = Proj.geod_position_relative(inv_line, relative_arclength)
   lats[ind] = lat
   lons[ind] = lon
end

jfk_to_sin

and the last part will be further simplified into a geod_path function.

Should I override the functions here, or create a new set of functions? If so, how should we name them?

asinghvi17 commented 1 year ago

As it is now, here's the performance of geod_path on my machine:

julia> @time Proj.geod_path(geod, 40.64, -73.78, 1.36, 103.99, 100);
  0.000034 seconds (3 allocations: 2.281 KiB)

julia> @time Proj.geod_path(geod, 40.64, -73.78, 1.36, 103.99, 1000);
  0.000240 seconds (3 allocations: 16.406 KiB)

julia> @time Proj.geod_path(geod, 40.64, -73.78, 1.36, 103.99, 10000);
  0.001413 seconds (5 allocations: 156.875 KiB)

julia> @time Proj.geod_path(geod, 40.64, -73.78, 1.36, 103.99, 100000);
  0.021385 seconds (5 allocations: 1.526 MiB)

julia> @time Proj.geod_path(geod, 40.64, -73.78, 1.36, 103.99, 1000000);
  0.188391 seconds (1.11 k allocations: 15.310 MiB, 14.53% gc time, 11.07% compilation time)

julia> @time Proj.geod_path(geod, 40.64, -73.78, 1.36, 103.99, 1000000);
  0.149690 seconds (5 allocations: 15.259 MiB)

Is it worth trying to optimize in order to not allocate so many Refs, by creating another dispatch for AbstractVectors?

rafaqz commented 1 year ago

As for optimisation, I've been removing excessive Ref generation elsewhere, in ArchGDAL it can be the main bottleneck currently in some operations (need to push that PR).

So +1 on removing them from the start.

visr commented 1 year ago

As for optimisation, I've been removing excessive Ref generation elsewhere

@rafaqz can you explain this, why is it slow and how to avoid it? I see here in the PR immutable structs are made mutable, and pointer_from_objref is used instead of keeping it mutable and using Ref. According to this thread pointer_from_objref should be rarely used, and Ref is the way to go. The GC preserve is done by ccall so that isn't needed. It seems that using pointer_from_objref is also more vulnerable to GC issues, from the docstring, with possible issues like this.

visr commented 1 year ago

Should I override the functions here, or create a new set of functions? If so, how should we name them?

So far the existing high-level API, like in coord.jl, is much more Julian and distinct from the libproj function names. If we can come up with a nice high-level API then perhaps that'd be best I think. Then Proj.proj_* stays just the low level auto generated interface.

Keep in mind by the way that libproj.jl is auto generated, so if you want to change it that has to be done via gen/.

rafaqz commented 1 year ago

What I mean is essentially what @jw3126 is doing in libgeos for isequals performance - define Ref outside the hot paths and reuse them to minimise allocation. Not completely avoiding Ref. That's what I thought @asinghvi17 meant by dispatch on Vectors - defining the Refs further out so they can be reused.

In ArchGDAL there are some hot paths where 3 refs are defined just to get one point. That allocation ends up totally dominating the profile when you get the points from a polygon. We need to be careful that all operations on the points of composite objects define the Refs themselves and pass them in. Doing this makes rasterizing ArchGDAL polygons 20x faster. Just loading the polygons is still slower than the actual rasterisation, but not 20x slower.

I didn't mean to use pointers instead. Refis fine as long as we carefully minimise the allocations.

But in some cases it might be worth using pointers? pointers and mutable state in structs are two things I prefer not to think about, so it would need a clear reason and benchmarks.

visr commented 1 year ago

Thanks, yeah that makes sense to me.

rafaqz commented 1 year ago

Looking at this again, it would be good if instead of lon and lat values as arguments, functions accepted GeoInterface.jl compatible points. That means (lon, lat) would work, but other point types will too.

asinghvi17 commented 1 year ago

That would probably be a pretty easy overload, but I don't think Proj depends on Geointerface yet (not sure why). I'd also like to add some utilities to generically transform geometries in that case...

rafaqz commented 1 year ago

I just wrote it ;)

https://github.com/JuliaGeo/Proj.jl/pull/79

asinghvi17 commented 1 year ago

Sorry, my finger slipped on mobile. This is still a draft but I hope to polish it up today.

asinghvi17 commented 1 year ago

btw, here are some cool applications of this: https://xkdr.github.io/Karmana.jl/dev/examples/geodesic/ https://xkdr.github.io/Karmana.jl/dev/examples/annular_ring/#:~:text=an%20encore%2C%20let%27s-,animate,-what%20the%20annular_ring

(these are from the documentation of a utility package for my current org, it probably won't be registered but the code is open source.)

asinghvi17 commented 1 year ago

Do we need more features here or should we go ahead and merge once CI passes?

I've made the portion of code that I actually use as efficient as possible, but have no idea what other people use. As those usecases shake out we could gradually make them more efficient.

asinghvi17 commented 1 year ago

Nightly is failing because of a LibCURL version mismatch

visr commented 1 year ago

@asinghvi17 thanks for taking this on! I'm not really familiar with the geodesics API, but overall this looks good to me. I pushed some minor nonfunctional commits.

We don't have to cover the whole API at once and can add what we need. With this I think we can close #66. For me this is good to merge.

rafaqz commented 11 months ago

@asinghvi17 @visr should we merge this?

evetion commented 11 months ago

If @visr approved back in May, we can merge this. We can always hold of tagging a release if something comes up.