OSGeo / PROJ

PROJ - Cartographic Projections and Coordinate Transformations Library
https://proj.org
Other
1.75k stars 789 forks source link

Build PROJ without sqlite dependencies #1552

Closed yonarw closed 4 years ago

yonarw commented 5 years ago

We are currently using PROJ v5.1.0 and want to continue using the library. We are especially interested in the new WKT parsing capabilities of version 6+. Looking at the code a bit it seems sqlite is a hard dependency when we want to be able to parse WKT.

Our question is: Would there be a possible/reasonable way of providing a variant of PROJ without sqlite that still is able to parse WKT?

We'd like to work on and contribute a possible solution for this but first wanted to ask what the community thinks about it.

rouault commented 5 years ago

sqlite is quite a ubiquitous dependency (especially for anyone working with geospatial data, with the GeoPackage or Spatialite formats for example), and not a hard one to build, so I'm not sure there's really interest in offering that option. Technically yes parsing WKT doesn't require sqlite, although there are subtle dependencies like when parsing ESRI WKT, where the database is used to replace ESRI specific names with EPSG ones (but in the case the database isn't reachable, that will still work and just don't do the name substitutions)

yonarw commented 5 years ago

sqlite is quite a ubiquitous dependency

While I agree with that, in our case we use PROJ inside of a database (SAP HANA). Having sqlite as a dependency of HANA is the thing we want to avoid.

Technically yes parsing WKT doesn't require sqlite, although there are subtle dependencies like when parsing ESRI WKT, where the database is used to replace ESRI specific names with EPSG ones (but in the case the database isn't reachable, that will still work and just don't do the name substitutions)

So do you see any possibility of removing the dependency (maybe via cmake build option) and have restricted capabilities when running without db access.

hobu commented 5 years ago

So do you see any possibility of removing the dependency (maybe via cmake build option) and have restricted capabilities when running without db access.

Not really, no. The old PROJ didn't run without its database either, but its database was a bunch of CSV files.

An interesting question is why is the SQLite dependency an issue for you?

rouault commented 5 years ago

Having sqlite as a dependency of HANA is the thing we want to avoid.

That's an easy one to hide. Just do a static build of sqlite & proj

So do you see any possibility of removing the dependency (maybe via cmake build option) and have restricted capabilities when running without db access.

It could be technically possible, but I'm not sure the project wants to maintain that. Another issue I see is that the test suite ("make check") is dependent in a lot of places on having the database available, so that would mean: either sacrifice "make check" in that degraded environment (but who knows if that works correctly then), or do a lot of changes in the test suite so that it can run in the no-sqlite3 case by skipping tests or allowing 2 possible outputs: a lot of initial work to do that, and a maintenance burden over the long term.

yonarw commented 5 years ago

An interesting question is why is the SQLite dependency an issue for you?

As a database we do not want to have dependencies to an embedded database.

... but its database was a bunch of CSV files.

I see. But AFAIK you did not need to ship the CSV files with the library pre v6 right?

Just do a static build of sqlite & proj

We would not have a lib-dependency to sqlite (although the code is still compiled into the static proj library), but still would need to deliver the .db file. I understand your point about testability. So far I only looked at the code that depends on sqlite which is mostly inside factory.cpp (?) where a implementation for DatabaseContext and other interfaces is provided. I thought one possibility could be providing an additional implementation which reads the information from CSV files instead of .db file. Tests are of course another part that needs to be considered. Thanks for your answers this far, I will discuss possible solutions with our team.

rouault commented 5 years ago

but still would need to deliver the .db file.

You could possible puts its content inside a .text entry in some DLL, and use SQLITE in-memory capabilities to work with it.

I thought one possibility could be providing an additional implementation which reads the information from CSV files instead of .db file.

SQlite could with a bit of work be replaced by another SQL92 capable database, but you need to be able to support JOIN, WHERE, ORDER, etc... A simple CSV file reader won't do.

aaronpuchert commented 5 years ago

The core issue for us is the entanglement of the actual projections (which we want) and the metadata management, which we don't want, because we're having our own.

Not really, no. The old PROJ didn't run without its database either, but its database was a bunch of CSV files.

The old libproj.so ran well enough without the CSV files that we could use it that way. When you have your own CRS table to look up transformation parameters those files were never read, and so we didn't even install them. (We have to maintain that table on our side because we have to ensure transactional integrity and other shenanigans. Needless to say that we also want to allow users to add reference systems.) We have no issue with migrating the CRS data from CSV to a SQLite database, but that it has become a hard dependency even when the data isn't needed.

There is also metadata compiled into the library: the data for the proj_list_* functions in ellps.cpp, datums.cpp, units.cpp. But we have that on our side as well.

An interesting question is why is the SQLite dependency an issue for you?

Our issue with SQLite is neither its size nor availability nor delivery. Certainly we can handle all that. The issue is about a leap in complexity, and that we need a dependency for a feature that we don't use.

The projection library that we want (and it's how we have used it in the past) is a library of plain computations that runs just on CPU and memory and doesn't know about the rest of the system. With SQLite that goes out of the window, as it does all kinds of low-level file I/O. Using a SQL database engine also adds a lot to runtime complexity, which isn't really needed here:

But we aren't concerned with how the metadata management is handled, it's rather that we handle it ourselves and thus don't want to carry another metadata manager around that we don't use.

We don't care whether it's an optional or replaceable component, which leaves two possibilities:

  1. Make it possible to build libproj.so without SQLite and without metadata, meaning that clients have to manage the data and the library becomes a pure computation engine.
  2. Abstract away the storage layer, so that it could be implemented in different ways, possibly even by clients of the library.

The first route sounds more promising to me, although the work of separating the components would need to be done in both variants. Ideally metadata management would be handled by an entirely separate library which is an optional dependency for PROJ. That library might then as well use SQLite.

Another issue I see is that the test suite ("make check") is dependent in a lot of places on having the database available.

I don't see this as a big issue, on the contrary: separating tests for the projections from tests for metadata management should lead to a cleaner structure of the entire test suite.

funchal commented 5 years ago

I agree SQLite should be optional and it would enable using PROJ in a lot of additional contexts where this dependency is not available/not acceptable.

kbevers commented 5 years ago

None of the active PROJ developers are interested in liberating PROJ from sqlite. After all, we decided to put it in there because we thought that was a good idea. We did so knowing that a small fraction of the community would not like it. The added benefit of being able to use sqlite to handle the complexity of the CRS registry was, and is, worth it to us. With that said, if the community can provide a solution that is acceptable we can't ignore that. Should anyone want to take on this task, at first a RFC with a detailed description of the proposed solution should presented to the PROJ PSC. Such an RFC has a much better change of receiving a positive vote by the PSC if it comes with a preliminary proposed implementation. It will be quite a disruptive change to the current code and we need to be assured that it is sane and posible to maintain.

funchal commented 5 years ago

@kbevers Sounds fair!

I had a closer look to check the extend of the usage of sqlite in the code, and it seems to be very well architected and fairly contained within https://github.com/OSGeo/PROJ/blob/3ae09c3ba164728e200a3b930b72ed5fc24ef6ee/src/iso19111/factory.cpp. There's around 50 different queries but only 5 are complex.

I quite like the direction @rouault was going (.text entry in some DLL), but I think the queries are simple enough that it might be possible to use in-memory data-structures rather than through SQL. Alternatively it might be possible to just make the DatabaseContext api virtual so users can implement their own alternative.

rouault commented 5 years ago

I wouldn't want the code to be bound to the existing SQL requests. Newer one might be added, or existing changed. At some point, the EPSG dataset structure will be changed (presumably end of this year), and so proj.db schema will also have to adapt, and the queries will change for sure. Having alternate implementations of DatabaseContext might be acceptable, but we'll still want SQL to be used as the input for queries and expect the alternate database to have the same layout as proj.db Anyway I still don't see the big deal of having a standalone proj.db file. PROJ practical use since the introduction of datum transform capabilities around 2000 has always be tied to using external resources like grid files, etc for non-trivial coordinate operations. If your use of PROJ is simpler than that, you'd better just extract the math of the few projection methods you need.

funchal commented 5 years ago

Cool. I understand proj.db is fine for 99% of use-cases, but it'd be nice if we could make this work for 1% of users which are either not interested on the grid files or need a self-contained build without external resources. I like your idea of using DatabaseContext.

Basically I think an implementation plan could work along the lines of:

funchal commented 5 years ago

https://github.com/OSGeo/PROJ/pull/1595 is a very rough draft which is only intended to show that the approach I proposed can potentially work.

I'd be happy to spend more time to finish and clean it up. I'd like to move the SQLite DatabaseContext to its own file so it can be conditionally compiled, add a cmake build option, write some tests, and perhaps an example of how a user can implement their own DatabaseContext interface.

Please let me know if there's interest in merging something along those lines if I were to spend more time on it.

kbevers commented 4 years ago

Hello, I would like to ask how long will it take to finish this work?

It is not likely that you will be able to build PROJ without SQLite in the future. The reasons for this has been expressed quite clearly in the above comments.

And is there anything I can do for this issue? I'm glad to make some contribution for it very much!

If you after reading the above discussion are still interested in making PROJ build without SQLite, you would have to write up a RFC describing in detail how you would do it. The RFC would have to adress all the various concerns the PROJ core contributors has expressed above and should preferably include a prototype that demonstrates a real world implementation of a PROJ that can build and be used without an SQLite dependency.

benstadin commented 4 years ago

I also just faced a problem with the embedded SQLite db: It's very slow to initialize lots of proj contexts and uses a lot of memory. My whole application used to consume about 10 MB, with Proj 6.2.1. and about 100 Proj instances the application uses about 500 MB (before somebody asks: those 100 instances are all required).

I'm about to profile a bit more. But it looks like Proj 6 initializes by reading the Db again for every instance. And the unit tests take several minutes longer (from 2 to 5 minutes).

Update: Most of the CPU time is indeed eaten up by proj_create_crs_to_crs.

rouault commented 4 years ago

It's very slow to initialize lots of proj contexts

Yes, you need to minimize the number of contexts

And the unit tests take several minutes longer (from 2 to 5 minutes).

Make sure you use sqlite >= 3.11

benstadin commented 4 years ago

That's a quick response =).

Sqlite Version is 3.28, System is a Macbook Pro 16 / Core i9. I'll investigate how to reduce the contexts. But I have a bad feeling that it would require quite a lot of work to introduce a relationship between actual worker threads in the main application and the library where the projection methods are in.

NB: There are a few leaks in proj_normalize_for_visualization().

rouault commented 4 years ago

NB: There are a few leaks in proj_normalize_for_visualization().

Reproducer ?

benstadin commented 4 years ago

Sorry, the leak was my fault (faulty copy constructor). The context creation will be challenging. I'll investigate further but even if that works somehow and reduces memory usae, it will still be consuming too much for my application. Optimally it would only initialize as long and take as much memory as required for the actual projection.

benstadin commented 4 years ago

To give some feedback, I did some further tests and considered my options for integrating proj 6 into my project. Unfortunately I had to remove proj 6 and use another library because of slow initialization, memory usage and architectural choices (e.g. (re-)using proj contexts with worker threads was cumbersome).

I second the initial request to get rid of the SQLite DB, and additionally trim down memory usage and initialization time. I'd like to use proj again in the future and will try again if there are improvements to these issues.

GrahamAsher commented 4 years ago

I would like to add my vote for getting rid of the SQLite dependency, at least optionally. I have been trying to move my product, CartoType, to proj 6 and have found it impossible for that reason. An earlier commenter said "The projection library that we want (and it's how we have used it in the past) is a library of plain computations that runs just on CPU and memory and doesn't know about the rest of the system." His comments apply to us precisely.

rouault commented 4 years ago

I would like to add my vote for getting rid of the SQLite dependency

It would require much more than a vote, but significant effort to implement and thus someone to take the lead or fund someone else to do it. The SQLite dependency enables PROJ to provide much higher value than just map projections. The reality of today and tomorrow geodetic, and also cartographic (as the frontier between both tends to blur), needs, requires to be able to deal with datum shifting, time based transformations, etc etc and rely on a database providing the necessary information.

I also question the impossibility of being able to use PROJ6 as it is today as being a lack of effort in trying. It should be abloe to create static builds with a sqlite3 static lib, and if just using proj_create() with pipeline strings, no database access should be attempted (or could probably be made to work with modest effort even if it is currently tried)

benstadin commented 4 years ago

The reality of today and tomorrow geodetic, and also cartographic (as the frontier between both tends to blur), needs, requires to be able to deal with datum shifting, time based transformations, etc etc and rely on a database providing the necessary information.

This is very reasonable and I fully agree with this statement. However I think we need to look at it from different angles:

  1. Pro SQLite: Enabling things like datum shifting
  2. Con: Avoiding SQLite due to IO usage and dependencies
  3. Con: Memory usage and initialization performance

I’m affected by issue 3. Issue 2 is potentially a non-issue when optimized. SQLite itself isn‘t a problem when integrated well.

Idea: Keep SQLite but change the behavour to initialize the data lazily. It might still require some work, but it‘s a compromise to reduce file IO for those who don‘t yet want the SQLite dependency and will allow to run Proj 6 in memory and CPU constrained environments (I‘ve a define in my appto seamlessly switch between Proj 6 and another lib; I really can‘t use Proj 6 on several targeted devices due to memory requirements and init times as outlined above).

A further optimization for issue 2 might be to embed the db and have it load from memory, entirely avoiding any further file IO. SQLite makes that rather easy.

rouault commented 4 years ago

change the behavour to initialize the data lazily

That's mostly the current behaviour. But createOperations() need to explore a lot of objects in the database (see https://proj.org/operations/operations_computation.html for how it works), hence a significant part of it is probably accessed. I'm not sure having the DB embedded in the binary will change I/O related performance, and certainly it will not have any impact on memory consumption.

benstadin commented 4 years ago

... but about 5MB memory per instance and the initialization times I see is a lot.

rouault commented 4 years ago

Re-reading this thread, it seems that for most people who don't want the SQLite dependency, just using plain cartographic projection and providing already constructed pipelines to proj_create() would be sufficient. It doesn't appear that abstracting the database interface to be able to plug another DB would be that needed.

What would be lost by building without the SQLite dependency:

Besides code changes in the library, the main impacts I see are more on the test suite where we'd have to separate tests that directly or indirectly depend on the database.

If some of the persons participating in this thread are interested in funding such effort, they can email me (even.rouault at spatialys.com)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

aaronpuchert commented 4 years ago

Re-reading this thread, it seems that for most people who don't want the SQLite dependency, just using plain cartographic projection and providing already constructed pipelines to proj_create() would be sufficient. It doesn't appear that abstracting the database interface to be able to plug another DB would be that needed.

That would be what @yonarw and I want. The limitations that you listed are what I expected.

If some of the persons participating in this thread are interested in funding such effort, they can email me (even.rouault at spatialys.com)

We obviously need to discuss with our management about this, and will reopen if we have a path forward.

paamand commented 3 years ago

Good discussion. Only thing I would add is the niche of using GDAL/PROJ4 on Android. SQLite dependency is not satisfied in cross-compilation to Android. But I could probably include the source from sqlite.org. Just not very clean imo.

GrahamAsher commented 3 years ago

My product, CartoType, uses PROJ4 on Android, iOS, Windows, Linux, Mac OS, Raspberry PI, Jetson Nano and many other platforms. It's an absolute must that PROJ4 doesn't have the SQLite dependency. Luckily I can carry on using the previous version without any difficulty, but at some point I might want to revisit this problem. If I have the leisure I might attempt my own fork.

It was a strategic error to introduce the SQLite dependency because of its great impact on runtime memory use, among other things. @rouault said 'The SQLite dependency enables PROJ to provide much higher value than just map projections.'; but actually a library that just provides map projections is exactly what many of us need, and it ought to be possible to separate that out from the other functionality, which actually reduces the net value to us.

rouault commented 3 years ago

but actually a library that just provides map projections is exactly what many of us need, and it ought to be possible to separate that out from the other functionality, which actually reduces the net value to us.

I'm going to be a bit provocative because I'm a bit annoyed by the constant whining in this thread: why should we care about what value PROJ brings to you, if you don't bring value to PROJ ? The changes of PROJ 6 were done on purpose (this was a strategic decision) because people badly needed a first class geodetic transformation library, which it is now, and gathered the needed funding to make that happen. As outlined before, I guess we could still be able to produce a cut down version that has projection support only, but that won't happen by magic. Otherwise nostalgic people can just indeed fork off and leave their lives without being tied to our crazyness :-)

GrahamAsher commented 3 years ago

Being provocative is never productive or useful. I have avoided personal remarks or slurs like 'whining' and 'fork off' and I suggest that we all do that here. My criticism, and that of others, is well-meant. We believe that we are bringing value to the project by our criticisms, and, as I and others have mentioned, we might also help with making desired changes to make PROJ more useful to everybody. I have benefited hugely over the last 18 years from criticism of my product, CartoType. While a complaint may initially cause me to feel resentful, I always try to think it over and work out how I might learn from it and improve my work.

kbevers commented 3 years ago

We believe that we are bringing value to the project by our criticisms

At this point you are not. Actual contributions, funding or code, would be bring value to the project. As has been expressed many times, none of the current active developers has any use for a SQLite-less PROJ (quite the contrary) so the changes you seek are not going to happen by asking for it over and over. Step up and provide some actual value and we can continue this discussion. Until then you can use PROJ 5 and earlier which comes without the SQLite dependency free of charge.

GrahamAsher commented 3 years ago

@kbevers We obviously disagree about what constitutes value, and you already know that I believe that constructive criticism has a value of its own. However, I'll leave it there until I have time to take another look at the code. I did in fact start to attempt a clean removal of the SQLite dependency, but it was too hard, and as you say, I can continue to use the older version.

kbevers commented 3 years ago

We obviously disagree about what constitutes value

That much is clear. It is obviously of value to you if the PROJ project take on the burden of developing a SQLite-free version of the software. You can update your commercial software package with 3 years of PROJ bug fixes and new projections and bring value to your customers. And I suspect also increase your revenue. If there's a kickback from that revenue to the project we would likely be more welcoming to the suggested change but so far there has been no indications of that from any of the inquiries we've had regarding this topic. So as far as I can tell your "constructive criticism" only leaves us with an increased maintenance burden of a feature the project doesn't really want. From the viewpoint of a maintainer of the project that is not value.

I am going to lock this discussion. I believe everything that needs to be said on this topic has been touched several times in this discussion and reiterating those arguments doesn't get us anywhere. Should someone in the future actually want to make this change happen, here's a few options that will get you of to a good start:

  1. Engage actively with the community instead of just demanding that your pain point be taken care of. Build up social capital by helping make the project better, that way we are more likely to show an interest in your particular problem. Start Here.
  2. Write an RFC that details how your proposed changes will work. Back it up with a prototype implementation and commit to maintaing your code in the future.
  3. Engage one of the contracting PROJ developers and provide them with the funding to realise your vision.