geodesymiami / insarmaps

3 stars 0 forks source link

We are limited by download speeds when selecting reference points #104

Closed stackTom closed 1 month ago

stackTom commented 1 month ago

I've done some analyzing about various aspects of on the fly coloring. Here are some updates:

  1. Speed up by upgrading to Postgresql 16 from 9.6.
  2. Speed up by eliminating ROW_NUMBER() OVER () as seen in https://github.com/geodesymiami/insarmaps/commit/91a0d9c9b1385f5f8a3323969bff7d916bf16785. This also fixed the scrambling seen with Postgresql 16.
  3. Speed up by eliminating string concatenation in php on the server. As seen in https://github.com/geodesymiami/insarmaps/commit/97bbaecdd58558aab4b8740bce4729721187b522 and https://github.com/geodesymiami/insarmaps/commit/0e337c9bd60cd6c2d912e0c76aa59702b587ee25

However, despite all this, some on the fly queries are just slow because of the sheer amount of data needed to be downloaded over the internet. For example, this dataset: S1_IW123_069_0126_0130_20141022_20231116_N39495_N39710_E118090_E118410_filtDel4DS Is over 9 gigabytes worth of data (it has a large number of points and a very large number of dates). For regular on the fly coloring, the database calculates the linear regression. Even for huge datasets, the database takes a max of 2-3 seconds from my tests (sometimes milliseconds for smaller datasets). Although transferring the data to the web browser over the internet is sometimes an issue, as there is just a large amount of data to download... When selecting reference points, the database currently returns the raw displacements (not just the precalculated linear regressions), and the client-side javascript calculates the slope. This is a problem, as this can mean 4-700 megabytes of worth of data to transfer over the internet just for one recoloring, as can be seen in S1_IW123_069_0126_0130_20141022_20231116_N39495_N39710_E118090_E118410_filtDel4DS...

I need to make the reference point selection be something that is done server side (either on the HTTP server or on the Postgresql server), so only the new regression values are transmitted to the client.

falkamelung commented 1 month ago

Interesting! So then the issue is not the RAM of the server but the download from the server to your machine. That would make sense. Then working with a container on my Mac should be faster? I need to try.

So the original question was whether we need to make any recommendation regarding file size and memory of the server. So I suppose the answer is no? What about 50 GB and 100 GB files? Maybe the file size done not matter much once it is in the data base?

Here the instance with 60GB RAM: alias insarmaps3='ssh -YC -o ServerAliveInterval=60 exouser@149.165.172.97' (I added your key) if you want to try something.

Here the European website again: https://egms.land.copernicus.eu . But I think ours performs almost the same as long as we don't recolor.

stackTom commented 1 month ago

RAM definitely plays a role as well as processor and disk speed. All these help the postgres databse search faster as well as the http server perform calculations faster, but the download is the limiting factor in some cases, especially with reference recoloring.

I have just implemented a fix in https://github.com/geodesymiami/insarmaps/commit/6572624e4bc8274aecfb32b0b40c631db60ce6bf to move reference recoloring so that the postgresql database does the calculations, and just returns the new slopes to the browser. This should fix this issue.

Our website performs the same as the EGMS one. The coloring is what can be slow in certain huge datasets. But given all the fixes I mentioned above, as well as this one, it should be much better.

stackTom commented 1 month ago

Fixed a bug in https://github.com/geodesymiami/insarmaps/commit/6572624e4bc8274aecfb32b0b40c631db60ce6bf with https://github.com/geodesymiami/insarmaps/commit/950b765029d345dcb56b2de53b3e367e11c8d7af