ec-jrc / Thalassa

Large scale visualizations of unstructured mesh data
https://thalassa.readthedocs.io/en/latest/?badge=latest
European Union Public License 1.2
20 stars 15 forks source link

Proposal for Thalassa v0.2 #16

Closed pmav99 closed 2 years ago

pmav99 commented 3 years ago

Hi @all

After discussing with @brey I've started to implement what, at least we hope, can be the basis for version 0.2 of Thalassa. The end result looks something like this:

image

This is pretty much a complete rewrite. At the moment only the "max elevation" view has been implemented, so some work is needed before we can have feature parity with version 0.1. Nevertheless, we think that:

An additional feature that should be mentioned is that with the way the code has been structured, it is possible to display the visual components (i.e. the graphs) on jupyterlab, too, which can be particularly helpful when developing/debugging. it looks like this:

image

How to check it out?

I've pushed the code on the v0.2 branch. I've removed all the non-related code, so no binder and docker dirs etc. I've also updated the README so please check that out too.

For the record, if you have uncommitted changes in your local repo, you might find it easier to create a new git clone and a new virtualenv/conda env.

Known problems

As you will probably notice there is no WMS layer. The reason why I have removed it is the impact it had on performance. In a nutshell, holoviews seems to be doing some type of reprojection when you create an overlay that includes a WMS which seems to be really slow and which happens every time you make a change to the viewport (e.g. pan or zoom). I believe that it will be possible to avoid the slow-down but I haven't had the time to dig in deeper. For the record, this also affects Thalassa v0.1 so let's discuss this on a separate issue.

pmav99 commented 3 years ago

pinging @brey @yosoyjay @cuill @josephzhang8 @saeed-moghimi-noaa @SorooshMani-NOAA (I hope I didn't forget anyone)

any comments are welcome

saeed-moghimi-noaa commented 3 years ago

Thanks @pmav99

capability to choose time series line plots from drop down menu

SorooshMani-NOAA commented 3 years ago

Thank you @pmav99 for the changes and the detailed explanation. I haven't read the code yet, so I comment based on the high level description here and in the read me file.

WMS performance issue was a good catch. Do you think it's time to talk with PyViz developers about that as @brey has suggested?

pmav99 commented 3 years ago

parq file

I am not familiar with this format, any links? Unless you mean https://parquet.apache.org/

saeed-moghimi-noaa commented 3 years ago

@pmav99

I think you are right: https://docs.dask.org/en/latest/generated/dask.dataframe.read_parquet.html

saeed-moghimi-noaa commented 3 years ago

@pmav99 Also see here: https://github.com/holoviz/spatialpandas/issues/1

saeed-moghimi-noaa commented 3 years ago

All,

I think there would be great if we can meet with @jbednar. I hope we can get more information on the scalability of pyviz (e.g. holoview) and how datashader supports back end for multiple front-ends access to the same figure e.g. zooming into different regions of the plot.

Thanks, -Saeed

See this one as well. https://summit.dask.org/schedule/presentation/22/scaling-geospatial-vector-data/

brey commented 3 years ago

I agree that at this stage and before we settle into the new paradigm that the new approach on Thalassa presents it would be great if you can engage the upstream community for more info in order to fix ideas.

jbednar commented 3 years ago

holoviews seems to be doing some type of reprojection ... I believe that it will be possible to avoid the slow-down

Yes! The easiest way to avoid the slowdown is to project your data into the display coordinate system (normally Web Mercator) before you view it. That's especially important if you will be working interactively with the data many times in each session; much cheaper to project it once and then view it many times at different zoom levels and locations. If your data is in bare lat/lon values to start with, you can use holoviews.util.transform.lon_lat_to_easting_northing; otherwise try gv.operation.project.

I hope we can get more information on the scalability of pyviz (e.g. holoview) and how datashader supports back end for multiple front-ends access to the same figure e.g. zooming into different regions of the plot.

Here I think you're asking about what happens if you host a dashboard or app that uses Datashader which is then visited by multiple users? By default, each such user will get their own copy of every attribute created by the app, including the data being plotted, which can quickly use up all the available memory. Instead you should use pn.state to cache the dataset so that all visitors share the same copy of the underlying data structures; see pn.state in https://examples.pyviz.org/exoplanets. Once you've done that, each new visitor will be cheap to support, storing just a few extra bits of information specific to their session and allowing the server to host many users simultaneously.

BTW, "PyViz" represents all viz libraries ever (see pyviz.org); here I think you're discussing HoloViz.org.

saeed-moghimi-noaa commented 3 years ago

@jbednar

Thanks a lot for taking the time to read, respond to this issue and providing very good pointers. Let us do our homework and get back to you. Best, @saeed-moghimi-noaa

yosoyjay commented 3 years ago

The easiest way to avoid the slowdown is to project your data into the display coordinate system (normally Web Mercator) before you view it.

@jbednar Thanks a lot for that explanation! It was clear from profiling that Thalassa was doing a lot of work on pan + zooms, but nothing I saw in the trace suggested it was reprojecting.

pmav99 commented 3 years ago

@jbednar thank you for the suggestions, they were really helpful

I did experiment with both of them and I think that we don't need to do any reprojections after all. Since our input data is in lon/lat we just need to avoid mixing holoviews and geoviews and just stick with geoviews instead. If we do that then performance is great. There is also a cartopy/proj bug that affects performance but we can circumvent it by downgrading to cartopy < 0.20.

Other than that, my preliminary tests show that pn.State seems to be working great, i.e. each worker loads the data just once. The only issue I can see is that bokeh/tornado does not necessarily distribute new requests to idle workers and, occasionally, this can make response slower. But this is probably not related to the holoviz universe

@ all the rest I (force) pushed an update in the v0.2 branch. This is how things are looking now when working locally. The second dataset has a million nodes and 2 millions faces. I still haven't found the time to deploy this on a server to see how it goes over the network. It would be great if you could test it out with bigger grids

https://user-images.githubusercontent.com/411196/137175616-f82ba1b3-7180-41d6-8edf-1ce946200329.mp4

cuill commented 3 years ago

Thank you @pmav99 for rewriting Thalassa v0.2.

I was able to run it on a virtual machine after making some changes: 1) ModuleNotFoundError: No module named 'thalassa' It was fixed after modifyingfrom thalassa.web_ui import Thalassa to from web_ui import Thalassa

2) ImportError: attempted relative import with no known parent package This was caused by from . import utils from . import visuals in web_ui.py So I changed it to from utils import can_be_opened_by_xarray, open_dataset `from visuals import get_trimesh, get_max_elevation, get_wireframe'

3) PermissionError: [Errno 13] Permission denied: '/tmp/thalassa.log' I changed path to the current directory in 'config.yml'

Here is what the end results look like for ICOGS3D: Screen Shot 2021-10-19 at 5 18 20 PM

saeed-moghimi-noaa commented 3 years ago

All,

Here is a related post:

https://discourse.holoviz.org/t/panel-webapp-with-dask-bokeh-tornado-and-asynchronous-tasks/2388

pmav99 commented 2 years ago

The code has been merged to master, so I guess this can be closed now