Ouranosinc / analogues_spatiaux

https://pavics.ouranos.ca/analogues_spatiaux/Dashboard
Apache License 2.0

Comments for climatedata.ca #4

Open juliettelavoie opened 1 year ago

juliettelavoie commented 1 year ago

As discussed in the CCDP DWG, I am putting comments here regarding the integration of the tool on the site.

Really cool tool! I think it will give users a really vivid image of future climates!

Here are a few things I think might not be immediately obvious to our users:

We discussed an advanced toggle in the meeting. I now think that just keeping the univariate analogue sections closed/rolled up when opening the page, plus more information bubbles for the representativeness score and the quality of analogue, might be good enough.

tlvu commented 1 year ago

Which URL was used to access the tool? https://pavics.ouranos.ca/panel_serve/Analogues-Spatiaux-Dashboard (old from https://github.com/tlvu/test-panel-serve/tree/main/test-notebooks, just a proof of concept, unmaintained), or https://pavics.ouranos.ca/analogues_spatiaux/Dashboard (new from this repo)?

juliettelavoie commented 1 year ago

Oh, I used the old one! I followed the link from the CCDP Confluence page (https://ccdpwiki.atlassian.net/wiki/spaces/CCDP/pages/2259320844/Spatial+analog+tool+dataset+documentation). @tlogan2000 should I edit your post to point to the new one?

tlvu commented 1 year ago

Pascal made some fixes to the new one but I also found a bug with the new one: https://github.com/Ouranosinc/analogues_spatiaux/issues/2

tlogan2000 commented 1 year ago

@juliettelavoie yes please.

juliettelavoie commented 1 year ago

done.

aulemahal commented 1 year ago

Nice to see this is appreciated!

Number of candidate pixels

Indeed, pixels is not the best term. Plotting on the map would work, but might be hard to see when considering only very dense cities.

As for plotting it live : that would mean plotting the newly selected cells on the map of the previous search? Or elsewhere? Or we clear the result pane before configuring a new search?

Arriving at the page without having read the report, it is a bit confusing why a red model with poor quality comes before an average yellow model. Is the representativeness score so important that it should influence what people see first? If so, I would make it clearer that this is what is being ranked here and why that matters.

This comment and some of the following ones all point to the lack of documentation. I need to add text bubbles, titles and short sentences around the UI to add context.

As for the question of which ranking is most important, I'm not sure myself. My first idea was to have the "quality" as the ranking, but Emilia preferred having the representativeness score first. I guess it depends on what you want to know!

In "Quality of analogy: Excellent (-0.031, top 0.99 %)", the top 0.99% of what ?

Well, that's the complex part that I'm not sure how to explain in a short text. It's the top 0.99% of a distribution of scores computed for the same combination of indices on randomly chosen pairs of grid cells. I interpret it as: "if you chose any two grid cells in North America at random, you would have a 0.99% chance of getting a better score than this one".
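To make that interpretation concrete, here is a minimal sketch of how such a "top X %" figure could be derived. It is not the tool's actual code: the function name `top_percent`, the synthetic reference distribution, and the assumption that a lower score means a better analogue are all mine, for illustration only.

```python
import random
from bisect import bisect_left


def top_percent(score, reference_scores):
    """Percentage of reference scores that are strictly better than `score`.

    Assumption: a lower score is a better match.
    """
    ordered = sorted(reference_scores)
    n_better = bisect_left(ordered, score)  # count of values < score
    return 100.0 * n_better / len(ordered)


# Hypothetical reference distribution standing in for the scores obtained
# on randomly chosen pairs of grid cells (synthetic values, not real data).
rng = random.Random(0)
reference = [rng.gauss(1.0, 0.5) for _ in range(10_000)]

pct = top_percent(-0.031, reference)
print(f"top {pct:.2f} %")
```

Read this way, "top 0.99 %" means that only 0.99% of the random pairs scored better than the analogue being shown.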

In the row urban area and the column Analogue, what is the number of km in parentheses?

The distance between the current target and this analogue.
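For readers wondering how such a distance is typically obtained from two pairs of coordinates, here is a minimal sketch using the haversine (great-circle) formula. Whether the tool uses this exact formula is an assumption on my part; the function name and the approximate city coordinates are illustrative only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates, for illustration: Montréal and Toronto.
distance = great_circle_km(45.50, -73.57, 43.65, -79.38)
print(f"{distance:.0f} km")
```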

On the figure distribution comparison, it is already clear, but I think it would be easier to read if there was a common element for the dataset (colour) and a common element for the time periods (filling). e.g., target future = striped purple, target present = solid purple, analogue present = solid yellow.

Good idea.

On the figure average change, I assume the dotted lines are standard deviations of the target present. It would be nice to add this to the legend. Also, I am not sure what message this is trying to communicate?

Indeed, that's what it is. I'm not sure this message is 100% useful, but it's there to help judge how significant the climate change signal is, compared with the difference between the analogue and the target's future.

On the figure Full Timeseries, a legend would be helpful even if the same colours as before are used. Also, maybe we could add an arrow to signify that we are copying the target present time series in the future.

:+1:

We discussed an advanced toggle in the meeting. I now think that just keeping the univariate analogue sections closed/rolled up when opening the page, plus more information bubbles for the representativeness score and the quality of analogue, might be good enough.

Ok! The "toggle" idea was to hide some configuration options? Or to have a check box that hides/shows the univariate sections? It's easy to keep the sections folded by default.

Pre-computation

Computing analogs is not fast. If we trigger a computation (random or otherwise) on load, I fear the user will find it too slow. When one clicks on "search", one expects it to be slow, but that's not true for the initial loading of the page! I guess this depends on potential improvements and on how it runs on CRIM's side!

juliettelavoie commented 1 year ago

The idea of initially showing results when we open the page is that it makes it clearer what users can expect from the tool and helps them choose the right options. If it takes too long to run the search, we could also always show the same pre-run example.

As I understood it, the toggle was meant to hide some results so as not to overwhelm users who are less technical or less knowledgeable about climate with numbers/figures they don't understand.

matprov commented 1 year ago

Climatedata's staging environment is live here : https://app-spatial-analogs-staging.climatedata.ca/analogs/Dashboard

As discussed during the AWG, loading time is enormous for two reasons:

1. Some calculations are done at run time instead of boot time. A suggestion would be to precompute data to avoid having to compute it every time.
2. Data location: once the data is located on CRIM's THREDDS server, the data access time should be reduced.

As opposed to our dev environment on pavics.ouranos.ca, we don't have auto-deployments when a PR gets merged, for now. Hence, please ping me when we need to deploy a new version of this tool.

Note that before publishing a change to the tool, running make build-local followed by make run-local allows us to make sure the local Docker image will run the same way as the deployed version on Climatedata.

aulemahal commented 1 year ago

Some calculations are done at run time instead of boot time.

I'm not sure I fully understand this. I thought that panel serve would execute the notebook up to the dash.servable() line only once. Do you know how I can move some code to the "boot time" ?

Also, the main problem with pre-generating the results is the size of the data and the computation time. Currently the "analog-finder" function computes a score for each indices combination / target city / ensemble member / emission scenario / candidate city / time horizon. Given all the possible values for these 6 "inputs", I estimate the number of scores to be 608'844'600'000. Storing them at float32 would use 2.2 TB.
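As a quick check on that estimate, the arithmetic can be reproduced in a few lines. The only inputs are the score count quoted above and the 4 bytes taken by a float32; the variable names are mine.

```python
N_SCORES = 608_844_600_000  # estimated number of scores, from the comment above
BYTES_PER_FLOAT32 = 4

total_bytes = N_SCORES * BYTES_PER_FLOAT32
size_tib = total_bytes / 2**40   # binary units (TiB)
size_tb = total_bytes / 10**12   # decimal units (TB)

print(f"{total_bytes:,} bytes = {size_tib:.2f} TiB = {size_tb:.2f} TB")
```

The 2.2 figure corresponds to binary units (TiB); in decimal units the same volume is about 2.4 TB.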

We could pre-compute a subset of common requests and warn the user when the request is not pre-computed? The alternative would be to reduce the possibilities (fewer indicators, tighter density range + clustering of the candidates, fewer Canadian cities...).

matprov commented 1 year ago

@aulemahal In PR https://github.com/Ouranosinc/analogues_spatiaux/pull/5 we can use pre-computed reference distributions and density map, which greatly reduces the loading time on Climatedata's host. Using the pre-computed objects allows us to load the app on https://app-spatial-analogs-staging.climatedata.ca/analogs/Dashboard in a reasonable time.

I'm not really sure how we can see the notebook diff via the PR, though. Anyway, I've added an excerpt of what changed in the PR description.

Also, the main problem with pre-generating the results is the size of the data and the computation time.

Here, when I mentioned pre-computing, I wasn't talking about "the" analog computation but rather about the generic stuff that gets computed every time (e.g. reference distributions and density map). I think that pre-computed reference distributions and density map will be a good start.
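To illustrate the general idea (this is a sketch, not what the PR actually does), generic objects can be computed once, saved to disk, and simply reloaded at start-up afterwards. The helper `load_or_compute`, the stand-in `compute_reference_distributions` function, the file name and the use of `pickle` are all assumptions for the example.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object stored at `path`; compute and save it first if missing."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


def compute_reference_distributions():
    # Hypothetical stand-in for the expensive step (placeholder values).
    return {"example_index": [0.1, 0.4, 0.9]}


# A temporary directory is used here; a real deployment would point to a
# persistent location filled in at build or boot time.
cache_dir = Path(tempfile.mkdtemp())
dists = load_or_compute(
    cache_dir / "reference_distributions.pkl",
    compute_reference_distributions,
)
```

The first call pays the computation cost; later calls only read the file, which is what keeps the page load short.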

SarahG-579462 commented 1 year ago

We received more comments from the DWG and Alex Cannon. I've summarized them below, along with the previous comments in this thread:

Feature request:

Documentation:

Distribution comparison:

Average change:

Full timeseries: