Comments for climatedata.ca

juliettelavoie commented 1 year ago

As discussed in the CCDP DWG, I am putting comments here regarding the integration of the tool on the site.

Really cool tool! I think it will give users a really vivid image of future climates!

Here are a few things I think might not be immediately obvious to our users:

[x] "Number of candidate pixels": I think most people associate pixels with photos. Maybe it could be replaced by "grid cell" or "urban area". I also like Travis's idea to show the candidate area on the map. If possible, it would be nice to plot it even before running the search. This would work well with Mathieu's idea to have a result already showing when we first land on the page. The target city for that first result could be random or based on the location of the user.
[x] Arriving at the page without having read the report, it is a bit confusing why a red model with poor quality comes before an average yellow model. Is the representativeness score so important that it should influence what people see first? If so, I would be clearer that this is what is being ranked here and why that matters.
[x] In "Quality of analogy: Excellent (-0.031, top 0.99 %)", the top 0.99% of what ?
[x] In the row urban area and the column Analogue, what is the number of km in parentheses?
[x] On the figure distribution comparison, it is already clear, but I think it would be easier to read if there was a common element for the dataset (colour) and a common element for the time periods (filling). e.g., target future = striped purple, target present = solid purple, analogue present = solid yellow.
[ ] On the figure average change, I assume the dotted lines are standard deviations of the target present. It would be nice to add this to the legend. Also, I am not sure what message this is trying to communicate? (see #35 - SG)
[ ] On the figure Full Timeseries, a legend would be helpful even if the same colours as before are used. Also, maybe we could add an arrow to signify that we are copying the target present time series in the future. (see #35 - SG)

We discussed an advanced toggle in the meeting. I am now thinking that just keeping the univariate analogue sections closed/rolled up when opening the page, and adding more information bubbles for the representativeness score and quality of analogue, might be good enough.

tlvu commented 1 year ago

Which URL was used to access the tool? https://pavics.ouranos.ca/panel_serve/Analogues-Spatiaux-Dashboard (old from https://github.com/tlvu/test-panel-serve/tree/main/test-notebooks, just a proof of concept, unmaintained), or https://pavics.ouranos.ca/analogues_spatiaux/Dashboard (new from this repo)?

juliettelavoie commented 1 year ago

Oh, I used the old one! I followed the link from the ccdp confluence page (https://ccdpwiki.atlassian.net/wiki/spaces/CCDP/pages/2259320844/Spatial+analog+tool+dataset+documentation). @tlogan2000 should I edit your post to point to the new one?

tlvu commented 1 year ago

Pascal made some fixes to the new one but I also found a bug with the new one: https://github.com/Ouranosinc/analogues_spatiaux/issues/2

tlogan2000 commented 1 year ago

@juliettelavoie yes please.

juliettelavoie commented 1 year ago

done.

aulemahal commented 1 year ago

Nice to see this is appreciated!

Number of candidate pixels

Indeed, pixels is not the best term. Plotting on the map would work, but might be hard to see when considering only very dense cities.

As for plotting it live : that would mean plotting the newly selected cells on the map of the previous search? Or elsewhere? Or we clear the result pane before configuring a new search?

Arriving at the page without having read the report, it is a bit confusing why a red model with poor quality comes before an average yellow model. Is the representativeness score so important that it should influence what people see first? If so, I would be clearer that this is what is being ranked here and why that matters.

This comment and some of the next ones all point to the lack of documentation. I need to add text bubbles, titles and short sentences around the UI to add context.

As for the question of which ranking is most important, I'm not sure myself. My first idea was to have the "quality" as the ranking, but Emilia preferred having the representativeness score first. I guess it depends on what you want to know!

In "Quality of analogy: Excellent (-0.031, top 0.99 %)", the top 0.99% of what ?

Well, that's the complex part that I'm not sure how to explain in a short text. It's the top 0.99% of a distribution of scores for the same indices combination on randomly chosen grid cell couples. I interpret it as : "if you chose any two grid cells in North America at random, you would have a 0.99% chance of getting a better score than this one".
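
The percentile interpretation above can be sketched in a few lines. This is only an illustration, not the tool's actual code: the function name `percent_better`, the synthetic reference scores, and the assumption that a lower score means a better analogue are all hypothetical.

```python
import random


def percent_better(score, reference_scores):
    """Percentage of reference scores that are strictly better than `score`.

    Assumption: a lower score means a better analogue.
    """
    better = sum(1 for s in reference_scores if s < score)
    return 100.0 * better / len(reference_scores)


# Hypothetical reference distribution: scores for randomly chosen grid cell pairs.
rng = random.Random(0)
reference = [rng.gauss(1.0, 0.5) for _ in range(10_000)]

print(f"top {percent_better(-0.031, reference):.2f} %")
```

A short label such as "better than N % of random grid cell pairs" might be easier to fit in an information bubble than the full explanation.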

In the row urban area and the column Analogue, what is the number of km in parentheses?

The distance between the current target and this analogue.
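
For readers wondering how such a distance could be obtained, here is a minimal sketch using the haversine (great-circle) formula. Whether the tool uses this formula is an assumption, and the function name `great_circle_km` and the approximate coordinates are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
print(round(great_circle_km(45.50, -73.57, 43.65, -79.38)), "km")
```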

On the figure distribution comparison, it is already clear, but I think it would be easier to read if there was a common element for the dataset (colour) and a common element for the time periods (filling). e.g., target future = striped purple, target present = solid purple, analogue present = solid yellow.

Good idea.

On the figure average change, I assume the dotted lines are standard deviations of the target present. It would be nice to add this to the legend. Also, I am not sure what message this is trying to communicate?

Indeed, that's what it is. I'm not sure this message is 100% useful, but it's meant to help judge how significant the climate change signal is, in comparison to the difference between the analog and the target's future.

On the figure Full Timeseries, a legend would be helpful even if the same colours as before are used. Also, maybe we could add an arrow to signify that we are copying the target present time series in the future.

:+1:

We discussed an advanced toggle in the meeting. I am now thinking just keeping the univariate analogue sections closed/rolled up when opening the page and more information bubbles for the number of representativeness score and quality of analogue might be good enough.

Ok! The "toggle" idea was to hide some configuration option? Or to have a check box that hides/shows the univariate sections? It's easy to keep the sections folded by default.

Pre-computation

Computing analogs is not fast. If we trigger a computation (random or otherwise) on load, I fear the user will find it too slow. When one clicks on "search", one expects it to be slow, but that's not true for the initial loading of the page! I guess this depends on potential improvements and on how it runs on the CRIM's side!

juliettelavoie commented 1 year ago

The idea of initially showing results when we open the page is that it makes it clearer what users can expect from the tool and helps them choose the right options. If it takes too long to run the search, we could also always show the same pre-run example.

How I understood the toggle was to hide some results to not overwhelm users that are less technical or less knowledgeable about climate with numbers/figures they don't understand.

matprov commented 1 year ago

Climatedata's staging environment is live here : https://app-spatial-analogs-staging.climatedata.ca/analogs/Dashboard

As discussed during the AWG, loading time is enormous for two reasons:
- Some calculations are done at run time instead of boot time. A suggestion would be to precompute data to avoid having to compute it every time.
- Data location: once the data is located on CRIM's Thredds server, the data access time should be reduced.

As opposed to our dev environment on pavics.ouranos.ca, we don't have auto-deployments when a PR gets merged, for now. Hence, please ping me when we need to deploy a new version of this tool.

Note that before publishing a change to the tool, running make build-local followed by make run-local allows us to make sure the local Docker image will run the same way as the deployed version on Climatedata.

aulemahal commented 1 year ago

Some calculations are done at run time instead of boot time.

I'm not sure I fully understand this. I thought that panel serve would execute the notebook up to the dash.servable() line only once. Do you know how I can move some code to the "boot time" ?

Also, the main problem with pre-generating the results is the size of the data and the computation time. Currently the "analog-finder" function computes a score for each indices combination / target city / ensemble member / emission scenario / candidate city / time horizon. Given all the possible values for these 6 "inputs", I estimate the number of scores to be 608'844'600'000. Storing them at float32 would use 2.2 TB.
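
The storage estimate above can be checked with a couple of lines of arithmetic. The score count is taken from the comment; the only added assumption is that nothing but the raw float32 values is stored (no coordinates, metadata or compression).

```python
n_scores = 608_844_600_000  # number of scores quoted above
bytes_per_score = 4         # one float32 value

total_bytes = n_scores * bytes_per_score
tb = total_bytes / 1e12      # decimal terabytes
tib = total_bytes / 2 ** 40  # binary tebibytes

print(f"{tb:.2f} TB (decimal), {tib:.2f} TiB (binary)")
```

This gives about 2.4 TB in decimal units, or about 2.2 TiB in binary units, which matches the figure quoted above.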

We could pre-compute a subset of common requests and warn the user when the request is not pre-computed? The alternative would be to reduce the possibilities (fewer indicators, tighter density range + clustering of the candidates, fewer Canadian cities...).
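
A rough sketch of the "pre-compute common requests, warn otherwise" idea, under stated assumptions: the request key layout, the example values (`tg_mean`, `ssp245`, the period) and the functions `get_scores` and `slow_search` are all hypothetical, and a real dashboard would show the message in the UI rather than through `warnings`.

```python
import warnings


def get_scores(request, precomputed, compute):
    """Return pre-computed scores when available, otherwise compute them on demand."""
    if request in precomputed:
        return precomputed[request]
    warnings.warn(f"{request} is not pre-computed; this search may be slow.")
    return compute(request)


# Hypothetical request key: (target city, indices, scenario, time horizon).
key = ("Montréal", ("tg_mean",), "ssp245", "2041-2070")
common = {key: [0.12, 0.34]}


def slow_search(request):
    # Placeholder for the real (slow) analogue search.
    return [0.5]


print(get_scores(key, common, slow_search))
```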

matprov commented 1 year ago

@aulemahal In PR https://github.com/Ouranosinc/analogues_spatiaux/pull/5 we can use pre-computed reference distributions and density map, which greatly reduces the loading time on Climatedata's host. Using the pre-computed objects allows us to load the app on https://app-spatial-analogs-staging.climatedata.ca/analogs/Dashboard in a reasonable time.

I'm not really sure how we can see the notebook diff via the PR, though. Anyway, I've added an excerpt of what changed in the PR description.

Also, the main problem with pre-generating the results is the size of the data and the computation time.

Here, when I mentioned pre-computing, I wasn't talking about "the" analog computation but more about the generic stuff that gets computed every time (e.g., reference distributions and density map). I think that pre-computed reference distributions and density map will be a good start.
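
To illustrate the general pattern (compute the generic objects once, then only load them at start-up), here is a minimal sketch. It is not the code from the PR: the helper `load_or_compute`, the pickle format and the placeholder `expensive_density_map` are assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object stored at `path`, computing and saving it on first use."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_density_map():
    # Placeholder for a slow computation; records each call so we can count them.
    calls.append(1)
    return {"cell_1": 120.5, "cell_2": 980.0}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "density_map.pkl"
    first = load_or_compute(cache_file, expensive_density_map)   # computes and saves
    second = load_or_compute(cache_file, expensive_density_map)  # loads from disk

print(first == second, len(calls))
```

In a deployment, the saved file could be produced at image build time so that the first user never pays the computation cost.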

SarahG-579462 commented 1 year ago

We received more comments from the DWG and Alex Cannon. I've summarized them below, along with the previous comments in this thread:

Feature request:

[ ] Show candidate areas on the map, prior to running the analogue search. JL SG
[ ] plot data for lookup city before performing analog search, based on geolocation of user JL, TL, MP
- which climate index? Which time period? SG
[x] progressive user interface (e.g. rolled up options that have default pre-selected) JL EB LPC
- Use case is wide, but accessibility is low. This would help accessibility. EB
[x] Precompute data to reduce user run time MP
- would cost 2.2 TB of storage. Alternatively, we could compute common requests and warn the user of compute time for longer requests PB.
- Don’t pre-compute analogues, just generic stuff such as density maps, reference distributions, etc. MP
[ ] Add other non-climatic criteria to the density discrimination AC.
- Ideas: min/max population, GDP, city budget per capita, GINI index (wealth inequality) SG
[ ] Frame the app by warming level, in addition to time period (with forcing scenarios) AC
[x] Move tool from PAVICS to CRIM for scalability and stability EB
[x] Guidance on how to use the tool built in, or at worst, a how-to document. EB
[x] Don’t show poor analogues, we think this is unnecessary for most users EB
[x] Add summary information about the target and analogue, e.g. averages and ranges of the indices chosen and temperature and precipitation variables. EB
[x] Swap to tab layout, overview tab and a tab per variable? SG
[x] See included mock-up by Ryan for a more ClimateData-esque page layout, with additional documentation and user guidance.

Documentation:

[x] Lots of work to be done here. AC EB etc.
[x] Replace “pixel” by “grid cell” or “urban area”. JL
[x] clarify the sorting/representativeness... What is the rank here and why it matters JL
[x] Clarify what is meant by “Quality of analogy: Excellent (-0.031, top 0.99%)” JL
[x] what is the km in the row analogue? (distance between cities, clarify) JL
[x] Add blurb on where density information was taken. SG
[x] Have a citation area where all data was taken, since we might add more than the current 3 sources. SG
[ ] Have variable description as an alt-text (hover to see full description). SG

Distribution comparison:

[x] have the target’s present distribution be of similar color to target’s future distribution, e.g. target future = striped purple, target present = solid purple. JL
[x] Plot the temperatures in Celsius instead of Kelvin SG
[x] label the y axis, could also label it as occurrence per decade instead of probability. SG

Average change:

[ ] add standard deviation to the legend. JL
[x] or change standard deviation for quantiles, and add that to the legend SG

Full timeseries:

[ ] legend would be helpful. JL
[ ] Add an arrow to signify we are copying the analog present timeseries to the future. JL
[x] Timeseries is hard to read with all ensembles shown. Have percentiles instead, and maybe ensembles as a power option? SG
[x] Would it be possible to smooth the data like is done on Portraits Climatiques? See below. Can also allow user to turn off, if needed. SG
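
The two timeseries suggestions above (percentiles instead of individual ensemble members, plus optional smoothing) could look roughly like this. It is a sketch using only the standard library, not the method used on Portraits Climatiques; the function names, the 10th/50th/90th percentile choice and the window length are assumptions.

```python
from statistics import mean, quantiles


def ensemble_percentiles(members):
    """Per-time-step (p10, p50, p90) across ensemble members.

    `members` is a list of equal-length series, one per ensemble member.
    """
    out = []
    for values in zip(*members):
        deciles = quantiles(values, n=10, method="inclusive")
        out.append((deciles[0], deciles[4], deciles[8]))
    return out


def rolling_mean(series, window):
    """Centred rolling mean; the window shrinks near the edges of the series."""
    half = window // 2
    return [mean(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


# Hypothetical ensemble: 11 members, 3 time steps each.
members = [[float(m), float(m) + 1.0, float(m) + 2.0] for m in range(11)]
median = [p50 for _, p50, _ in ensemble_percentiles(members)]
print(rolling_mean(median, 3))
```

Exposing the window length as a user option would cover the "allow user to turn off" request: a window of 1 leaves the series unchanged.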
