IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Improving the exceedance graphs #5057

Open rdstern opened 6 years ago

rdstern commented 6 years ago

I needed to do these for PICSA in Tanzania. They are important and can look good in R (and hence R-Instat). The dialogue is in Describe > Specific > Cumulative Plots and also in Climatic > Seasonal Forecast Support > Cumulative/Exceedance Graphs. Once it is working a bit better, I suggest we move it within Climatic (or possibly also add it to the Climatic > PICSA menu).

I have suggested Maxwell and Danny, not because they will fix the problems, but because it is a good time to consider who could be looking at the plotting, with this possibly as a first example. This will need supervision by Maxwell, and some of the edits mentioned below could be easy to do.

Here are 2 examples (images: "exceedance plot - seasonal totals" and "exceedance plot - lengths").

They are the opposite of what statisticians call cumulative frequency graphs. Exceedance graphs go down from 1 to 0, and the probability (on the left-hand axis) shows the chance of a value greater than the given x value.
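As a small illustration of the relationship described above: the exceedance probability is one minus the empirical cumulative distribution, P(X > x) = 1 - F(x). This sketch uses base R's `ecdf()`; the rainfall totals are made-up values, not data from the examples.

```r
# Made-up seasonal rainfall totals (mm), for illustration only
totals <- c(520, 610, 450, 700, 580, 640, 490, 720, 560, 600)

F_hat  <- ecdf(totals)              # empirical cumulative distribution, P(X <= x)
exceed <- function(x) 1 - F_hat(x)  # exceedance probability, P(X > x)

exceed(600)          # 0.4: four of the ten seasons are above 600 mm
exceed(c(400, 720))  # 1 and 0: every season exceeds 400 mm, none exceeds 720 mm
```

A cumulative graph plots `F_hat(x)`; an exceedance graph plots `exceed(x)`, which is why it falls from 1 to 0.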

It is possible that the improvements needed are general, rather than just this dialogue.

a) When I change the y-axis options to Continuous, I find it is initially set to Discrete. (I assume this setting is somewhere in the code for this dialogue, rather than general?) It is an interesting issue, because the y variable is sometimes an integer column (like length in complete days), but the y-scale in the graph is from 0 to 1, so it is always continuous.

b) The default on the y-axis is to give me labels at 0, 0.25, etc., and the request (in PICSA) is for more labels, to help read off the corresponding values more easily. I change it to numeric and make the necessary edits. It ignores me, and when I return it is reset to Discrete again. (I had to edit the script file to get the 0.1 spacing in the figures above.) The problem is clear from the code below.

c) It is because the R code seems contorted and repetitive. I give an example below. It shows that the settings of 0.1 were accepted, and then overwritten by later code.

d) The Line options in the dialogue are disabled. That stops me from being able to change the colours, thickness, etc.

e) Curiously, the main options were enabled (please call this Plot Options, rather than just Options). I say curiously, because I was using multiple Y variables (to get the 2 lines in the examples above), and on other plots the sub-dialogues are disabled. I like it enabled!

f) We still have the general problem that when we add the points (which we can do from the main dialogue, and I like that) we can't change the options on the points layer, because they come from the main dialogue. This is a general problem that is reported elsewhere and has been an issue for a long time.

g) What I am not sure is easy is changing the y scale. There is an option in the dialogue, which is good, to have counts rather than proportions. Is it easy to also 1) have percentages, and 2) be able to multiply by 10 rather than 100, so it gives the risks in 10 years rather than in 100?
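For point g), a minimal sketch, assuming ggplot2 is used directly: the `labels` argument of a continuous scale accepts a function, so the breaks can stay as proportions while the text shown becomes percentages or "years out of 10". The helper names `as_percent` and `per_10_years` and the season lengths are made up for illustration; how this would be wired into the R-Instat y-axis options is not shown.

```r
library(ggplot2)

# Hypothetical label helpers (not existing R-Instat functions)
as_percent   <- function(p) paste0(round(100 * p), "%")  # 0.3 -> "30%"
per_10_years <- function(p) round(10 * p, 1)              # 0.3 -> 3 (years in 10)

# Made-up season lengths in days, for illustration only
seasons <- data.frame(value = c(92, 105, 110, 98, 120, 87, 101, 115, 95, 108))

p <- ggplot(seasons, aes(x = value)) +
  stat_ecdf(pad = FALSE) +
  # breaks remain proportions; only the displayed labels are rescaled
  scale_y_continuous(breaks = seq(0, 1, by = 0.1), labels = per_10_years) +
  labs(x = "Length of season (days)", y = "Years out of 10")
```

Swapping `labels = per_10_years` for `labels = as_percent` gives percentages instead, without touching the breaks.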

Here is an example of the current code from the command:

# Code generated by the dialog, Cumulative/Exceedance Graphs

merge2 <- data_book$get_data_frame(data_name="merge2", stack_data=TRUE, id.vars=NULL, measure.vars=c("sum_Dodoma","sum_Kondoa"))
last_graph <- ggplot2::ggplot(data=merge2, mapping=ggplot2::aes(x=value, colour=variable)) +
  ggplot2::stat_ecdf() +
  theme_grey() +
  ggplot2::geom_point(stat="ecdf") +
  ggplot2::scale_y_continuous(breaks=seq(from=0, to=1, by=0.1)) +
  ggplot2::scale_y_continuous(breaks=seq(by=0.1, to=1, from=0)) +
  ggplot2::ylab(label="") +
  ggplot2::scale_y_reverse(breaks=seq(1,0,-0.25), labels = seq(0,1,0.25)) +
  ggplot2::xlab(label="Seasonal Total (mm)") +
  ggplot2::scale_x_continuous(breaks=seq(by=50, to=1200, from=100), limits=c(), expand=c(0.0, 0))
data_book$add_graph(data_name="merge2", graph=last_graph, graph_name="last_graph")
data_book$get_graphs(graph_name="last_graph", data_name="merge2")
rm(list=c("merge2", "last_graph"))

(Image: default exceedance graphs)
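A sketch of what a cleaned-up script might look like, with a single y-scale. In ggplot2 only the last scale added for an aesthetic is kept (it prints a message that the existing scale for y will be replaced), which is why the two `scale_y_continuous()` calls with 0.1 spacing above are overridden by the final `scale_y_reverse()` with 0.25 spacing. The data frame below is made up and stands in for the stacked `merge2` data; the `data_book` calls are left out because they need an R-Instat session. The `pad = FALSE` setting follows the request further down in this thread.

```r
library(ggplot2)

# Made-up seasonal totals (mm) in the stacked layout produced by stack_data=TRUE
set.seed(1)
merge2 <- data.frame(
  variable = rep(c("sum_Dodoma", "sum_Kondoa"), each = 30),
  value = c(rnorm(30, mean = 550, sd = 120), rnorm(30, mean = 650, sd = 150))
)

step <- 0.1  # spacing of the probability labels, chosen once

last_graph <- ggplot(merge2, aes(x = value, colour = variable)) +
  stat_ecdf(pad = FALSE) +                  # step line, not extended to the panel edges
  stat_ecdf(geom = "point", pad = FALSE) +  # points at each observed value
  # Exceedance graph: F(x) on a reversed axis, each break labelled as 1 - F(x)
  scale_y_reverse(breaks = seq(0, 1, by = step), labels = function(b) round(1 - b, 2)) +
  expand_limits(y = c(0, 1)) +              # keep the probability axis at 0 to 1
  scale_x_continuous(breaks = seq(100, 1200, by = 50)) +
  labs(x = "Seasonal Total (mm)", y = "Probability of exceedance") +
  theme_grey()
```

Computing the labels from the breaks (rather than passing a second, reversed sequence) means only `step` has to change to get finer labels.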

dannyparsons commented 6 years ago

I think there are two simple changes that could be done now:

  1. Instead of a checkbox for exceedance, have two radio buttons at the top for cumulative/exceedance. This would then be consistent with other dialogs, such as boxplot.

  2. Since the y axis is always 0 to 1, I suggest we put the break point options on the main dialog, i.e. the sequence from, to, by controls. I think this would also make the code easier, since it is ggplot2::scale_y_continuous(breaks=seq(from=0, to=1, by=0.1)) for cumulative and ggplot2::scale_y_reverse(breaks=seq(1,0,-0.25), labels = seq(0,1,0.25)) for exceedance. This is already hard coded into the dialog for exceedance, so making it optional shouldn't be difficult.
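The two suggestions above could be sketched together as one R function that takes the radio-button choice and the from/to/by values and returns exactly one y-scale. The name `ecdf_y_scale` and its arguments are hypothetical, for illustration only; this is not the code the dialog generates.

```r
library(ggplot2)

# Hypothetical helper (name and arguments are illustrative): build the y-scale
# from the from/to/by controls, for either radio-button choice
ecdf_y_scale <- function(type = c("cumulative", "exceedance"),
                         from = 0, to = 1, by = 0.25) {
  type <- match.arg(type)
  breaks <- seq(from = from, to = to, by = by)
  if (type == "cumulative") {
    scale_y_continuous(breaks = breaks)
  } else {
    # reversed axis: the curve falls from 1 to 0, labels show 1 - F(x)
    scale_y_reverse(breaks = breaks, labels = function(b) round(1 - b, 3))
  }
}

# Usage: one scale per plot, whichever radio button is selected
p <- ggplot(data.frame(value = c(3, 7, 1, 9, 4)), aes(x = value)) +
  stat_ecdf() +
  ecdf_y_scale("exceedance", by = 0.1)
```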

These are both good tasks for someone wanting to get into understanding ggplot2 and R code in general, to support Maxwell on graphics.

@rdstern will follow up more generally on graphics I think.

dannyparsons commented 5 years ago

@maxwellfundi My two suggestions above are good tasks for someone who wants to get into understanding ggplot2, working with your support. Can you coordinate this?

maxwellfundi commented 5 years ago

@dannyparsons yeah we can do this together with @Ogik99

maxwellfundi commented 5 years ago

@rdstern I now have this design for the cumulative distribution. I have one question: the from, to, by controls are for the cumulative plot, and I was wondering whether you would also like to be able to change the values of the sequence in scale_y_reverse(breaks=seq(1,0,-0.25), labels = seq(0,1,0.25)).

(image)

rdstern commented 5 years ago

Hi Maxwell, great that you are working on this. But I wasn't expecting an addition to the main dialogue. I am OK with 0, 0.25, etc. as the defaults. The problem, to me, was with the y-axis tab of the sub-dialogue. With this dialogue it seems to be set initially to discrete, which is wrong (and we don't yet have any options there). It should be set to continuous instead.
Then, when I use the dialogue, that sub-dialogue keeps resetting back to discrete, etc.
So I don't think we need any more controls for this aspect. We just need to fix the code it must be generating for the y-axis, which I guess is linked to it thinking the axis should be discrete.

maxwellfundi commented 5 years ago

@rdstern From this conversation, @dannyparsons suggested these changes as the ones we can actually do now. There would be more work to be done to fix these other bugs.

rdstern commented 5 years ago

I am still thinking about this same bug of the scale always reverting to discrete and 0.25. In checking the approach you have taken, it would be good to try some examples. One is from Instat data sets > Climatic > Climatic Guide Datasets > samrain. Then almost any columns can be used, e.g. strt1 for a single column, or strt1 and end for multiple columns.

You will find that the limits of the y-axis are currently fixed at 0 to 1 for cumulative and 1 to 0 for exceedance graphs (probabilities go from 0 to 1!). So it is a bit confusing to be able to change the end points on the main dialogue. I still think this bug is one you could fix, but this is not the best way. If you really want this feature on the main dialogue, then just keep the step length as a control, with 0.25 as the default (as it is now), changing in steps of 0.05, down to a lower limit of 0.05. This could be good practice, but I am not sure it will be a final solution.
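The single step-length control described above could be sketched as follows: the axis limits stay fixed at 0 and 1, and only the step changes, in multiples of 0.05 with 0.05 as the minimum. The function name `step_to_breaks` and the upper limit of 1 are assumptions for illustration; an R-Instat up/down control would normally enforce these rules itself.

```r
# Hypothetical helper (not R-Instat code): turn the single "step" control into
# y-axis breaks, with limits fixed at 0 and 1
step_to_breaks <- function(step = 0.25) {
  if (!is.numeric(step) || length(step) != 1 || step < 0.05 || step > 1) {
    stop("step must be a single number between 0.05 and 1")
  }
  # the control moves in increments of 0.05, so reject anything in between
  if (abs(step / 0.05 - round(step / 0.05)) > 1e-8) {
    stop("step must be a multiple of 0.05")
  }
  seq(from = 0, to = 1, by = step)
}

step_to_breaks()     # 0.00 0.25 0.50 0.75 1.00
step_to_breaks(0.1)  # eleven breaks, 0 to 1
```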

maxwellfundi commented 5 years ago

@dannyparsons I am not certain that I will have this done by tomorrow.

dannyparsons commented 5 years ago

OK, thanks for letting me know. I think you should concentrate on your other tasks for now and give an update by the end of today. Then we can decide what to do.

maxwellfundi commented 5 years ago

Moved this to the next milestone

rdstern commented 5 years ago

I have now got to this point in the R-Instat climatic guide. It would be good if the improved version could now be included. Is there still work to do on the new version?

It could also be added to Climatic > PICSA, just below Rainfall Graph.

And it is currently called Cumulative/Exceedance Graphs. Please delete the "s" in the new position and also in the original. It is there at the bottom of Climatic > Seasonal Forecast Support.

And within the dialogue, delete the s from the title. Also change the checkbox label from Exceedance Plots to Exceedance Plot, to get rid of the s there too.

rdstern commented 5 years ago

I have a new request for the exceedance/cumulative graphs, possibly using a script file. Here is an example: (image)

Here is an image with 3 graphs. The x-axis is days in the year - from January. I'd like the code (at least) to display this as dates within the year - as we can now do for the PICSA graphs. I realise that (initially?) this might be through getting and then editing the script.

rdstern commented 5 years ago

This may have been fixed by the work being done. But with the "old" version, when going into this dialogue the Plot Options are enabled. Clicking on this gives: (image)

If you are not careful, this throws you out of R-Instat.

rdstern commented 5 years ago

Another point, this time related to the specific ecdf options. The default in ggplot2::stat_ecdf() is pad = TRUE. Please change this so that pad = FALSE is the current default. This should be trivial to fix. Later, when the ecdf options are implemented, we can have both options. You can see the current problem in the figure below: the lines are extended to each end of the graph. (That's pad = TRUE.)
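A quick way to see what `pad` does, as a sketch with made-up start-of-season days: `layer_data()` returns the data ggplot2 computes for a layer, and with the default `pad = TRUE` the ecdf stat adds rows at `x = -Inf` and `x = Inf`, which is what stretches the line to both edges of the panel.

```r
library(ggplot2)

# Made-up start-of-season days of year, for illustration only
starts <- data.frame(doy = c(305, 312, 320, 298, 330, 315, 308, 325))

padded   <- ggplot(starts, aes(x = doy)) + stat_ecdf()             # pad = TRUE is the default
unpadded <- ggplot(starts, aes(x = doy)) + stat_ecdf(pad = FALSE)  # requested new default

# Padding adds x = -Inf and x = Inf to the computed data
range(layer_data(padded)$x)    # -Inf  Inf
range(layer_data(unpadded)$x)  # 298 330
```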

(image)

maxwellfundi commented 5 years ago

@Ogik99 Are you still working on this?

Ogik99 commented 5 years ago


Regarding the request to display the x-axis as dates within the year: we are thinking about possible ways to implement this. Is it okay if we add a checkbox on the main dialogue with the option of changing the x-axis to a date format?

rdstern commented 5 years ago

It would be great if you could get this feature to work. But there were quite a few things to fix already for this dialogue. For example, my version still crashes when I use the options button. Would it be possible for the new version of this dialogue to be merged first? Then I suggest there may be more options on the date formatting than we would want on the main dialogue, so I would be more inclined to consider the options as a tab on the special sub-dialogue, a bit like the one in the PICSA rainfall graphs. @dannyparsons will be able to comment better.

dannyparsons commented 5 years ago

Being able to display the doy column as a date needs to be implemented in every graphics dialog so it will need to be more general than one dialog. One suggestion could be on the x-axis, y-axis tabs on the Plot Options dialog having a "day of year" option alongside "Continuous", "Discrete" etc. So I think changes on this dialog can be made independently of the eventual changes for dates.

maxwellfundi commented 5 years ago

@dannyparsons we have been discussing @rdstern's request for dates on the x-axis of the cumulative distribution graph.

Following your comment, we need to edit the ucrAxis control so that it has a day of year option. In doing this, as we see it, we ought to edit the aes function so that the variable to be plotted as dates is first converted to dates, and then add the scale_x_date/scale_y_date function.

@dannyparsons, do you know if there is another way we can achieve this?

dannyparsons commented 5 years ago

That is the way to achieve this. As we do for the PICSA graphs, the variable needs to be transformed to a date, and then the display is managed through the scale_x/y_date function. So yes, this is definitely a graphics system issue, and it's not obvious how we can easily implement it. If you can come up with a suggestion, that would be great. We would want to be able to do this for any graph. This discussion should be a separate issue, as it's more general than the exceedance graphs. As Roger said, don't let this issue distract from the other improvements to this dialog.
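The two steps described above (transform to a date, then manage the display with a date scale) might look like this for a cumulative graph of start days. This is a sketch only: `doy_to_date` is a hypothetical helper, the days are made up, and using the fixed non-leap year 2001 as the reference year is an assumption, so that the same day of year always maps to the same calendar date.

```r
library(ggplot2)

# Hypothetical helper: day of year (1 = 1 January) to a Date in a fixed
# reference year. 2001 is an arbitrary non-leap year (an assumption), so
# day 60 is always shown as 1 March.
doy_to_date <- function(doy, year = 2001) {
  as.Date(doy - 1, origin = paste0(year, "-01-01"))
}

# Made-up start-of-season days of year, for illustration only
starts <- data.frame(doy = c(305, 312, 320, 298, 330, 315, 308, 325))
starts$start_date <- doy_to_date(starts$doy)  # step 1: transform to a date

p <- ggplot(starts, aes(x = start_date)) +
  stat_ecdf(pad = FALSE) +
  # step 2: the display is handled by the date scale
  scale_x_date(date_breaks = "1 week", date_labels = "%d %b") +
  labs(x = "Start of season", y = "Cumulative probability")
```

Only the x values change; the ecdf itself is computed on the underlying numeric dates, so the probabilities are the same as for the raw day-of-year column.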

rdstern commented 5 years ago

Just to say that I really like Danny's idea of adding a 4th "Day of Year" option to the X-axis and Y-axis tabs, so that the options there become Continuous, Discrete, Date and Day of Year. If so, then any special dialogue that gives access to the Plot Options sub-dialogue, like the cumulative/exceedance graphs, could use this. In that case I would be happy if that is all there is.

Then, once we are using the dialogue(s) more we can see what could also be added to the main dialogue.

Sorry, this doesn't get to any detail relevant to Maxwell's question.

dannyparsons commented 5 years ago

I've moved the discussion on formatting a day of year column to here https://github.com/africanmathsinitiative/R-Instat/issues/5356 to separate it out from discussion on the exceedance graphs.

rdstern commented 5 years ago

Is this version being added to the next release?

rdstern commented 5 years ago

WFP is continuing the work on PICSA in Mozambique. They have just written: "I’ve been working on the user products of the seasonal forecasts. We’re aiming at producing most of these in terms of probability of exceedance. A sample of the most obvious / classic products is attached – prob of exceeding average rainfall, which is available for any period of choice, 1 to 5 months on a rolling 10 day basis. I’m working now on producing Prob (dry spell > 10 days) on any given 30 day period, next would be:

Prob(30 day rainfall > maize water requirement)
Prob(start of agricultural season)

We’re about to have a meeting with Min Agro to see if they have ideas for additional products. "

This reminds me that the work on improving the R-Instat exceedance graphs has gone quiet again. Any chance it could be revived? Then I can reply more easily to the WFP guys.