Open rdstern opened 7 years ago
One anomaly in the filters is that I can't choose missing values in a factor. Or at least I don't know how to. It gives me the options for all the levels, but (of course) missing is not a level. The obvious way would be to include NA as another level. This would be a good option, because at some stage we will have multiple missing value codes and this will extend naturally.
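As a rough illustration of the "NA as another level" idea, base R's `addNA()` does this for a factor. This is only a sketch on a made-up `Month` vector, not how the filter dialogue is implemented:

```r
# Toy factor with one missing value (made-up data)
Month <- factor(c("Nov", NA, "Dec", "Nov"))
levels(Month)              # NA is not among the levels

# addNA() appends NA as an explicit extra level
Month_na <- addNA(Month)
levels(Month_na)           # the last level is now NA
table(Month_na)            # the missing value is counted under the NA level
```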
I am copying Lily on this, because it is a bit of an R puzzle - at least for me! It is also sort of re-inventing our excellent calculation system.
I have the Dodoma data and would like to filter to get the "first" and "last". I can't (yet?) type a condition into the filter dialogue - I don't know whether we should in the future, so I tried using our calculator.
My first challenge was the first and last day of each year.
I found (from Stack Overflow, of course) that typing in NewYear <- !duplicated(Year) gives TRUE for the 1st January each year and FALSE otherwise.
EndYear <- rev(!duplicated(rev(Year))) gives TRUE on the 31 December.
Now I can filter for either and if the Filter dialogue included OR, then I could use it directly.
But I can use my R-Instat calculator and do NewEnd <- NewYear | EndYear and then filter on this column.
So I can filter for the First and Last, even in a round-about way. I wonder whether we could make the calculator and/or the filter dialogue so this is easier?
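For reference, the steps above can be tried on a small made-up `Year` vector. Note that `duplicated()` flags the first (or last) row of each year in the data, so it only corresponds to 1 January and 31 December if the data are sorted by date and complete. The `fromLast = TRUE` argument is a base R shortcut for the double `rev()`:

```r
# Toy Year column (made-up data, sorted by date)
Year <- c(1990, 1990, 1990, 1991, 1991, 1992, 1992, 1992)

NewYear <- !duplicated(Year)                    # first row of each year
EndYear <- rev(!duplicated(rev(Year)))          # last row of each year
EndYear2 <- !duplicated(Year, fromLast = TRUE)  # same result, no rev() needed

NewEnd <- NewYear | EndYear                     # first OR last row of each year
which(NewEnd)                                   # rows 1, 3, 4, 5, 6, 8
```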
Now a bigger challenge and one which we make simpler with the climatic commands. How could I use my calculator and filter to get the first occasion each Year with Rain more than 20mm and in November or December? My way is tortuous, because I don't know how to generalise the commands above. The point of this message is to find whether there are easier ways (I know I can use the start of the rains) by better use of R or by improving the calculator or filter system in R-Instat.
So, what I can do in the current calculator is Startrains <- (as.numeric(month_abbr)>10)&(Rain>20). This gives me ALL occurrences of the event I want.
Now I can filter on the Year AND Startrains and then use the method above on the filtered data to give me the first occurrence.
I feel there should be an easier way to get the first occurrence of a more complicated condition than the first of the year.
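One possible way to generalise the `duplicated()` trick, sketched on made-up data: apply `duplicated()` to the pair (Year, condition) rather than to Year alone, and keep only the rows where the condition holds. The data frame `dat` and the names `hit` and `FirstStart` are illustrative assumptions, not R-Instat objects:

```r
# Toy daily data (made-up values), sorted by date
dat <- data.frame(
  Year  = c(1990, 1990, 1990, 1990, 1991, 1991, 1991),
  Month = factor(c("Oct", "Nov", "Nov", "Dec", "Nov", "Dec", "Dec"),
                 levels = month.abb),
  Rain  = c(25, 30, 5, 40, 10, 22, 35)
)

# All occurrences of the event, as in the calculator expression
Startrains <- (as.numeric(dat$Month) > 10) & (dat$Rain > 20)

# Treat missing values as "event did not occur"
hit <- Startrains %in% TRUE

# First row of each (Year, hit) combination, restricted to rows with hit == TRUE
FirstStart <- hit & !duplicated(data.frame(Year = dat$Year, hit = hit))

dat[FirstStart, ]   # rows 2 and 6: the first qualifying day in each year
```

This avoids filtering first and then applying `!duplicated(Year)` to the filtered data, though it still assumes the rows are in date order.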
And then can we make this sort of process easier? I realise I am probably half re-inventing David and Danny's calculation system.
I have now found a small bug in the filter dialogue. I had a factor with some missing values in the column. I wanted to filter to just those rows where the column was NA. This wasn't explicitly dealt with, so I tried selecting none of the factor levels. (There may in the future be a better way to select those rows, but selecting no levels may still be something a user tries.) It then gives the condition as Month %in% without having anything that is in! I am not surprised it doesn't like this!
OK, that's just a bug: you shouldn't be able to add a condition when no levels are selected.
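For what it's worth, in plain R the rows with a missing factor value can be picked out with `is.na()`, and `%in%` also matches NA when NA is listed on the right-hand side. A small sketch on a made-up `Month` factor (this says nothing about how the dialogue builds its condition):

```r
# Toy factor with missing values (made-up data)
Month <- factor(c("Nov", NA, "Dec", "Nov", NA), levels = month.abb)

only_missing   <- is.na(Month)              # TRUE only for the NA rows
nov_or_missing <- Month %in% c("Nov", NA)   # NA on the right-hand side matches NA
nothing        <- Month %in% character(0)   # an empty selection matches no rows

which(only_missing)     # rows 2 and 5
which(nov_or_missing)   # rows 1, 2, 4 and 5
```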
NA now appears as a "level" in the filter dialog.
Great. Many thanks.
Two more issues with the filters. When I choose a data frame on the main dialogue it is sometimes set differently on the sub-dialogue. I would like it if the selector either didn't include the data frame, or had it fixed to the same as on the main dialogue.
I chose to filter on the year in a climatic file. The year was a variate. I started on the sub-dialogue and put the year into the receiver for the condition. That was when I realised that the type of column was wrong. So I abandoned that and changed the year to be a factor. When I returned to the sub-dialogue I found the year still there, with the condition as before, as though it was still numeric. I tried adding the new year. I also tried reset. Neither worked. What did work was to delete year from the condition and then reinstate it.
I suspect this issue may be fairly general, namely how could a receiver recognise when a column has been changed? To illustrate, I took Dodoma and duplicated the Year column, calling the copy Yr. Then I set up a filter, i.e. Yr == 1988. I then back-tracked and deleted the Yr column. When I returned to the dialogue, I found the condition was still there. I accepted the condition (with the deleted column) and managed to get the following error:
Error running R command(s)
Error in self$get_data_objects(data_name)$add_filter(filter, filter_name, : object 'value' not found
The error occurred in attempting to run the following R command(s):
InstatDataObject$add_filter(filter_name="Filter1", data_name="Dodoma", filter=list(C0=list(value==1988, operation="==", column="Yr")))
OK
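One way a stale condition like this could be caught before it is run is to check it against the current columns. The sketch below is purely illustrative: `check_condition()` and the `column_class` field are hypothetical names, not part of R-Instat, and the data frame is made up:

```r
# Hypothetical helper (not an R-Instat function): checks a stored condition
# against the current state of a data frame before the filter is applied.
check_condition <- function(data, condition) {
  if (!condition$column %in% names(data)) {
    return("column no longer exists")
  }
  if (!identical(class(data[[condition$column]]), condition$column_class)) {
    return("column type has changed")
  }
  "ok"
}

# Made-up stand-in for the Dodoma data, after Yr has been deleted
dodoma <- data.frame(Year = factor(c(1987, 1988, 1989)))

# Condition recorded while Yr still existed and was numeric
cond_yr   <- list(column = "Yr",   operation = "==", value = 1988,
                  column_class = "numeric")
# Condition recorded while Year was still a variate (numeric)
cond_year <- list(column = "Year", operation = "==", value = 1988,
                  column_class = "numeric")

check_condition(dodoma, cond_yr)     # "column no longer exists"
check_condition(dodoma, cond_year)   # "column type has changed"
```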
There is another potential issue with the filter dialogue. It can be accessed from within each other dialogue. Then I suggest (by default) the data frame on the main (filter) dialogue should be the same as the current one on the dialogue that is calling it.
Now set it so that the sub dialog always opens on the same data frame as shown on the main dialog that opened it. And the data frame is now always disabled on the sub dialog. Not sure if that could be frustrating in some cases, but it's definitely less confusing.
Another suggestion (a dangerous one!) plus an oddity (I think) that I like.
Could there be somewhere in the sub-dialogue, perhaps the preview, where we could edit, like the calculator?
The suggestion is general, but my example - which surprised me later - was something I thought should not have worked. I see it does now.
I had a data frame with a Year column - which was character. I noticed that you had allowed logical operators, so I had Year > '1953' as my condition. I was rather surprised that it worked. I suppose it is alphabetical on real characters and numeric order on characters that are numeric. Pretty clever!
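A small check in plain R suggests some caution here. As far as I understand it, comparing character values is done character by character (string order), not numerically, so it agrees with numeric order for four-digit years but not when the strings have different lengths. A sketch with made-up values:

```r
# Made-up character "years", including one shorter string
Year <- c("1950", "1953", "1954", "200", "1988")

as_text   <- Year > "1953"             # string comparison
as_number <- as.numeric(Year) > 1953   # numeric comparison

data.frame(Year, as_text, as_number)
# "200" counts as greater than "1953" as text (because "2" sorts after "1"),
# but 200 is not greater than 1953 as a number.
```

So the condition is safe when every value has the same number of digits, and converting with `as.numeric()` is the more robust option otherwise.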
And more. Would it be reasonably easy to have a keyboard when there is a numeric condition? It could have the usual number pad and then TRUE and FALSE and NA. Anything else?
Then I know eventually we want to be able to edit a filter. Would it (also?) be easy to include a filter in another one. So one filter might be for a particular station. Then it would be easy to have a second one which is for that station and also for a particular set of years, etc.
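In plain R terms, including one filter in another amounts to combining two logical vectors with `&`. A minimal sketch with made-up data (the second station name and the variable names are just for illustration):

```r
# Made-up data with two stations
dat <- data.frame(
  Station = c("Dodoma", "Dodoma", "Moshi", "Dodoma"),
  Year    = c(1985, 1990, 1990, 1995)
)

station_filter <- dat$Station == "Dodoma"     # first filter: one station
year_filter    <- dat$Year %in% 1988:1992     # extra condition: a set of years

combined <- station_filter & year_filter      # second filter built on the first
dat[combined, ]                               # only the Dodoma 1990 row remains
```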
And just to support the 0.4.7 hope for some enhancements to the system. I am using it a lot just now.
I have started to use the Climatic > Prepare > Display Daily dialogue with the Dodoma data. As Danny says this needs to be done on a subset of the data. So I need to use filters currently. There are no controls on the dialogue to choose a subset of the years.
My views are hardening. I would like to keep it that way. We improve the filtering system rather than complicating the dialogue. It is great that it works from the dialogue itself.
For now, one small improvement would help, namely when a new filter is started, it does not remember the old setting. It would help if it did. This might link with being able to edit an existing filter, as here I always want the same Year column as the base for my filter.
When I edit a filter my default here would be to give the new filter a new name.
I really like the fact I can keep a set of filters for a data frame, say for 12-year blocks. I give each one a sensible name. Then I can go back to them easily. This wouldn't take much longer than changing the controls in a dialogue. And it has many advantages.
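As an illustration of the 12-year blocks, a set of named filters could be generated in one go in plain R. This is only a sketch on a made-up `Year` vector; the block labels and the `filters` list are assumptions, not how R-Instat stores its filters:

```r
# Made-up Year column covering 36 years
Year <- 1935:1970

# Label each year with the 12-year block it falls in, e.g. "1935-1946"
block_start <- min(Year) + 12 * ((Year - min(Year)) %/% 12)
block <- paste(block_start, block_start + 11, sep = "-")

# One named logical filter per block
filters <- lapply(split(Year, block), function(yrs) Year %in% yrs)

names(filters)                  # "1935-1946" "1947-1958" "1959-1970"
Year[filters[["1947-1958"]]]    # the years selected by the second block
```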
I would have to spend a bit of time setting up the filters for the first time. But (in the longer term) we might even be able to copy filters from another data frame - if they make sense. And these could easily make sense, because we would know which was the year column in each case, because they are climatic data!