Open rdstern opened 4 months ago
@rdstern note data_book$get_columns_from_data(data_name="liouville", col_names="factors")
is not assigned to anything yet it returns a data frame when executed. The returned data frame is "captured" and sent to the output window. To avoid the capture, assign to something.
In this case, R-Instat is correctly doing what it is supposed to do, capture the returned data frame and display it in the output window.
@Patowhiz many thanks for the explanation. My concern is that the same code runs so differently from the dialog and the script window. @lloyddewit maybe this ball is now in your court? The difference is shown quite neatly here:
No it isn't! I have to capture the screen instead of copying:
a) When run as a dialog there are effectively 5 statements - as measured by the check-boxes. b) But when run as a script there are essentially 9 statements.
Now usually running the extra statements makes no difference. In the above the line
attach(what=liouville)
is an example where it doesn't matter. But running
factors <- conf.design::factorize(x1 )
data_book$add_columns_to_data(data_name="liouville", col_name="factors", col_data=factors, before=FALSE)
data_book$get_columns_from_data(data_name="liouville", col_names="factors")
as 3 separate statements matters a lot! In this example it slows R-Instat to a crawl, and later in the script it prints lots to the output window.
@Patowhiz as well as maybe changing the way the script is run I would like to understand whether there is an option to simplify the R-Instat data-book statements?
As mentioned above, I can (alternatively and a bit painfully) run the same command as above from the Prepare > Column: Numerice > Transform
dialog:
This has sort of the same problem, but not so noticable, because there is just one R-Instat databook command, which doesn't give much output. I am not quite clear why 4 commands are needed by the calculator and only 1 by the Transform?
@Patowhiz would it do any harm if the change was for R-Instat to save the results of each of the extra commands? Then there would be no extra output from running the script, even though they were treated as separate statements in the scripts, and (seemingly) not in the dialog?
@rdstern dialogs send their entire scripts directly to the R. I think when you run all in the script window, you should get the same effect. When you run line by line, the script is output in the output window after the execution. That's why you are getting multiple script outputs after every execution.
In regards to changing the script output implementation. That would require broader team discussions. Note that the output window is meant to use the same logging system that the script log window uses. The 2 are meant to be tied together. So for every output there is an underlying script log.
In regards to dialogs producing different scripts, that depends on the dialog implementation. Where possible, it's important we enforce consistency during our PR reviews. Whenever I test as a user, I tend to check on the script produced. You could do the same as well.
@Patowhiz you said: "dialogs send their entire scripts directly to the R. I think when you run all in the script window, you should get the same effect. When you run line by line, the script is output in the output window after the execution. That's why you are getting multiple script outputs after every execution."
a) I was using Run, rather than Run All. But I mark the same block of code that corresponds to a dialog. So, I wish it was executing in the same way in the 2 cases.
b) Could you check on the code generated when doing a calculation? Here - after various iterations, I have a version that now works as fast from the script, and from the dialog.
# Dialog: New Data Frame
liouville <- data.frame(x1=as.numeric(rep(seq(1,100000), each=1, length.out=100000)))
data_book$import_data(data_tables=list(liouville=liouville))
rm(liouville)
# Dialog: Calculations
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
attach(what=liouville)
factors <- conf.design::factorize(x1 )
a <- data_book$add_columns_to_data(data_name="liouville", col_name="factors", col_data=factors, before=FALSE)
b <- data_book$get_columns_from_data(data_name="liouville", col_names="factors")
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
detach(name=liouville, unload=TRUE)
data_book$append_to_variables_metadata(data_name="liouville", col_names="factors", property="labels", new_val="")
rm(list=c("factors", "liouville"))
So adding the variable - then not used - to save the output, both helped with the output window and greatly with the speed. (3 minutes, down to 14 seconds.)
c) Then I got bolder and found I could comment out 2 of the lines with no change in the result. (Of course maybe there would be a change with other calculations? So this works fine for this example:
# Dialog: New Data Frame
liouville <- data.frame(x1=as.numeric(rep(seq(1,100000), each=1, length.out=100000)))
data_book$import_data(data_tables=list(liouville=liouville))
rm(liouville)
# Dialog: Calculations
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
attach(what=liouville)
factors <- conf.design::factorize(x1 )
a <- data_book$add_columns_to_data(data_name="liouville", col_name="factors", col_data=factors, before=FALSE)
#b <- data_book$get_columns_from_data(data_name="liouville", col_names="factors")
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
detach(name=liouville, unload=TRUE)
#data_book$append_to_variables_metadata(data_name="liouville", col_names="factors", property="labels", new_val="")
rm(list=c("factors", "liouville"))
So now I have edited the whole script in this way, so saving one of the results in a, then commenting out 2 statements, including the one that was causing the problem. It seems to work fine:
# Dialog: New Data Frame
liouville <- data.frame(x1=as.numeric(rep(seq(1,100000), each=1, length.out=100000)))
data_book$import_data(data_tables=list(liouville=liouville))
rm(liouville)
# Dialog: Calculations
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
attach(what=liouville)
factors <- conf.design::factorize(x1 )
a <- data_book$add_columns_to_data(data_name="liouville", col_name="factors", col_data=factors, before=FALSE)
#data_book$get_columns_from_data(data_name="liouville", col_names="factors")
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
detach(name=liouville, unload=TRUE)
#data_book$append_to_variables_metadata(data_name="liouville", col_names="factors", property="labels", new_val="")
rm(list=c("factors", "liouville"))
# Dialog: Calculations
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
attach(what=liouville)
length <- sapply( factors ,length)
a <- data_book$add_columns_to_data(data_name="liouville", col_name="length", col_data=length, before=FALSE)
#data_book$get_columns_from_data(data_name="liouville", col_names="length")
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
detach(name=liouville, unload=TRUE)
#data_book$append_to_variables_metadata(data_name="liouville", col_names="length", property="labels", new_val="")
rm(list=c("length", "liouville"))
# Dialog: Calculations
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
attach(what=liouville)
even <- ifelse(length %% 2, - 1,1)
a <- data_book$add_columns_to_data(data_name="liouville", col_name="even", col_data=even, before=FALSE)
# data_book$get_columns_from_data(data_name="liouville", col_names="even")
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
detach(name=liouville, unload=TRUE)
#data_book$append_to_variables_metadata(data_name="liouville", col_names="even", property="labels", new_val="")
rm(list=c("even", "liouville"))
# Dialog: Calculations
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
attach(what=liouville)
liouville <- cumsum(even)
a <- data_book$add_columns_to_data(data_name="liouville", col_name="liouville", col_data=liouville, before=FALSE)
#data_book$get_columns_from_data(data_name="liouville", col_names="liouville")
liouville <- data_book$get_data_frame(data_name="liouville", use_current_filter=FALSE)
detach(name=liouville, unload=TRUE)
#data_book$append_to_variables_metadata(data_name="liouville", col_names="liouville", property="labels", new_val="")
rm(liouville)
@rdstern if a dialog is producing unnecessary scripts, then that is a dialog issue. It means it is not well implemented and that should be fixed. You script example above shows that some script commands produced by the dialog are not used at all. Which means the dialog is not implemented well. The increase in performance is because the unnecessary scripts are no longer executed.
I'll look into running the scripts by block. I think it should behave as running from the dialog.
@Patowhiz I have changed the title, because I have found another (small?) problem with the commands generated by the calculator. I explain it, though we may decide not to solve it. I suggest it is likely to be solved automatically by the changes from the discussion above.
You can use it as a simple (number) calculator without even having a data set opened. Just enter 7 * 5 and use the Try button. But what about putting the result in the output window? Well it works, but then issues some extra commands that give an error. So, maybe when we think of the commands we give round the calculator, we could eliminate this error?
I am now wondering if this feature is worth taking further. Could we discuss here how R-Instat works with - and without a data book? So, I tried using the mean function, and also sort - without data. And they work - sort of. Now the Try button sometimes gives an error, (but perhaps not), and it gives the results in the output window, see here:
Now I have taken that a step further, by turning off the echoing of the commands and also copying the commands manually into the Comment window:
Encouraging, and needs you @Patowhiz to check we are not creating problems, but not to make the initial changes. I need further thoughts, but like what might be possible here!
@rdstern yes the dialog needs to be looked at. When more features are added to an existing dialog, there is always a regression risk. This is due to the complexity of the components interactions. It's always good to test previously existing features before we merge.
@Patowhiz I have now stumbled over a really serious bug with the commands we produce with the calculator - at least - and perhaps other dialogs too.
It is simple to show with the data I used - by chance - namely from the library with the package hydroGOF.
The "problem" is that one of the variables has the same name as the data frame! When using the calculator the first line(s) of the R commands make all variables in that data frame available. But the calculator command now uses the data frame, rather than the variable! in its calculation. So with simple calculations it applies them to each variable. Therefore it generates multiple columns. Wrong and very confusing.
Knowing the problem I can easily generate it with other data frames. For example with the survey data, just rename the yield to survey (the name of the sheet!) and that gives crazy results (or an error, if the calculation cannot be done on some variables.)
@rdstern when the same variable name is assigned to different items, the last assigned item overwrites the previous values of the variable. This is what is happening behind the scenes when using the calculator dialog.
To address this, we could implement a warning for when a column name is similar to the data sheet name. This warning could be triggered during the data import process.
@N-thony an initial change to the R code generated with the calculator, is that it can be used, when there is no data book open.
But it gives 2 errors for the 2 extra commands attach
and detach
, that are for the data frames. Could they only be produced when there is a data frame. So otherwise it just sends the actual R command - both for Ok and To Script?
I hope this is easy?
On the dialog the Save Result (soon to become Store Result,) should also be disabled.
@Patowhiz I think you may be able to explain the following quickly, as I assume the code is yours. Then could either you, or Antoine make the corrections?
@N-thony there are some problems running scripts, when the script is generated from the Prepare > Column Calculator. They work, but can be very slow. They also generate (extensive results in the output window that are not generated when using the calculator from the dialog.
Here is the script file:
liouville.zip
It is to be used in a practice and video. I give the first draft here, in case it is of interest to see the context But I give a shorter code below that can be used directly.
Are you a natural mathematician.docx
Here is the start of the script:
I notice the calculator produces 4 R-Instat statements after the calculation. Does it always need to produce so many? There is a prolem also with the
data_book$get_columns_from_data
statement. When run from a script it produces results in the output window? This sometimes slows R-Instat dramatically, and sometimes produces an output per row of data - and we have 100,000 rows!I wonder if this is maybe left over from testing the "system"?
I note also that I can get the same calculations done in the Transform dialog (by a small trick). This gives much simpler code, whyuxg seems to work fine?
I wonder why the calculator needs so much more???