IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0
38 stars 102 forks source link

Example code that fails to run in R-Instat? #9013

Open rdstern opened 3 weeks ago

rdstern commented 3 weeks ago

@lloyddewit I was here to find an example of nanoplots.

Here is the code from there for an example:

library(gt)
library(dplyr)
library(tidyr)
md <- gt::md
illness |>
  dplyr::slice_head(n = 10) |>
  gt(rowname_col = "test") |>
  tab_header("Partial summary of daily tests performed on YF patient") |>
  tab_stubhead(label = md("**Test**")) |>
  cols_hide(columns = c(starts_with("norm"), units)) |>
  cols_nanoplot(
    columns = starts_with("day"),
    new_col_name = "nanoplots",
    new_col_label = md("*Progression*")
  ) |>
  cols_align(align = "center", columns = nanoplots) |>
  tab_footnote(
    footnote = "Measurements from Day 3 through to Day 8.",
    locations = cells_column_labels(columns = nanoplots)
  )

It now runs nicely and gives me one of 2 questions for @Patowhiz

image

The results are in the browser. I think (if we add a couple of lines - that Patrick has made easy) then they could be in the output window instead. I'm very happy to add those lines to the file?

The second example is different in RStudio and R-Instat:

image

In R-Instat here is the code:

# Create a gt table based on a preprocessed `sza`

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md
sza |>
  filter(latitude == 20) |>
  select(-latitude) |>
  filter(!is.na(sza)) |>
  pivot_wider(names_from = "tst", values_from = sza) |> 
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  tab_stubhead(label = md("month<br>(20&deg;N)")) |>
  tab_header(title = md("&#x2600; Solar Zenith Angles &#x2600;")) |>
  tab_options(
    column_labels.font.size = "smaller",
    table.font.size = "smaller",
    data_row.padding = px(3)
  )

In RStudio I first ran without the library lines. It complains that it doesn't know about pivot_wider, which R-Instat doesn't. So perhaps tidyr is automatically loaded in R-Instat, or it isn't running those lines the same way. In R-Instat it doesn't complain, but also doesn't give any results. It appears to run the code, but nothing happens.

@Patowhiz it runs without complaining, but there are no results. I wonder where they are. Could we add lines to put them into the output window?

@lloyddewit I then tried doing something wrong. I added x at the end of the pivot wider line above. I wanted to put the data so far into x, to see what it looked like! (Ok, so I am feeble in R!)

In RStudio it tells me this is an error. In R-Instat it crashes out! This seems consistent and I also tried with a previous version 0.7.18. Whenever there is an error in the code (or at least this sort of error), it crahes out mercilessly in R-Instat. In RStudio I remain simply able to correct the error. Is this a particular error, to do with piping, that it doesn't trap?

lloyddewit commented 3 weeks ago

@rdstern Thank you for reporting. The error is reported because the R-Instat R environment has its own md function. This function requires a y parameter:

image

The same problem was reported and discussed in issue #8942. Therefore I'll take the liberty of closing this issue. If you disagree, then please feel free to reopen. Thanks

rdstern commented 3 weeks ago

@lloyddewit and @Patowhiz I have edited the text above and re-opened it. Very sorry for not remembering the md problem, which is now corrected in the code above. But I also don't like the way I keep crashing out in R-Instat - when I add an error! And @Patowhiz, I still can't get the second example to display anywhere! I wonder where it is?

Patowhiz commented 3 weeks ago

@rdstern the results of the last script statement is a gt_tbl which R-Instat will internally try to view it as a text file using utils::capture.output() R function . R-Instat always does this by default when it doesn't know the output type of your last statement, it assumes the output is of type text. We currently have no implementation that can detect the output type like the way RStudio does.

@rdstern when exploring a solution that you could use to run the script, I found the some issues. My aim was to assign a gt object, so that I could use the data book custom functions for displaying outputs in output window.

Issue 1 - Last statement produces error unexpected symbol.

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md
gt <- sza |>
  filter(latitude == 20) |>
  select(-latitude) |>
  filter(!is.na(sza)) |>
  pivot_wider(names_from = "tst", values_from = sza) |> 
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  tab_stubhead(label = md("month<br>(20&deg;N)")) |>
  tab_header(title = md("&#x2600; Solar Zenith Angles &#x2600;")) |>
  tab_options(
    column_labels.font.size = "smaller",
    table.font.size = "smaller",
    data_row.padding = px(3)
  )

Issue 2 - The class of the d variable is a "function" yet in RStudio it's a "gt_tbl" "list".

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md
d <- sza |> filter(latitude == 20) |>  select(-latitude) |>  filter(!is.na(sza)) |> pivot_wider(names_from = "tst", values_from = sza) |>  gt(rowname_col = "month") |> sub_missing(missing_text = "") |> tab_stubhead(label = md("month<br>(20&deg;N)")) |> tab_header(title = md("&#x2600; Solar Zenith Angles &#x2600;")) |> tab_options( column_labels.font.size = "smaller",  table.font.size = "smaller", data_row.padding = px(3)  )
class(d)

I found the 2 issues to be very weird considering I traced the execution up to the point where the script gets send to R using the R.Net library. It had nothing to do with @lloyddewit R script library.

@rdstern could you confirm whether you will get the same issues in your machine?

rdstern commented 3 weeks ago

@Patowhiz I include @lloyddewit because there may be more than one issue here.

I copied your code (Issue 1) and, like you, I get an error. (And that's caused by your assigning to gt <- at the start. I find it odd that I don't get the error when I put all the commands on one line. Like you, it says the result is a function, which is curious.

I changed your addition slightly and don't now get an error!

# Create a gt table based on a preprocessed `sza`

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md
gt <- sza |>
  filter(latitude == 20) |>
  select(-latitude) |>
  filter(!is.na(sza)) |>
  pivot_wider(names_from = "tst", values_from = sza) 
  gt
  gt |> gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  tab_stubhead(label = md("month<br>(20&deg;N)")) |>
  tab_header(title = md("&#x2600; Solar Zenith Angles &#x2600;")) |>
  tab_options(
    column_labels.font.size = "smaller",
    table.font.size = "smaller",
    data_row.padding = px(3)
  )
  gt
  class(gt)

I got rid of a pipe on line 9 and added gt in the middle. That's useful (for me at least), because the first part of the code is getting ready for gt, and then it produces the table. I now get the same class for gt in R-Instat as in RStudio.

So my 2 puzzles are why the result is different when there are lots of pipes on different lines and when all is on one line - or at least fewer lines? I wonder if that is a Stephen question?

And, running in RStudio I get the result on the right-hand-side - and the data at the bottom. We only have the one output window, of course. In RStudio that result comes automatically. Nothing comes in R-Instat.

Here is the result of my slightly changed version of your code in RStudio: (With the same results, plus the required table.)

image

I also have a second issue from the nfollowing code:

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md
sza |>
  filter(latitude == 20) |>
  select(-latitude) |>
  filter(!is.na(sza)) |>
  pivot_wider(names_from = "tst", values_from = sza) |> gt

I assume this is a @lloyddewit question? In RStudio it gives an error and tells me the rhs of piping must be a function. In R-Instat it simply throws me out! I don't like being thrown out like that!

Patowhiz commented 3 weeks ago

Thanks for attempting the modification. I have added inline comments to your modified code to explain the results that you have. Kindly go through the comments and confirm that they indeed explain what you found.

# Create a gt table based on a preprocessed `sza`

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md

# Does not produce a gt object. It produces a "tbl_df" "tbl" "data.frame" 
gt <- sza |>
  filter(latitude == 20) |>
  select(-latitude) |>
  filter(!is.na(sza)) |>
  pivot_wider(names_from = "tst", values_from = sza) 

# The 'gt' object is of class "tbl_df" "tbl" "data.frame" 
gt 

# The lines below do not assign anything to the 'gt' object. So its class is retained.
# The lines simply pipe the data frame into the gt package functions.
gt |> gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  tab_stubhead(label = md("month<br>(20&deg;N)")) |>
  tab_header(title = md("&#x2600; Solar Zenith Angles &#x2600;")) |>
  tab_options(
    column_labels.font.size = "smaller",
    table.font.size = "smaller",
    data_row.padding = px(3)
  )

# Because there was no assignment, same class "tbl_df" "tbl" "data.frame" are retained.
gt
class(gt)

Note my assignment in my original code should not be the cause of error. The assigned object is indeed a gt object.

Patowhiz commented 3 weeks ago

@rdstern, regarding your second issue with the script that causes R-Instat to crash:

library(gt)
library(tidyr)
library(dplyr)
md <- gt::md
sza |>
  filter(latitude == 20) |>
  select(-latitude) |>
  filter(!is.na(sza)) |>
  pivot_wider(names_from = "tst", values_from = sza) |> gt

The problem isn't related to @lloyddewit's R script library. The issue stems from R.Net. Despite our existing catch statement, the error isn't being handled properly, leading to a crash. This crash occurs on line 1307 of clsRLink (as of 11/06/2024) when clsEngine.Evaluate(strScript) is executed. I suspect it's a System.StackOverflowException at the R.Net level. Due to the recursive nature of this error, it can't be handled, and the application gets terminated by the Common Language Runtime (CLR).

I found an important note from this article that explains the R error:

On the RHS of |>, you need to include the function as a function call, which means appending () at the end of the function name, rather than just its name. For example, the square root function is called by writing sqrt(). If you try to run mtcars |> sqrt without the () at the end, you will get an error: Error: The pipe operator requires a function call as RHS.

I couldn't find a quick fix for such a severe bug, but there are potentially two ways to address the issue:

  1. Update R.Net and check if they have internally prevented this type of error from ever happening.
  2. Prevent such scripts from being executed in R-Instat (i.e., not pass them to R.Net).

@N-thony, could you check if option 1 is possible? I understand that currently, we can't update to the latest versions of our dependencies because of winforms.

The only other solution I found is to specify that CLR unloads the application domain when this exception occurs. This might be a long shot, and I’m unsure how to implement it effectively for this specific error.

@ChrisMarsh82 may have better solutions in mind.

rdstern commented 3 weeks ago

@Patowhiz many thanks for your investigation.
a) On the not getting the table, and on changing the location of the browser table, I am still not clear on the R-Instat solution. b) I am impressed with your detective work on the crashing out. I tried making some more (different) mistakes in the script and it gives me sensible message each time. I thought we might have to accept that tweaking a script with piping was generally dangerous. If I understand you correctly it seems much less than that. I happened to stumble over the one situation that does cause a problem. I don't like surprises, but I don't mind R-Instat being released with this sort of problem, which we can flag in the help. (It is a bit like the crash in the R graph window, when you used the wheel). Scripts with piping run fine. The number of people tweaking scripts will be small. The number tweaking scripts with piping will be smaller. If I undersyand correctly then either save everything before tweaking a script with piping. Or maybe try the tweak in RStudio first, because this is a weakness in R-Instat. I can live with that. c) But I think we can do better. Looking at the article of course the|> is a more modern version of %>%! And replacing it in the code solves my problem and is logical! So @N-thony I have another suggestion. I assume it may be easy to add a Find-replace to the editor. Maybe (in addition later?) we also have a special entry in the Insert to make that particular change. (That's the advantage of a dialog and makes it more obvious to include in the help!)

rdstern commented 2 weeks ago

@Patowhiz and @lloyddewit I am still keeping this open for now, as one point remains: a) The mdre-introduction (command from wrong package being run) was my error, as it was resolved - and less than 2 months ago. But I am reminded that writing up out tables stuff in the help, will introduce these scripts and can discuss this issue.
b) The crashing, was correctly diagnosed by Patrick, and has led to a possible work-around, namely replacing |> by the original %>%. This would be helped by the addition of find/replace into the script window. This is one of a number of enhancements to our script system that Stephen has suggested could be undertaken by @N-thony. I hope it might be very easy, otherwise it would come after the big release. c) What still remains, with Patrick, is the presentation of the resulting gt tablesin the output window or the maximised window. I am sure he will have more of that, as he completes the gt sub-dialog and overseas the other dialogs that use his gt work. So, not to be forgotten, but not urgent enough to distract from the main gt work.