fslaborg / RProvider

Access R packages from F#
http://fslab.org/RProvider/
Other
235 stars 69 forks source link

CI with github actions #216

Closed nhirschey closed 2 years ago

nhirschey commented 2 years ago

I'm really looking forward to using this library.

image

nhirschey commented 2 years ago

I figured out how to add back ubuntu and windows and I did fail-fast:false so that you could see macOS is completing while ubuntu and windows do not.

The test failures on ubuntu are the same that I see on my windows machine when I try to run the tests.

Example fun: https://github.com/nhirschey/RProvider/runs/3616975071?check_suite_focus=true

dsyme commented 2 years ago

Great to see this going through on Mac!

dsyme commented 2 years ago

@nhirschey Would you like to be co-maintainer with @AndrewIOM ?

nhirschey commented 2 years ago

@dsyme, if @AndrewIOM would like the help I am happy to co-maintain with him. I am very appreciative of his hard work getting this library up to date. I use both F# and R most days for my research, so I benefit a lot from getting this library working again and stable.

AndrewIOM commented 2 years ago

Thanks @nhirschey , would be great to co-maintain with you. It will be very beneficial to have additional support from someone experienced on both the R and F# sides!

AndrewIOM commented 2 years ago

Thanks for your fixes! We have left one outstanding issue: building the test solution on linux and windows seems to lock up.

I've just tested out building, testing, and the full 'fake run -t All' in an Ubuntu dev container with .net 5.0 and R 3.6.3 (installed at /usr/lib/R) on Ubuntu 20.04.3 LTS. All runs fine, all tests pass, makes nugets and docs fine. Need to investigate further if this is an R version issue or Github Actions issue.

nhirschey commented 2 years ago

I am on windows, and the CI behavior is the same as when I run locally--I cannot use the provider or run the tests.

When I try to load RProvider.dll it locks up fsharp interactive, so it seems to be the issue. Here's a screen-shot of the intellisense error ... I tried a bit of digging but haven't been able to track down the explanation for these error messages.

image

It actually looks like this issue: https://github.com/dotnet/fsharp/issues/3303

dsyme commented 2 years ago

Can DesignTime and Runtme be netstandard2.0? That would be simplest if it works

https://github.com/fslaborg/RProvider/blob/master/src/RProvider.DesignTime/RProvider.DesignTime.fsproj#L5

dsyme commented 2 years ago

On Windows I installed R locally, built and got a hang when building the Tests. Attached a debugger and it's stuck at this point. Note that so far I've not explicitly installed any R packages on my machine

image

dsyme commented 2 years ago

@nhirschey https://github.com/fslaborg/RProvider/pull/222 might help with the issues in the IDE

dsyme commented 2 years ago

I merged #222

@nhirschey I enabled logging and got this log https://gist.github.com/dsyme/5b31f597a035fe66b371e0ca9a30ecd9

Looks like R is closing the pipe and hence the hang

Looks like you might first need to do better pipe-closure detection, recovery and error reporting so this is visible and not a hang. Making the log much more visible e.g. in error reports sounds good.

[17/09/2021 15:38:50] [Pid:13664, Tid:1, Apid:1] Communicating with R to get packages
[17/09/2021 15:38:50] [Pid:13664, Tid:1, Apid:1] eval(1+4)
[17/09/2021 15:38:50] [Pid:13664, Tid:1, Apid:1] engine: Creating and initializing instance (sizeof<IntPtr>=8)
[17/09/2021 15:38:52] [Pid:23356, Tid:4, Apid:1] [Client Pipe log]: Pipe has closed.
nhirschey commented 2 years ago

Thank you for the tips @dsyme. I enabled logging and get the same log output as you. Tracing the code, I think this is an upstream problem caused by an interaction between newer versions of R and a Windows security feature "Control Flow Guard" as mentioned in this RDotNet issue https://github.com/rdotnet/rdotnet/issues/139#issuecomment-903048836. That would also explain why the failure is only happening on windows.

It looks like we'll have to wait for that fix to resolve this windows build problem. But I'll also keep poking at it to see if we can get it to build either on on R <= 4.0.2 or with the workaround noted in that RDotNet issue.

More details: The s.GetPackages() code seems to be failing after the first call to eval("1+4") because the log doesn't print the 5 after eval("1+4").Value:

    let getPackages() : string[] =
        Logging.logf "Communicating with R to get packages"
        Logging.logf "Test: %O" (eval("1+4"))
        Logging.logf "Test: %O" (eval("1+4").Value)
        eval(".packages(all.available=T)").GetValue()

The above referenced RDotNet issue says calling Evaluate on x64 for R >= 4.0.3 causes crashes. I installed R 4.0.2 and set that as R_HOME to make RProvider use it and the logs a) confirm use of 4.02 and b) get farther. Here the System.Double output should be the results of eval("1+4").

[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] eval(1+4)
[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] Output: 
[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] Converting value from R...
[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] [DEBUG] MEF Container
[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] [DEBUG] MEF Container 2
[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] [DEBUG] MEF Container 3
[9/17/2021 11:09:02 AM] [Pid:1484, Tid:1, Apid:1] Test: System.Double[]

The build tests still hang with the log stopping after trying to do something with Test.RProvider\data/sample.rdata. But the fact that we get farther with 4.0.2 than 4.0.3 is consistent with the problem being caused by the R/RDotNet/Control Guard issue.

nhirschey commented 2 years ago

@AndrewIOM,

  1. I tried with R 3.6.2 and the result for me building locally is the same as what I describe below with R-4.0.2.

  2. I used CI to generate gists of the logs for mac and windows. Do these differences hint at something that could cause the windows tests to fail?

One thing from your big pull request caught my eye:

Remove Remoting between server and client processes, and replace with an implementation based on System.IO.Pipes.

I think RProvider.Server.exe is still getting launched on windows (see "[Server Pipe log]" messages). Is this expected in the new pipe architecture?

Note: windows was local with R-4.0.2, mac was CI R-4.1.1.

As an example of differences, compare windows starting line 15:

[9/18/2021 9:22:31 PM] [Pid:12656, Tid:4, Apid:1] [Client Pipe log]: Connecting to named pipe
[9/18/2021 9:22:31 PM] [Pid:30372, Tid:1, Apid:1] Server started, running event loop
[9/18/2021 9:22:31 PM] [Pid:30372, Tid:1, Apid:1] server event loop: starting
[9/18/2021 9:22:31 PM] [Pid:30372, Tid:6, Apid:1] [Server Pipe log]: Connected to client.

To mac line 12

[9/18/2021 11:26:04 PM] [Pid:7701, Tid:6, Apid:1] [Client Pipe log]: Connecting to named pipe
[9/18/2021 11:26:04 PM] [Pid:7701, Tid:4, Apid:1] [Client Pipe log]: Connected.
[9/18/2021 11:26:04 PM] [Pid:7701, Tid:6, Apid:1] Got some server
[9/18/2021 11:26:04 PM] [Pid:7701, Tid:6, Apid:1] Sending command: get init error message...
AndrewIOM commented 2 years ago

Thanks @nhirschey. I'm back from travelling now so have access to a Windows 10 VM and can finally start looking into the windows issues in detail today.

Just got started and can reproduce the lock-up locally on windows:

[19/09/2021 11:11:13] [Pid:7032, Tid:1, Apid:1] Adding member pi [19/09/2021 11:11:13] [Pid:7032, Tid:1, Apid:1] Adding member volcano [19/09/2021 11:11:13] [Pid:7032, Tid:1, Apid:1] Adding member volcanoList [19/09/2021 11:11:13] [Pid:7032, Tid:1, Apid:1] Adding member volcanoMean [19/09/2021 11:11:13] [Pid:7032, Tid:1, Apid:1] Finished generating types for C:\Users\acm\Documents\GitHub\RProvider\tests\Test.RProvider\data/sample.rdata [19/09/2021 11:11:15] [Pid:6036, Tid:9, Apid:1] [Server Pipe log]: Pipe has closed.

Will investigate this (look at error handling / pipe closure etc.), and also the logging issue.

I think the client and server are both running, but that a recent regression has stopped logs from spawned server processes being added to the text file on macOS for some reason (could be related to passing of environment variables?). These logs were there a few commits ago, and now on my local mac copy of this PR I only see logs from the client. Will investigate!

Edit: so it looks like the issue only occurs on windows during fake test. I have been able to run all tests using dotnet test, and also use R from dotnet fsi. In image, left is Ubuntu with fake passing; right is windows with dotnet test and dotnet fsi working (R v4.0.2, dotnet latest v5 SDK, Windows 10 1704). It may be that it builds and runs fine when the build is triggered for a second time. Screenshot 2021-09-19 at 11 55 39 .

AndrewIOM commented 2 years ago

I think I have identified the problem - the Stop command is not being sent on windows, as it never detects that the process with the given PID has ended (this is when the process is no longer present in task manager). The problematic block of code is:

/// Process.WaitForExit does not seem to be working reliably
/// on Mono, so instead we loop asynchronously until the process is gone
let rec asyncWaitForExit pid = async {
  let parentProcess = try Process.GetProcessById(pid) |> Some with _ -> None
  match parentProcess with
  | Some _ ->
    do! Async.Sleep(1000)
    return! asyncWaitForExit pid
  | None -> () }

Will work on a fix. Edit: Fix applied in below commit stops windows from locking up locally, and allows dotnet test to run correctly. However, this has not fixed the problems with CI

AndrewIOM commented 2 years ago

Windows and macOS are now building and testing OK!

Only linux to go - this is proving tricky, as it works locally for me. Going to try R 4.0.2 locally to see if that makes a difference.

dsyme commented 2 years ago

Great progress!

nhirschey commented 2 years ago

Fantastic! I was able to build and pass all tests on windows 10 using dotnet fake build -t All. Additionally, today is the first time that I have seen intellisense in Visual Studio 2019 working without errors on the R types/namespaces (I can browse packages, installed datasets, etc. using R. or open RProvider.).

@AndrewIOM maybe save this for a separate issue, but when I plot something the plot window is locked up by the REngine. I cannot resize the window and if I try to close the plot window it crashes the F# process (see gif).

rprovider-plot

AndrewIOM commented 2 years ago

I have been debugging the reasons for the Ubuntu fail on CI but not on a local machine. I have identified that it is actually REngine.GetInstance() in RInit.fs that is hanging in the Ubuntu CI, so this could be an RDotNet issue. I tried moving the engine initialisation out to a subsequent call to engine.Initialize() - this then is the step that locks up. No exceptions seem to be thrown. Not 100% sure how to proceed with this, as it works for me on the same Ubuntu version and same R version vscode dev-env. Downgrading the CI to R 3.6.3 doesn't help.

Really exciting to see it all working on Windows. Thanks for testing! For the years I have used RProvider (on mac), the graphics have beachballed and been locked up when using quartz, but not when using X11. I will open an issue - definately worth a fix for both windows and mac.

I propose that we remove Ubuntu for now and merge in this PR to get Windows working on CI. We could then release an RProvider v2.0.0-beta to allow wider testing of the .net 5 version. We could start a new PR to resolve the outstanding Ubuntu CI issues. What do you think?

Screenshot 2021-09-20 at 22 42 53

!

dsyme commented 2 years ago

I propose that we remove Ubuntu for now and merge in this PR to get Windows working on CI.

Sounds good!