berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License

RStudio Initialization Error - Discussion and Workaround #1899

Closed cdbeon closed 3 years ago

cdbeon commented 4 years ago

Hello all,

I've had a few students come up to me with the following error when trying to log into their RStudios:

RStudio Initialization Error: Error occurred during transmission.

The first report that I got of this issue came early today at around 12 AM.

Restarting their servers (going to the admin hub, clicking stop server, then start server) didn't seem to work either.

Any suggestions as to how to resolve this issue?

Thank you!

felder commented 4 years ago

This same issue was also reported yesterday on r.hub on behalf of another student by @d-alex-hughes. @lytello also reported this.

It was a one-off at the time, but I'm worried it may not stay that way. If this becomes widespread, we'll need steps to reproduce it so we can track down and fix the underlying cause.

Workaround until a fix can be identified and implemented

This can be fixed by renaming or removing ~/.rstudio via the terminal.

To do so while bypassing the typical rstudio session startup:

  1. Go to https://r.datahub.berkeley.edu/user-redirect/tree
  2. Click New->Terminal
  3. In the terminal, type: mv .rstudio .rstudio.$(date +%s) and press return
  4. Try to launch rstudio as you normally would and it should now work.
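
The rename in step 3 can also be wrapped in a small function for anyone who needs to repeat it. This is only a sketch: the function name `reset_rstudio_state` and its optional home-directory argument are made up for illustration, and it assumes, as the steps above do, that the state lives in `.rstudio` directly under the home directory.

```shell
#!/bin/bash
# Move a user's RStudio state directory aside, keeping it as a timestamped copy.
# Usage: reset_rstudio_state [home_dir]   (home_dir defaults to $HOME)
# The function name and argument are illustrative, not part of any hub tooling.
reset_rstudio_state() {
    local home_dir="${1:-$HOME}"
    local state_dir="$home_dir/.rstudio"

    # Nothing to move if the state directory is absent.
    if [ ! -d "$state_dir" ]; then
        echo "nothing to do: $state_dir does not exist"
        return 0
    fi

    # Same naming scheme as step 3: .rstudio.<seconds since epoch>
    local backup
    backup="$state_dir.$(date +%s)"
    if [ -e "$backup" ]; then
        echo "refusing to overwrite existing $backup" >&2
        return 1
    fi

    mv -- "$state_dir" "$backup" && echo "moved $state_dir to $backup"
}
```

Calling `reset_rstudio_state` with no argument from the terminal opened in step 2 does the same thing as the `mv` command. Renaming rather than deleting keeps the old state around in case someone wants to inspect it later.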

fdrennan commented 4 years ago

EDIT:

This fixed it for me: https://support.rstudio.com/hc/en-us/articles/218730228-Resetting-a-user-s-state-on-RStudio-Server

HOWEVER, the problem comes back almost immediately and the Preview version of RStudio is unusable for me.

Given that I run this frequently, I have some commands I've used to do this quickly without resetting my settings each time.

#!/bin/bash

# Backup step, left commented out (presumably run once from a working state).
# cp -r ~/.config/rstudio/ ~/.config/rstudiobak/ || echo failed
# cp -r ~/.local/share/rstudio/ ~/.local/share/rstudiobak/ || echo failed
# cp -r ~/.rstudio/ ~/.rstudiobak/ || echo failed

# Remove the current (possibly corrupted) state.
rm -rf ~/.config/rstudio/
rm -rf ~/.local/share/rstudio/
rm -rf ~/.rstudio/

# Restore the saved settings from the backups.
cp -r ~/.config/rstudiobak/ ~/.config/rstudio/
cp -r ~/.local/share/rstudiobak/ ~/.local/share/rstudio/
cp -r ~/.rstudiobak/ ~/.rstudio/

I'm here because I'm having a similar issue on RStudio Server.

Here's what I know. When I am using the preview version, I get the error. When I revert to the current release of RStudio Server, the error disappears. When reinstalling RStudio Server Preview (or daily build) the error returns. I am able to get to the log in but the error occurs after submitting my password and before the IDE appears.

felder commented 4 years ago

@lytello @d-alex-hughes @cdbeon Any other students report this issue? If not I may close this for now and reopen later if this appears to be a consistently occurring problem.

d-alex-hughes commented 4 years ago

Thanks for digging into this with the student who raised it @felder. 🎉 Their instance is running sans problems now and I haven't had any new cases bubble up in the last few days.

If it does rise up again, I'll bring it back into this issue and notify you.

lytello commented 4 years ago

@felder I had 5 more cases (likely more that didn't require my assistance). The ones that did reach out had lost their work. From my asking questions, these students weren't regularly saving their files when using datahub, which I suspect is why this additional issue occurred.

blulightspecial commented 4 years ago

Just had another MIDS student raise this issue on slack.

ericvd-ucb commented 4 years ago

Someone just reported this via DS-infrastructure email as well.

blulightspecial commented 4 years ago

@ericvd-ucb it is likely that MIDS student. I directed them to reach out via email.

cdbeon commented 4 years ago

@felder I've had 3 more cases since then

felder commented 4 years ago

@cdbeon any word on how to reproduce?

I'm really going to need some assistance with creating some sort of reliable set of steps to reproduce this issue in order to solve it.

cdbeon commented 4 years ago

@felder Not yet; I've been asking students what they were doing in their previous session(s), but it's a mix: some had been clearing the environment / using RStudio normally, and others had been overloading their global environment. All of them were doing the same thing -- working on the previous week's lab -- which is pretty uniform and shouldn't cause any problems (as evidenced by the 98% of the class who don't have the error).

In the meantime, I'll try to trigger the error myself and keep you updated!

felder commented 4 years ago

@cdbeon ok thank you!

felder commented 4 years ago

Ok so I've been playing around with this issue using docker on my workstation so that I have more debugging capability. Additionally I grabbed the entire home directory from a student who had previously experienced this issue.

Here's what I discovered so far:

  1. Merely copying just the .rstudio directory of a person experiencing this is enough to reproduce the issue. None of the other student's files are necessary.

  2. I was utterly unable to get any logging whatsoever to work out of rstudio-server. In fact it appears that the logging options are all largely restricted to rstudio server pro. However, I attached strace to the running rserver process and tried to launch a new rstudio session with the broken .rstudio in place. strace at least gives me a little idea of what the rserver process is trying to do.

  3. Looking through the strace output, I can find the last file in .rstudio that rserver attempted to access. It seems to be the "options" file for whatever session it is attempting to resume. For example:

grep ".rstudio" strace.txt
...
openat(AT_FDCWD, "/home/rstudio/.rstudio/sessions/active/session-960e4455/suspended-session-data/options", O_RDONLY) = 7

If I remove that file, rstudio works and retains the info in the console.
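
A narrower alternative to renaming all of `.rstudio` would be to move only those `options` files out of the way. The sketch below assumes every affected session keeps the file at `sessions/active/<session>/suspended-session-data/options`, which is the layout seen in the single strace line above. It renames the file rather than removing it so it can still be inspected, and the function name `quarantine_session_options` is invented for illustration.

```shell
#!/bin/bash
# Rename every suspended-session "options" file under an .rstudio directory.
# Usage: quarantine_session_options [rstudio_dir]   (defaults to ~/.rstudio)
# Assumes the layout seen in the strace output:
#   <rstudio_dir>/sessions/active/<session>/suspended-session-data/options
quarantine_session_options() {
    local rstudio_dir="${1:-$HOME/.rstudio}"
    local active_dir="$rstudio_dir/sessions/active"

    # No active sessions directory means there is nothing to quarantine.
    [ -d "$active_dir" ] || return 0

    local stamp
    stamp="$(date +%s)"

    # Match only files named "options" that sit inside suspended-session-data.
    find "$active_dir" -type f -path '*/suspended-session-data/options' -print |
    while IFS= read -r options_file; do
        mv -- "$options_file" "$options_file.broken.$stamp"
        echo "renamed $options_file"
    done
}
```

Other files in the session directory are left alone, which matches the observation that the console contents survive when only `options` is taken away.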

The output in the console is whatever the student was doing last followed by something that looks like this:

Error: C stack usage 7969504 is too close to the limit
Error saving session (options): R code execution error

So I suspect that the error above results in the data for the file "options" being corrupted and rstudio gets mad when it parses it. Why this happens? I have no idea.

Googling C stack usage is too close to the limit does pull up this intriguing result: https://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit

Additionally, googling: "error saving session" "r code execution error"

Also pulls up some stuff, but really nothing conclusive.

I'm hoping some instructors familiar with the course material may have some insight.

felder commented 4 years ago

Relevant commands:

docker exec --privileged -it --user=root ${CONTAINERID} /bin/bash --login
strace -f -e 'trace=!clock_gettime,gettimeofday,futex,timerfd_settime,epoll_wait,epoll_ctl' -p ${PID}
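
Once the strace output has been saved to a file (strace.txt in the earlier comment), the last `.rstudio` path that rserver touched can be pulled out with a short filter. This is a sketch, not the exact command used during debugging; the function name `last_rstudio_path` is made up, and it assumes the path is the first double-quoted string on the matching line, as it is in the `openat` line shown earlier.

```shell
#!/bin/bash
# Print the last path under .rstudio that appears in a saved strace log.
# Usage: last_rstudio_path <strace_log>
# Assumes the path is the first double-quoted string on the line,
# as in: openat(AT_FDCWD, "/home/.../.rstudio/...", O_RDONLY) = 7
last_rstudio_path() {
    local strace_log="$1"

    # Keep lines mentioning .rstudio, take the final one,
    # then strip everything except the first quoted string.
    grep -F '/.rstudio/' "$strace_log" |
        tail -n 1 |
        sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p'
}
```

Running `last_rstudio_path strace.txt` prints just the path, which makes it easier to compare the result across several students' home directories.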

yuvipanda commented 4 years ago

@felder wow, that's awesome debugging work! Thank you <3

In the meantime, I think we can just remove (or rename) all files in .rstudio that match this description on our NFS server. What do you think?

felder commented 4 years ago

@yuvipanda i have no idea what the result of doing that would be especially on active sessions or others not currently experiencing the issue. However, would it be possible to rig a url that provides a button that deletes these files (or renames .rstudio) when pushed?

yuvipanda commented 4 years ago

@felder looks like removing ~/.rstudio should be safe in our context - https://support.rstudio.com/hc/en-us/articles/218730228-Resetting-a-user-s-state-on-RStudio-Server.

I don't think it's quite possible to write a url for this, unfortunately. Users can visit classic notebook via https://r.datahub.berkeley.edu/hub/user-redirect/tree and maybe do it from there.

But, I think the right thing to do is to possibly remove the state files from inside ~/.rstudio for all users who aren't currently running. We should be able to get a list and do that. What do you think?
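
A rough sketch of what that bulk cleanup could look like is below. Everything about the inputs is an assumption: it supposes home directories sit side by side under one root on the NFS server, named after the user, and that the list of users with running servers has already been written to a text file, one name per line. It also renames the whole `.rstudio` directory instead of picking out individual state files, which is simpler but broader than what is proposed above.

```shell
#!/bin/bash
# Rename .rstudio for every user who is NOT listed as currently running.
# Usage: reset_inactive_users <homes_root> <active_users_file>
# Hypothetical inputs:
#   homes_root         directory containing one home directory per user
#   active_users_file  text file with one active username per line
reset_inactive_users() {
    local homes_root="$1" active_file="$2"
    local home_dir user stamp
    stamp="$(date +%s)"

    for home_dir in "$homes_root"/*/; do
        # Skip users who have no RStudio state at all.
        [ -d "${home_dir}.rstudio" ] || continue

        user="$(basename "$home_dir")"

        # Leave running users alone: exact, whole-line match on the username.
        if grep -Fxq -- "$user" "$active_file"; then
            echo "skipping $user (server running)"
            continue
        fi

        mv -- "${home_dir}.rstudio" "${home_dir}.rstudio.$stamp"
        echo "reset $user"
    done
}
```

Skipping active users sidesteps the question of what happens to a live session when its state disappears underneath it.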

felder commented 4 years ago

@yuvipanda My belief at this time is that this may be caused by the fact that datahub and r hub both mount the same user filesystem but have completely different versions of R as well as different versions of various R libraries. For example in addition to using R 4.0.2, datahub uses the system packaged texlive libs. R hub uses texlive libs installed via tlmgr.

I do not believe simply deleting .rstudio for all users will provide a permanent fix for this. I think such a fix would only be temporary.

In my opinion we should remove rstudio from datahub and use a single hub for rstudio exclusively. Alternatively each hub needs its own config file location for rstudio if those hubs mount a shared filesystem. Another possibility would be to setup a new filesystem for r hub.

Based on this, I don't know if rstudio can actually be instructed to behave differently: https://community.rstudio.com/t/change-rstudio-from-rstudio-server/8248

Note also that our setup is quite similar to the setup that is being described as "not recommended." Basically, we probably should do something to ensure that multiple instances/versions of rstudio are not competing with each other.

ryanlovett commented 4 years ago

@felder This makes a lot of sense! Can any of the recent reporters confirm that they've used R on both hubs recently? Can you reproduce if you open and close RStudio on one hub then open it on the other? Or if you run RStudio simultaneously on both?

It's not obvious we can configure the path to ~/.rstudio/ in a given user environment. We could do something hacky^Wclever and bind mount it elsewhere.
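
A bind mount needs privileges inside the container, so one unprivileged variation on the same idea is a symlink created at server start. The sketch below is untested against RStudio itself and rests on the assumption that RStudio follows a symlinked `~/.rstudio` without complaint. The function name `link_hub_state` and the per-hub directory name `.rstudio-<hub>` are both invented for illustration.

```shell
#!/bin/bash
# Point ~/.rstudio at a hub-specific directory so hubs sharing a home
# directory do not share RStudio state.
# Usage: link_hub_state <hub_name> [home_dir]   (home_dir defaults to $HOME)
# Hypothetical layout: state for hub X lives in <home_dir>/.rstudio-X
link_hub_state() {
    local hub_name="$1" home_dir="${2:-$HOME}"
    if [ -z "$hub_name" ]; then
        echo "usage: link_hub_state <hub_name> [home_dir]" >&2
        return 1
    fi

    local target="$home_dir/.rstudio-$hub_name"
    local link="$home_dir/.rstudio"

    mkdir -p "$target"

    # A real (non-symlink) .rstudio is moved aside rather than deleted.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        mv -- "$link" "$link.$(date +%s)"
    fi

    # -n replaces an existing symlink instead of descending into its target.
    ln -sfn "$target" "$link"
}
```

Each hub would call this with its own name before launching RStudio. If two hubs run at the same time for the same user, though, the link flips to whichever started last, so this only helps with the one-hub-at-a-time case.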

cdbeon commented 4 years ago

We've only been using r.datahub.berkeley.edu for our class!

felder commented 4 years ago

@ryanlovett I tried to reproduce by flipping back and forth between hubs with rstudio open, but could not. However, I don’t really have any code running and I did not do a lot of switching. Also, I did not try a combination of doing things like killing pods and starting new pods up.

@lytello @cdbeon do you know if any of your students that had this issue used rstudio in multiple hubs such as datahub/r hub ? Also any sense at this point if it’s a small or large percentage of students seeing it?

felder commented 4 years ago

@cdbeon any idea if any of your students may have other classes that use rstudio via datahub?

ericvd-ucb commented 4 years ago

@d-alex-hughes @blulightspecial @lytello @cdbeon I just wanted to comment on communications here - @felder is working mighty hard to try to troubleshoot this one but it's taking some time. For now there is a workaround mentioned above.

We would love to get your help to communicate on this - could you please communicate out to your classes ... and for now save the ds-infrastructure email for instructor level communications? Thanks


Workaround until a fix can be identified and implemented

This can be fixed by renaming or removing ~/.rstudio via the terminal.

To do so while bypassing the typical rstudio session startup:

  1. Go to https://r.datahub.berkeley.edu/user-redirect/tree
  2. Click New->Terminal
  3. In the terminal, type: mv .rstudio .rstudio.$(date +%s) and press return
  4. Try to launch rstudio as you normally would and it should now work.

d-alex-hughes commented 4 years ago

I have switched between r.datahub and datahub instances pretty frequently, invoking rstudio instances from the main datahub by editing url, and haven't ever generated the error that triggered this round.

Generally, wiping the state space of anyone's instance at logout/timeout/spindown is consistent with practices in the community.

https://mobile.twitter.com/hadleywickham/status/1032665959734108160?lang=en

(But suggested elsewhere too.)

ipietri commented 4 years ago

@felder - Today I ran into the issue described here https://github.com/berkeley-dsep-infra/datahub/issues/1899#issuecomment-706386592

felder commented 4 years ago

@ipietri Any chance you can reproduce this by doing the same thing you were doing before the error occurred?

Also do you ever use rstudio in both datahub and r hub, or do you just use one hub?

felder commented 4 years ago

@d-alex-hughes yeah that's definitely a solution under consideration. However, if we go that route we need to do our best to communicate to students that they need to make sure they save their notebooks prior to logging off (we should encourage this anyway).

Additionally, if a student loses network connectivity and during that time their pod dies, they may also lose work.

ipietri commented 4 years ago

Hi, I wasn't doing anything really. It just happened when I tried to open the datahub this morning. I implemented the suggested solution (below) and it is working now.

  1. Go to https://r.datahub.berkeley.edu/user-redirect/tree
  2. Click New->Terminal
  3. In the terminal, type: mv .rstudio .rstudio.$(date +%s) and press return
  4. Try to launch rstudio as…

On Wed, Oct 21, 2020 at 11:06 AM felder notifications@github.com wrote:

@ipietri https://github.com/ipietri Any chance you can reproduce this by doing the same thing you were doing before the error occurred?


felder commented 4 years ago

@ipietri datahub or r hub, also do you ever switch back and forth while using rstudio?

ipietri commented 4 years ago

When I say datahub I mean this link ( https://r.datahub.berkeley.edu/user/isabelgarpietri/rstudio/), where I get access to RStudio. What do you mean If I switch back and forth? Like if I stop working there and then come back? If that is your question, yes I do that.

On Wed, Oct 21, 2020 at 11:48 AM felder notifications@github.com wrote:

@ipietri https://github.com/ipietri datahub or r hub, also do you ever switch back and forth while using rstudio?


felder commented 4 years ago

@ipietri there are multiple hubs... https://r.datahub.berkeley.edu, https://datahub.berkeley.edu, https://data100.datahub.berkeley.edu, etc

Each hub has different configurations to serve different classes and use cases. Some students may be in multiple classes that utilize different hubs. It's possible to run rstudio in more than one.

cdbeon commented 4 years ago

Hello! Sorry for the delay in my response, been a hectic week.

Eric reached out to me via email, and I thought pasting my responses to his email would help:

1) How common is this - like 1 in 100, 10 in 100 - how many students are facing this?

I believe this issue is starting to become more and more common, around 10 in 100 students. About 30 students have come up to me (so far) with this issue.

2) Are people using the URL datahub.berkeley.edu or r.datahub.berkeley.edu

The class is using r.datahub.berkeley.edu, but I'm sure that some undergrad students are also using datahub.berkeley.edu for other classes. I'm not sure if they're using datahub.berkeley.edu to launch RStudios though; to my knowledge, not that many other classes use RStudios in the first place.

3) Does the fix proposed in 1899 work, or does the problem recur

For most students, the fix in 1899 works; unfortunately, I just had one student pretty recently (i.e. yesterday) who brought up the issue a second time (despite using the fix). I managed to just delete the copy (rm -r .rstudio.bak) and make a new one (mv .rstudio .rstudio.$(date +%s)) and it seems to work again. According to the student, they only use r.datahub.berkeley.edu and only for this class as well.

4) The proposed next step would be to clear all user sessions at logout - could that work for your users - or could you communicate that to your users (e.g. save all work and logout at end of session)

We've been pushing this to students after every lab/every announcement, but of course you'll always have those students who never heed the warning. We'll continue to tell our students to save and logout at the end of every session, though!

yuvipanda commented 3 years ago

With #2035, we have separate .rstudio directories in home for datahub & r hub, but using the exact same image. This should help if the cause was two different R / rstudio versions sharing the same .rstudio directory.

The other option to explore is to see if RStudio is being given a proper opportunity to shut down cleanly by jupyter-rsession-proxy, or if it is being killed straight up - that could also cause corruption.
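
For reference, "a proper opportunity to shut down cleanly" usually means sending SIGTERM, waiting a while, and only then falling back to SIGKILL. The sketch below shows that generic pattern in shell. It is not a description of what jupyter-rsession-proxy actually does, and the function name `graceful_stop` and the default 10 second timeout are arbitrary choices for illustration.

```shell
#!/bin/bash
# Ask a process to exit with SIGTERM, wait up to a timeout, then SIGKILL it.
# Usage: graceful_stop <pid> [timeout_seconds]   (timeout defaults to 10)
# Returns 0 if the process exited on its own, 1 if it had to be killed.
graceful_stop() {
    local pid="$1" timeout="${2:-10}" waited=0

    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A return value of 1 would be the interesting case here: it would mean the session process did not finish within the grace period and was killed, possibly while it was still writing its state.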

Hopefully this is less of an issue this semester?

yuvipanda commented 3 years ago

I see no reports of this since #2035, so am gonna close this for now \o/. Please re-open if you run into this again.