berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License

Request more RAM for class ESPM-157 #2784

Closed. cboettig closed this issue 3 years ago

cboettig commented 3 years ago

Which hub do you want more RAM on?

r.datahub

Which class is this request for?

https://classes.berkeley.edu/content/2021-fall-espm-157-001-lab-001

How many students do you expect in this class?

54

How much RAM does this class need?

4 GB

Why does this class need this much RAM?

Students reproduce analyses of key global change papers and need to be able to read the original scientific data into R. In particular, we need to be able to load the RA Myers legacy stock assessment database, and R isn't so friendly with low-RAM environments.

Any additional information we should know about?

balajialg commented 3 years ago

@cboettig Thanks for this request! What timeline are you looking at? How many weeks or months do you want to use the R hub with the increased RAM allocation?

Such requests have cost implications for us, so we wanted to get the duration details for this request.

cboettig commented 3 years ago

Thanks! We are using reasonably data-intensive exercises in the modules starting today through the rest of the semester (including the student final projects). I totally understand the cost restrictions, and I try pretty hard to keep the assignments within a 4 GB limit and not go over that, but really the difference between what students can do in 4 GB and 1 GB is very substantial.

Please let me know if that's not feasible and we can try to explore other hosting options, e.g. whether we can deploy your Kubernetes system on our own hardware. We already do this with self-hosted GitHub Actions, since the free GitHub Actions allowance for Educational accounts is just way too low to be practical in a large class, even with relatively minimal checks.

balajialg commented 3 years ago

Hi @cboettig, thanks for the detailed rationale for this request! Do you foresee this requirement in subsequent semesters for your courses?

We don't have a policy around such requests, and we would like to use your request as a model to define our policy. We should be able to get back to you in a day or two regarding this request. Would that timeline work for you?

cboettig commented 3 years ago

@balajialg Thanks. Yes, if possible we'd like to keep being able to use r.datahub for the course going forward as well. We used r.datahub with 4 GB allocations last year during remote instruction (#1849) and it worked really nicely.

balajialg commented 3 years ago

@cboettig Thanks for sharing the PR! That's helpful to put things in context. I will keep your requirement for the R hub in mind as we respond to this request.

yuvipanda commented 3 years ago

Based on the metric in https://github.com/berkeley-dsep-infra/datahub/issues/2785, let's approve this.

cboettig commented 2 years ago

@yuvipanda @felder I have one student who is still not seeing the updated RAM allocation. What's the right protocol here?

balajialg commented 2 years ago

Hi @cboettig, that's interesting! Can you share the email address of the student who is facing this issue?

cboettig commented 2 years ago

Sure, do we have a private channel? (IANAL, but I always think of FERPA when it comes to student info.)

balajialg commented 2 years ago

That makes a lot of sense. Could you slack me in the UC Tech group or send an email to balajialwar@berkeley.edu?

cboettig commented 2 years ago

@balajialg :+1: sent you a slack DM

balajialg commented 2 years ago

@cboettig I accessed the user's instance and opened a Python notebook to check their allocated memory. I was able to confirm that they have 1 GB of memory allocated (see the snapshot). For cross-reference, I checked your Jupyter instance and found that you had 8 GB of RAM allocated. My initial intuition is that the student might not be part of the custom profiles we created. I will investigate this further until @yuvipanda can take a look at it tomorrow when he is up.

image
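For anyone who wants to repeat this kind of check from a notebook, here is a minimal sketch. It assumes the user server runs in a Linux container whose memory limit is exposed through the standard cgroup files; it is not necessarily the method used for the snapshot above, and the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations for a container's memory limit.
# Assumption: one of these is readable inside the user server.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports 'max' (unlimited)."""
    value = raw.strip()
    if value == "max":
        return None
    return int(value)


def memory_limit_bytes() -> Optional[int]:
    """Read the first cgroup limit file that exists; None if none can be read."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue
    return None


def format_gib(limit: Optional[int]) -> str:
    """Render a byte count as GiB for quick comparison between users."""
    if limit is None:
        return "no limit found"
    return f"{limit / 2**30:.1f} GiB"


if __name__ == "__main__":
    print(format_gib(memory_limit_bytes()))
```

Note that cgroup v1 reports an unlimited container as a very large integer rather than `max`, so an implausibly large value should be read as "no limit".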

balajialg commented 2 years ago

@cboettig Looking at this documentation, I wonder whether this student is officially enrolled in your class.

FYI, we fetch per-course enrollment from the Student Information System when we need to configure user servers based on course affiliations. We periodically use this to set per-user resource limits, attach extra volumes to user servers, and automatically add or remove admin roles.
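To make the mechanism concrete, here is a rough sketch of how course rosters could be turned into per-user memory limits. The function names, the roster shape, and the 1 GB default are assumptions for illustration only; this is not the hub's actual code.

```python
from typing import Dict, Iterable


def build_user_limits(
    rosters: Dict[str, Iterable[str]],
    course_mem_gb: Dict[str, int],
) -> Dict[str, int]:
    """Map each enrolled email to the largest limit among that user's courses."""
    limits: Dict[str, int] = {}
    for course, emails in rosters.items():
        course_limit = course_mem_gb.get(course)
        if course_limit is None:
            continue  # course has no custom allocation
        for email in emails:
            key = email.strip().lower()
            limits[key] = max(limits.get(key, 0), course_limit)
    return limits


def limit_for(email: str, limits: Dict[str, int], default_gb: int = 1) -> int:
    """Look up a logged-in user's limit, falling back to the default."""
    return limits.get(email.strip().lower(), default_gb)


# Hypothetical roster: the second entry carries a non-berkeley.edu address.
rosters = {"espm-157": ["student.one@berkeley.edu", "student.two@other-campus.example"]}
limits = build_user_limits(rosters, {"espm-157": 4})
```

With a lookup keyed on email like this, a student whose roster address differs from the address they log in with silently falls back to the default allocation.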

cboettig commented 2 years ago

He's enrolled according to both my CalCentral roster and my BCourses roster.

yuvipanda commented 2 years ago

I've pinged @ryanlovett, who wrote the SIS integration and knows way more about it than we do. Would you be able to take a look?

ryanlovett commented 2 years ago

I checked this morning and it looks like the SIS is returning a non-berkeley.edu email address for this student. In the short term, @balajialg will manually specify this person in the secrets YAML. In the long term, I will check to see if sis-cli can fetch the official email address rather than the preferred one.

cboettig commented 2 years ago

Thanks @ryanlovett !

ryanlovett commented 2 years ago

@yuvipanda The SIS enrollments API returns several email addresses for this student: "Campus", "Home", and "Other". Unlike for most students, none of them is a @berkeley.edu address. Their Campus address is actually from a different UC campus. Maybe this student altered their directory entry, perhaps for privacy reasons. Since we do resource allocation based on the user's email address, there's not much that can be done with the current approach.

This hasn't been reported before so maybe it is fine to enter these one-offs manually.
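As an illustration of the one-off approach, here is a minimal sketch that picks a @berkeley.edu address from a student's typed email list and lets a manual override win when one is present. The record shape (`type` and `address` keys), the student IDs, and the override table are all hypothetical; the real SIS payload and the secrets file will look different.

```python
from typing import Dict, List, Optional


def berkeley_email(emails: List[Dict[str, str]]) -> Optional[str]:
    """Return a @berkeley.edu address if one exists, preferring the 'Campus' entry."""
    # False sorts before True, so 'Campus' entries are checked first.
    ordered = sorted(emails, key=lambda entry: entry.get("type") != "Campus")
    for entry in ordered:
        address = entry.get("address", "").strip().lower()
        if address.endswith("@berkeley.edu"):
            return address
    return None


def resolve_email(
    student_id: str,
    emails: List[Dict[str, str]],
    manual_overrides: Dict[str, str],
) -> Optional[str]:
    """Manual entries win; otherwise fall back to whatever the roster provides."""
    if student_id in manual_overrides:
        return manual_overrides[student_id]
    return berkeley_email(emails)


# Hypothetical student whose addresses are all off-campus or mistyped.
odd_student = [
    {"type": "Campus", "address": "someone@other-campus.example"},
    {"type": "Home", "address": "someone@mail.example"},
    {"type": "Other", "address": "someone@bekeley.edu"},
]
overrides = {"student-123": "someone@berkeley.edu"}
```

Here `berkeley_email(odd_student)` finds nothing, so only the manual entry gives this student a usable key for resource allocation.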

If this does become a bigger problem, we could try making use of some other attribute returned by the CanvasOAuthenticator that happens to be present in the SIS enrollments data. The campus-uid (public, numeric) and "CalNet ID" (private, string) are in the enrollment data, but I don't know what is returned by the authenticator.

Of course, long term there are plans to get the list of courses from Canvas itself during auth.

cboettig commented 2 years ago

Interesting. CalCentral shows me the student's @berkeley.edu email address, though; I would have thought it also used the SIS data, but I guess not?

ryanlovett commented 2 years ago

They might be collecting info from multiple places while we are using the enrollment API. I think @felder might be able to find out for us how CalCentral gets this. If we were to use some other API to get information about a student, I think we'd have to iterate over each enrollment to get the additional info. That'd mean the sidecar would take much longer (O(n) vs O(1)) to collect data for big courses when the hub starts up.
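A back-of-the-envelope sketch of that cost difference, assuming one request returns a whole course roster and any extra per-student lookup costs one additional request each. The function names are illustrative and the request model is a simplification, not a description of the actual APIs.

```python
from typing import Dict, List


def calls_enrollment_only(rosters: Dict[str, List[str]]) -> int:
    """One roster request per course, regardless of class size."""
    return len(rosters)


def calls_with_student_lookup(rosters: Dict[str, List[str]]) -> int:
    """One roster request per course plus one extra request per enrolled student."""
    return sum(1 + len(students) for students in rosters.values())


# A class the size of the one in this request (54 students).
rosters = {"espm-157": [f"student-{i}" for i in range(54)]}
bulk_calls = calls_enrollment_only(rosters)
per_student_calls = calls_with_student_lookup(rosters)
```

For a single course the first count stays constant while the second grows with enrollment, which is what would slow down hub startup for big courses.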

Another student has a "bekeley.edu" address for their Campus email, and a third has yet another UC campus .edu address for theirs. The typo makes me think that students can somehow customize their campus address.

balajialg commented 2 years ago

Thanks, @ryanlovett, for the detailed explanation! I have created a commit to add the encrypted CalNet ID of the student to the YAML file.

@cboettig Will create a PR and merge this soon!