github-education-resources / classroom

GitHub Classroom automates repository creation and access control, making it easy for teachers to distribute starter code and collect assignments on GitHub.
https://classroom.github.com
1.34k stars 563 forks source link

Teachers using private repo belonging to their user account #2222

Open EricPickup opened 5 years ago

EricPickup commented 5 years ago

A lot of teachers have been using private repos created on their user account (rather than on the org) as their starter code repo.

This isn't an issue when they create the assignment because we're querying that repo on their behalf. The repo is accessible.

However, when a student accepts that assignment, we pick a random org owner's token to create a repo on the classroom org. If that random token is not the teacher who owns the starter repo, the repo is inaccessible.

I'm guessing we never noticed this because most orgs/classrooms will only have one owner (aka the teacher that owns the private repo). In reality, many classrooms contain multiple owners (TA's for example).

This is the actual reason templates were failing for the past few weeks and most likely has been the biggest cause of failure using porter as well.

d12 commented 5 years ago

100/100 investigative work, nice work 😄

@EricPickup @shaunakpp I'd love for you to try to solve this in your last week - if possible. If you don't have time to implement anything, outlining a proposed fix for handoff to @education/classroom-reviewers to work on would be :100:

EricPickup commented 5 years ago

@shaunakpp let's set up some time today to work on this!

A couple solutions I've thought about..

  1. When a new assignment is created or a new teacher/TA is added as a classroom owner, add them as collaborators to any starter code repositories. We'd definitely add some message to indicate this. We could check who created each assignment, grab their token and add all other classroom owners. This would help prevent reaching rate limits in our largest classrooms.

  2. When creating a repo for the student, iterate through (or choose at random) the list of classroom owners and send a GET for the starter code repo on their behalf. If we get a 404, skip to the next owner. Once we get a successful response use their token. This is the safest way to do it and requires no unexpected changes to the teacher's repo. However, this does leave a higher chance of hitting the rate limits on large classrooms.

@d12 @shaunakpp let me know what you guys think.

shaunakpp commented 5 years ago

When a new assignment is created or a new teacher/TA is added as a classroom owner, add them as collaborators to any starter code repositories.

This may not work if the starter code repository is a public repository not belonging to the teacher or the organization.

When creating a repo for the student, iterate through (or choose at random) the list of classroom owners and send a GET for the starter code repo on their behalf. If we get a 404, skip to the next owner. Once we get a successful response use their token.

This makes an already long-running job longer, is it possible to short-circuit this step while assignment creation? Like, keep a list of user_ids that are valid owners/collaborators for the starter code repository and use one of them while creating the assignment repo for a student?

I'm also in favor of the second solution as it is not mutating anything on dotcom.

EricPickup commented 5 years ago

This may not work if the starter code repository is a public repository not belonging to the teacher or the organization.

This step wouldn't be necessary if it's a public repo!

keep a list of user_ids that are valid owners/collaborators for the starter code repository and use one of them while creating the assignment repo for a student?

The only issue I see with this is that even though it's unlikely collaborators for the repo can change at any time. It also still leaves the rate limit issue. 😬

I think we should wait to see what @d12 thinks before we get started since both methods have their ups/downs.

BenEmdon commented 5 years ago

Awesome job investigating this 👌 This was totally an oversight!

Issue with Proposed Solutions

Both suggested solutions 1. and 2. in https://github.com/education/classroom/issues/2222#issuecomment-520513631 rely on users maintaining access to their classroom organization. This would prevent classroom organizations form using GitHub organization access as they might normally.

If a ta/teacher is removed from their classroom organization, classroom would loose access to their privately owned starter code repos.

Users of classroom would be inconvenienced by this constraint (which could be perceived as unintuitive). Organization access should be able to be revoked for teachers/ta without causing disruption to classroom.

Potential Solution

A more stable solution would to copy the assignment creator's privately owned starter code repo to the classroom organization. This copy would serve as a base repo for the assignment and classroom would always maintain access to it (given that the classroom admins have the appropriate levels of access).

This approach has some bonus benefits:

shaunakpp commented 5 years ago

This step wouldn't be necessary if it's a public repo!

Yes, but we currently store just the starter_code_repo_id so we wouldn't know whether the repo is public or not until we start the assignment creation process for a student. So, probably we'll need to add a flag like starter_code_repo_private to circumvent this added check if the repo is public.

The only issue I see with this is that even though it's unlikely collaborators for the repo can change at any time. It also still leaves the rate limit issue. 😬

We can keep user_ids as cache, and bust the cache if we get a 404, and retry the process. (This is a rare scenario, but worth handling).

EricPickup commented 5 years ago

@BenEmdon I really like this solution and think it's the most stable.

We could fork the repo over to the organization instead of copying using templates. That removes the extra step of making that repo into a template and also cover the plan that @d12 proposed related to this: https://github.com/education/classroom/issues/2032 where if the user doesn't have permission to make a repo into a template, we fork it and make the fork a template.

shaunakpp commented 5 years ago

@BenEmdon creating a fork looks like a good solution 👍.

We'll need to add clear messages that we are creating a fork in the organization, as currently picking a starter code repository creates no perceived side effects to the organization. But, with this implementation, we'll be adding a new repository to the org and at the same time make it a template.

d12 commented 5 years ago

This makes sense to me :+1:

As far as copying goes, we should be forking, not importing or template copying. The repo may not necessarily be a template repo, and we don't want to add further dependencies on the importer.

Keep in mind that GitHub forks are absurdly complex, there's a LOT of edge cases to worry about. @BenEmdon 's point about not having to keep a seat in a paid org is depending on a bug that we've been trying to fix for 3 years now: https://github.com/github/github/issues/62000, so lets not depend on that always being the case.

That being said, make sure to test any assumed fork functionality before starting work, forks sometimes don't behave how you'd expect them to behave.

Let's also make sure we're looping in design here so that this is communicated properly to the end user, Teachers should understand why this repo is being copied into their org.

EricPickup commented 5 years ago

The forking can solve a lot of our issues such as this one, #2234 and #2032 but I've just started working on it and am noticing some concerns:

cc @femmebot on all of this stuff!

d12 commented 5 years ago

1) Yep, we should check if the fork exists. Make sure we don't make naming assumptions here since forks aren't guaranteed to be named something specific.

2) @femmebot May have ideas for how we can design this. Super savvy users may be able to figure this out themselves, but let's guide the users from the Classroom UI.

jeffrafter commented 5 years ago

Although it solves a variety of problems, relying on forking makes me generally uneasy and I tend to agree with @d12 that it could introduce more problems, including possibly:

I think a lot of similar problems exist with the template scenarios as well.

If we build a matrix of the cases we'll see which ones don't work (assuming that the repo in question is marked as a template the following should be true):

Owner Public/Private Students can see/use the template
Teacher Public Yes
Teacher Private No
Classroom Org Private Yes
Classroom Org Public Yes
Third party Org[^1] Private No
Third party Org Public Yes
Org with 3rd Party Access Restrictions[^2] Public No
Org with 3rd Party Access Restrictions[^2] Private No

[^1]: assuming the student is not a member of the third party org [^2]: assuming that the org has not explicitly enabled GitHub Classroom

If this table isn't true then I think it represents a bug in GitHub we should fix.

If this table is true then I think we should look at a couple of smaller interventions:

If those conditions aren't met, the teacher could still fork (manually) to make it work.

Cons:

EricPickup commented 5 years ago

FYI @jeffrafter we should add this row to the table due to #2234:

Owner Public/Private Students can see/use the template
Third party org[^2] Public No (can see, cannot use)

I agree with @jeffrafter, I don't think this forking solution is the way to go anymore in light of the barriers we keep finding with forks.

I think splitting the starter code repository from the teacher inputting it (their own repo) to a different repo adds a lot of confusion. If the repo is already on the org we couldn't fork it and would reference the original repo, which adds inconsistency to this process.

  • If private, check the owner (it must be the classroom org)

If the owner is a user (and private) we could add a button to transfer the repo to the org through classroom. They'd still have to manually accept the transfer on the org side through the web UI but we could provide a link to their org to accept it easily.

If someone changes the visibility of the repository after the assignment is created, students could still encounter bad states

We could look for 404s when creating the student repo and give a detailed error message: We could not find the starter code repository. This is most likely because the repository was made private and not all of the organization owners have access to it. If you are the teacher, you should make the repository public or transfer the ownership to the classroom organization.

Keep in mind for this issue to occur, the classrooms needs to:

  1. Have multiple organization admins
  2. Use a private personal repository as starter code

I think adding the forking idea would add a lot of confusion to all assignments, just to fix a specific scenario. I prefer the other method of enforcing private repos belonging to orgs and flashing a detailed message if they change the visibility after. This adds a little extra work (but not unexpected) for teachers in this specific case rather than a lot of confusion to all.

jeffrafter commented 5 years ago

If the owner is a user (and private) we could add a button to transfer the repo to the org through classroom. They'd still have to manually accept the transfer on the org side through the web UI but we could provide a link to their org to accept it easily.

Oh, I really like that idea. We could even make this button just open GitHub's transfer page.

We could not find the starter code repository. This is most likely because the repository was made private and not all of the organization owners have access to it. If you are the teacher, you should make the repository public or transfer the ownership to the classroom organization.

💯 Better error messages are so helpful

d12 commented 5 years ago

I'm :-1: for one major reason: If I'm understanding correctly, this solution doesn't let us share starter code repos among classrooms if they aren't in the same org. Teachers very often share starter code repositories among classrooms in different organizations, often having a shared faculty organization with these shared teaching resources. or, teachers are free to use starter code repositories shared from other teachers. Requiring transferring ownership to use a starter code repo goes against the idea of "shared teaching resources" that I'd love to move towards in the future.

jeffrafter commented 5 years ago

@d12 that makes sense:

Teachers very often share starter code repositories among classrooms in different organizations, often having a shared faculty organization with these shared teaching resources. or, teachers are free to use starter code repositories shared from other teachers.

In those cases are the students typically members of the third party org or are the shared repositories public?

d12 commented 5 years ago

Not based off of what I've seen, teachers often have "teacher only" orgs for sharing internal resources.

d12 commented 5 years ago

Make it possible to use any repo as a template via the API (this is potentially controversial)

If this could happen, it'd solve all our problems 🤞

EricPickup commented 5 years ago

@d12 that's a good point. What do you think about just providing detailed failure/error reports to the user like I mentioned in https://github.com/education/classroom/issues/2222#issuecomment-521460371? I personally think it's still worth providing a failure reason for this specific case rather than confusing the situation for all cases with forks.

d12 commented 5 years ago

@EricPickup The failure messages would be good, but if they're surfaced when students accept a repo and not when teachers create the assignment, it's going to cause a lot of trouble for teachers.

EricPickup commented 5 years ago

Update: I've put up a PR for a back-end solution to this in #2286 and have proposed a front-end method of preventing this in #2285.

andrewbredow commented 4 years ago

Possible solutions:

  1. We could work around rate limiting via a GitHub App.
  2. Or, warn teacher that it's a private repo

Maybe we ask teachers to fork or move their repo into the organization

We can tell which repos which repos will run into this

  1. Is the repo private
  2. Does the organization have more than one admin

Precursor:

  1. Measure how often teachers are using private repos where the teacher is the owner and it's not in the org? Let's create a dump of historical data for this.
  2. Design: Do we want to fork it for them, or do we want to direct them to GitHub to complete the action.
spinecone commented 4 years ago

Some preliminary data gathered from the GitHub Classroom production console and octokit (limited it to just the past month for now because this query takes a long time to run):

(script to generate this data: https://gist.github.com/spinecone/9f6517129e06c0d2e40a16301d433878)

spinecone commented 4 years ago

Followup steps: