TU-Wien-dataLAB / Grader-Labextension

BSD 3-Clause "New" or "Revised" License
3 stars 1 forks source link

Pull feedback does not work for some users #6

Closed meffmadd closed 5 months ago

meffmadd commented 11 months ago

Describe the bug Some users reported that they are unable to pull feedback. Upon further investigation we found out that the issue can be trace to git fetch --all behaving differently for user that reported the issue.

We can check what happens with GIT_TRACE=1 git fetch --all -v (or GIT_CURL_VERBOSE=1 GIT_TRACE=1 git fetch --all -v).

We can see that once rev-list gets called (correct) and once fetch-pack gets called (incorrect). What we could determine so far is that versions are the same, environment variables are the same, git config is the same, and initialization is the same (git init, git set-remote and then git fetch --all)

Relevant labextension code: https://github.com/TU-Wien-dataLAB/Grader-Service/blob/e112235976e4c5321d234f5202f07e974584fd9f/grader_labextension/grader_labextension/handlers/grading.py#L250

(Probably) relevant git code: https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/builtin/fetch.c#L2116

To Reproduce Could not be reproduced outside of user pods from affected users. Locally, everything behaves as expected but also inside unaffected user pods.

Expected behavior Git fetch should behave exactly the same (with the same image/versions etc).

adammccartney commented 11 months ago

@meffmadd looks like an interesting bug! Do you have an idea about how many users this affects?

Is there an expected state for the feedback/user/*/refs/heads directories? I'll do a bit of an audit on these and get back to you. In particular is there a reason why certain repositories would contain multiple heads?

adammccartney commented 11 months ago

Okay, as of this morning ca. 10am there are 46 .../*/*/feedback/user/*/refs/heads directories on the entire system that contain more than 1 file (i.e. feedback repositories with multiple heads). The total number of refs/heads directories is 838. So the other 792 feedback repositories all contain a reference to a single head. I'm not sure if this says anything interesting, as I don't know what the intended state of these feedback repositories is.

The vast majority of repositories with reference to multiple heads appear in the lecture that you mention in your initial comment. Here is a dump of all user feedback repositories with more than 1 head reference:

165.166-2023W 4
188.995-2023W 2
188.995-2023W 3
194.025-2023W 12
194.025-2023W 12
194.025-2023W 13
194.025-2023W 17
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 2
194.025-2023W 3
194.025-2023W 3
194.025-2023W 3
194.025-2023W 3
194.025-2023W 3
194.025-2023W 3
194.025-2023W 30
194.025-2023W 4
194.025-2023W 4
194.025-2023W 4
194.025-2023W 5
194.025-2023W 6
194.025-2023W 6
194.025-2023W 7
194.025-2023W 7
194.025-2023W 7
194.025-2023W 8
194.025-2023W 8
194.039-2023W 2
194.039-2023W 2
194.039-2023W 2
194.039-2023W 2
194.039-2023W 2
194.039-2023W 2
194.039-2023W 3
jaas30 2

There's a bit more data in the logs that I collected, can email this if it is of interest.

meffmadd commented 11 months ago

As far as I can tell the feedback repositories are all valid. I could clone on the feedback inside the grader service pod and could see the feedback files (also from affected users). So far the bug was confirmed for 2 users. Several others (also in the same lecture) could pull their feedback without any problems. Also, I made a submission and could pull the feedback.

I think there are 3 options:

  1. There is a bug in git and that causes it to make incorrect calls. (highly unlikely)
  2. There is an edge case in git that forces it to behave this way. (What is the edge case and how can we avoid it?)
  3. This is completely normal git behavior and it is forced to behave this way (again what is the reason) and our server is implemented incorrectly. Our git server code was taken from a template and is (was) the standard implementation as far as I can tell. It never led to any problems so far at least.
adammccartney commented 11 months ago

Yeah, so I think 2. and 3. offer an opportunity to improve our understanding of what is happening in our code. What was the original template for the implementation? Could we possibly document this somewhere? (an architecture decision record would be nice)

Okay, so the error posted above seems to be with the connection.

fatal: the remote end hung up unexpectedly

Apparently this might have something to do with a buffer overflow: https://stackoverflow.com/questions/15240815/git-fatal-the-remote-end-hung-up-unexpectedly

Can you email me the user ids of the two user that this affects and I will check the size of their feedback repositories?

adammccartney commented 11 months ago

Okay, so there are plenty of users with larger feedback repositories. So the size doesn't seem to be the issue

meffmadd commented 11 months ago

Yes, so I think the problem is not with the remote repositories. This should probably work since we are accessing it in the same way (regardless of source, release, assignment, etc.).

Also, we increased the buffer size without any effect. We are not cloning or pulling anything just fetching the changes.

meffmadd commented 11 months ago

We can also test the git calls directly in the grader service pod which results from the git fetch -all command in the server extension:

Btw, this command "unknown" error is just because of tornado. https://github.com/tornadoweb/tornado/blob/a6dfd70d7a6398b2187021df63ca77daa5781e5e/tornado/process.py#L300

meffmadd commented 11 months ago

For me, it seems the most promising path forward is finding the if statement in the git source code that causes the git client to issue different requests to the git server. From there, we could find the cause and understand what is happening. We can't even reproduce this bug right now.

adammccartney commented 11 months ago

Okay, so I examined the local repositories for all users (4501 in total). The user with the first ID that you send has multiple refs.

feedback/62/1712/.git/refs/heads:
total 2
-rw-rw-r--. 1 258055 258055 41 Nov  7 12:43 eedback_8f686c3d1ea379c9c60d8d477cfb41e8f72f4468
-rw-rw-r--. 1 258055 258055 41 Nov  7 12:43 feedback_8f686c3d1ea379c9c60d8d477cfb41e8f72f4468
-rw-rw-r--. 1 258055 258055 41 Nov  7 12:40 main

All three files contain the same sha1 hash:

[root@cephfs 194.025-2023W]# cat feedback/62/1712/.git/refs/heads/*                                                                     
ad9f3a03a3bc9e48dbf5a262b93e2da679edb0de
ad9f3a03a3bc9e48dbf5a262b93e2da679edb0de
ad9f3a03a3bc9e48dbf5a262b93e2da679edb0de

That's literally the only place in over 4000 repositories where there is more than a single reference.

Can we even reproduce the bug for this one user?

meffmadd commented 11 months ago

Wait, so what does this mean? And yes, the bug can be replicated. The user has created new submissions whose feedback also cannot be pulled. And what about the second user with the bug?

Some initialization errors could lead to the different behavior of the git client since git makes a fetch call first and on the second call it differs. This git fetch --append --no-auto-gc ... call is the first time it interacts with the remote. I think last time I checked the result of the first call was similar (ofc different refs between the users).

meffmadd commented 11 months ago

Checked again and calling git upload-pack --stateless-rpc --advertise-refs "/var/lib/grader-service/git/194.025-2023W/62/feedback/user/e12******" in the grader pod results in the following output.

The output from my (working) feedback repo looks like this:

01140bb892a542988b8843bd95717622be29e7f2fa71 refs/heads/feedback_73fbc7d213964e5c54b9a800e7450e33f6d728e6multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done agent=git/2.25.1

I only made 1 submission and have 1 feedback.

Buggy repo:

01149edd6d9963f0d974125748340e22928e52562068 refs/heads/feedback_18554ac5cd5da9b2e19c63add841015b9aee60b0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done agent=git/2.25.1
006aecacce664770baeb2cdf3cf0c8a911958b37f23a refs/heads/feedback_1f3a0d681dcb6f1f2aa3d64a74674411acc96485
006a602bf731e53ef0a78d2a194ec98d66ec35809f19 refs/heads/feedback_208e5bcc60d13bd390464cd02149aa094f462568
006ae436a1d952fb56cabd84ca929648ea6832398d5a refs/heads/feedback_20e64efcd160c48cd97e13d406d5f9d1bb979fc6
006a422559bcf9c185d1101bf836ba381cfa16a70cf0 refs/heads/feedback_27f37005608c1f5daae900d27b83db439a624434
006a5d5ea15d7409364eb9ebce7c8ca8f1a7ba084978 refs/heads/feedback_2a9e3fc0018331e535ff24ace1d55e2d5efef1d1
006a2e75d79382f6555b8009cac2ecbd297557ae0032 refs/heads/feedback_2c1469c56d70a7a3022d8272e1c5f537142a4492
006a498716989338bd05c0f5b892e73f7ec84d50929f refs/heads/feedback_33483ae90c9eaffa70e156cf05a66f88b487b4b0
006a18e7e7f8e75aa59944a996a1b4bc18aef1fa8a1f refs/heads/feedback_3db45bbb5e03d216f4fb011ffb42e5f3218cc259
006a37f550d441d22ba3cf6c058509c2f03d634c3263 refs/heads/feedback_4554f39e038f6422d50a3c63e1b6547bee7347d3
006a48270f2e7e15a59101ad5c8f2cff99c2250a5ba6 refs/heads/feedback_520776f550731b8fea7d35244de3c9af5b7f9336
006a3745d7b97762b17b8597d61a0a0769926c6f95a4 refs/heads/feedback_55bffcd9a87736134ae3a3b9ba8e654a7c56c306
006a68be549ed89c273fec104c4d1d6f88189db18ff0 refs/heads/feedback_883f4cd695f100e2a7bb45b93b5645b446ae96b4
006aad9f3a03a3bc9e48dbf5a262b93e2da679edb0de refs/heads/feedback_8f686c3d1ea379c9c60d8d477cfb41e8f72f4468
006a28e32f472611b3195febb8dddd920ac344650995 refs/heads/feedback_9a57044ec22ea414e27f792155be96656b8eab82
006a42071733015bdc79d4396fb1990773f26a83a669 refs/heads/feedback_9fa22fc136c47fc4f7c6a08eac630c0fc24dd4a0
006a6c85dd59eeaabbcf70ce7ea7682f744f1dd850b9 refs/heads/feedback_a0d8f6d5d08fa3dacd69e70619aef2667abf020f
006a02d41763a1e9a3405d597b01d9c037744362abe8 refs/heads/feedback_af074a4ef2c48ab5df9f50562c5c166f28f55c3c
006a2ff1768eaae8ce84740146935ada434dc8ea9f2d refs/heads/feedback_b27748ecd7ae18e143d1376c0012dc8640cb4b88
006ae5c29100ca8565f961d627eab747a3aca5425022 refs/heads/feedback_c06e6720f71ed0298a734ab9ad6a849330257635
006afb3c01e156a6d1bc0fada151eda02c6fd59720ee refs/heads/feedback_caa1e0fb9c73a015f3f978f020cbf30bc6604e1e
006a4a132e3c7672e9360dd6b7c7621a0a954ea8079c refs/heads/feedback_efab287180b91c5ca3ae2c7bb9ec926dc48dacdb
006a1e73cf8939d21e35cbaf4fc36ab496eb171971b0 refs/heads/feedback_fb00e0e5a2a627eedc39f3b24480b21746bf6370
006ab40df7f605af4c13828effabac3f3a398609c4e6 refs/heads/feedback_ff42bfad60c5f37a93de92ff98de35ef5e863fd6

This seems reasonable to me and matches the number of submissions of the user. Also, this multi_ack thin-pack ... instructions or whatever this is is identical to my repository...

Somehow from these two results, the git client decides to issue different requests to the git server for the next step.

florian-jaeger commented 11 months ago

Okay, so I examined the local repositories for all users (4501 in total). The user with the first ID that you send has multiple refs.

feedback/62/1712/.git/refs/heads:
total 2
-rw-rw-r--. 1 258055 258055 41 Nov  7 12:43 eedback_8f686c3d1ea379c9c60d8d477cfb41e8f72f4468
-rw-rw-r--. 1 258055 258055 41 Nov  7 12:43 feedback_8f686c3d1ea379c9c60d8d477cfb41e8f72f4468
-rw-rw-r--. 1 258055 258055 41 Nov  7 12:40 main

The refs were manually created by us on Nov 7 to test the issue, therefore they are not a side effect or part of the bug.

adammccartney commented 11 months ago

I have a question: what is the relationship between the user's local feedback repository and the remote feedback? Within the directory /home/<user>/<lecture_id>/feedback/<assignment_id>/ there appear a number of sub directories, each with an integer id, is each one of these a submission id? If so, are we using the id to pull feedback for this submission? If that's true then I think we might need to check that we correctly handle the case that there is no feedback for a submission.

I may have managed to reproduce the bug. Step 9 below uncovers a question as to whether the empty directories are causing some unforeseen behavior? Basically, it appears to me that it's no problem to pull feedback directories

Observed behavior

  1. log in as user "e121..." to "194.025-2023W"

  2. navigate to "Assignment 1"

  3. The submissions list contains 24 user submissions, each with an "open feedback" button

  4. click the first "open feedback" button, continue and hit "pull feedback", this successfully pulls feedback (for submission with id 1500?). Log output below:

    [I 2023-11-09 15:15:26.496 ServerApp] 200 GET /user/e121.../api/contents/lectures/194.025-2023W/feedback/62/1500?content=1&1699542926469 (amccartn@10.0.0.21) 12.29ms
    [I 2023-11-09 15:15:26.496 ServerApp] 200 GET /user/e121.../api/contents/lectures/194.025-2023W/feedback/62/1500?content=1&1699542926468 (amccartn@10.0.0.21) 15.89ms
    [I 2023-11-09 15:15:26.498 ServerApp] 200 GET /user/e121.../api/contents/lectures/194.025-2023W/feedback/62/1500?content=1&1699542926469 (amccartn@10.0.0.21) 16.33ms
  5. repeat for the 2nd button - nothing happens, read the following in the logs:

    4148 [I 2023-11-09 14:42:34.299 ServerApp] http://grader-service:4010/services/grader/lectures/25/assignments/62/submissions/1710
    4149 [I 2023-11-09 14:42:34.315 ServerApp] Path: /home/e121...lectures/194.025-2023W/feedback/62/1710
    4150 GitError:
    4151 error: remote feedback already exists.
  6. Test with further buttons, it seems like it's possible to pull for some and not for others.

  7. Examine the filesystem

    
    # tree /mnt/cephfs/home/e121.../194.025-2023W/feedback/62
    /mnt/cephfs/home/e121.../194.025-2023W/feedback/62
    ├── 1492
    ├── 1500
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1511
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1512
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1514
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1515
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1552
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1569
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1570
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1573
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1611
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1652
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1658
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1661
    │   ├── Assignment_1.html
    │   └── errors.png
    ├── 1665
    ├── 1666
    ├── 1668
    ├── 1676
    ├── 1683
    ├── 1692
    ├── 1710
    └── 1712
    ├── Assignment_1.html
    └── errors.png

22 directories, 28 files

9. Question: are the errors associated with the empty directories? 
10. Randomly click buttons until we hit an error. Last item in submissions list produces and attempt to get 1492

[I 2023-11-09 15:03:29.715 ServerApp] Path: /home/e121.../lectures/194.025-2023W/feedback/62/1492 GitError: error: remote feedback already exists. [E 2023-11-09 15:03:29.825 ServerApp] Uncaught exception GET /user/e121.../grader_labextension/lectures/25/assignments/62/grading/1492/pull/feedback (10.0.0.21)

meffmadd commented 11 months ago

The errors are associated with the remote and is not specific to the feedback repos. We need to update the remote for each git interaction because we also include the token in each remote URL (user might have just logged in and their old token is expired). If no remote exists we add it with git remote add .... This works if actually no remote is set, but fails if we already have an (old) remote URL. Then adding a remote fails because it already exists and we need to use set-url. This is what this error is about. So this happens if you click on pull several times.

Please check the code: https://github.com/TU-Wien-dataLAB/Grader-Service/blob/e112235976e4c5321d234f5202f07e974584fd9f/grader_labextension/grader_labextension/services/git.py#L85

But it would be good news if you could actually pull feedback for a buggy user now.

adammccartney commented 10 months ago

Yeah, so it appears to me as if I can pull (some) feedback items as the buggy user.

However, I'm a little confused about the workflow generally. I found the UI pretty non-intuitive yesterday when I tried to reproduce the error. In particular, I found the experience of creating feedback as an instructor very confusing. In particular I was unable to disentangle the notion of automatically generated feedback from the grader and feedback in the form of comments from the instructor.

I had a quick glance at the documentation this morning, there is some reasonable documentation of feedback for student users. For instructors, there are a few passing references on feedback generation that are context dependent and pretty uninformative. For developers, there does not appear to be any documentation outside of the docstrings in the code.

https://grader-service.readthedocs.io/en/latest/search.html?q=feedback

adammccartney commented 10 months ago

(Probably) relevant git code: https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/builtin/fetch.c#L2116

You gave me the idea to query the git log. As you rightly pointed out, the git source code is relatively complex these days. The history of the repository is often quite informative. To see the history of the fetch command, you can run:

git log --reverse -L ":cmd_fetch:builtin/fetch.c" 

I also just wanted to see where fetch-pack turns up in that history:

git log --reverse -L ":cmd_fetch:builtin/fetch.c" --grep "fetch-pack"

The most recent commit there actually throws up some possibly helpful information. The fetch --refetch option might be what we want to use (as it seems we cannot trust what we have in the local object store).


commit 0f5e8851737282c9dd342032fe9a2d8b10367c9a
Merge: 1b54f5b89a 4963d3e41f
Author: Junio C Hamano <gitster@pobox.com>
Date:   Mon Apr 4 10:56:23 2022 -0700

    Merge branch 'rc/fetch-refetch'

    "git fetch --refetch" learned to fetch everything without telling
    the other side what we already have, which is useful when you
    cannot trust what you have in the local object store.

    * rc/fetch-refetch:
      docs: mention --refetch fetch option
      fetch: after refetch, encourage auto gc repacking
      t5615-partial-clone: add test for fetch --refetch
      fetch: add --refetch option
      builtin/fetch-pack: add --refetch option
      fetch-pack: add refetch
      fetch-negotiator: add specific noop initializer
meffmadd commented 10 months ago

Yes, in the future we should definitely check our git workflow. For our use case, we should only need the most rudimentary git features.

As a fix to this issue, we should just omit fetch and pull the branch directly (git pull feedback feedback_<commit-hash>). This is a workaround but is actually just a cleaner solution since are we fetching something that we already know exists. Luckily, the pull feedback functionality is in a separate handler so this change does not affect any other part of the git logic. We tested this behavior and it works also for users with this bug.