update_manager: git repo validation is done on stale state, results in exceptions

kdomanski commented 4 months ago

What happened

I noticed that for some repos updated with update_manager, I get the following stacktraces in logs:

024-02-23 15:16:57,415 [git_deploy.py:refresh_repo_state()] - Git Repo klipper: Initialization failure
Traceback (most recent call last):
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 349, in refresh_repo_state
    self.upstream_url = await self.remote(f"get-url {self.git_remote}", True)
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 840, in remote
    self._verify_repo(check_remote=validate)
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 792, in _verify_repo
    raise self.server.error(
moonraker.utils.ServerError: Git Repo klipper: No valid git remote detected
2024-02-23 15:16:57,415 [git_deploy.py:refresh()] - Error Refreshing git state
Traceback (most recent call last):
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 64, in refresh
    await self._update_repo_state()
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 71, in _update_repo_state
    await self.repo.refresh_repo_state(need_fetch=need_fetch)
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 349, in refresh_repo_state
    self.upstream_url = await self.remote(f"get-url {self.git_remote}", True)
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 840, in remote
    self._verify_repo(check_remote=validate)
  File "/home/pi/moonraker/moonraker/components/update_manager/git_deploy.py", line 792, in _verify_repo
    raise self.server.error(
moonraker.utils.ServerError: Git Repo klipper: No valid git remote detected

On top of that, manually triggering the repo update in UI never seems to really refresh the repo state. I see warnings that are out-of-date.

After digging into the code, I determined the following:

In GitDeploy.initialize(), the repo state is restored from storage
Then its validity is checked based on that (possibly stale) state
The repo is then treated as a valid repo, refresh is attempted
GitDeploy._update_repo_state() wants to run GitRepo.refresh_repo_state() and then verify the result with repo.is_valid()
But _update_repo_state() just throws an exception
The trigger appears to be the fact, that I checked out a branch other than master
Because of that, update_manager tries to get a branch-specific origin URL from storage, ends up with ?
There's a check in _check_warnings() for this exact condition: if self.git_branch != self.primary_branch:
But refresh_repo_state would only run this waaay after the exception is thrown
The only time _check_warnings is ran is that initial check based on stale state
In the past, this condition was handled cleanly in repo.report_invalids()
Regression introduced in commit a7b9e5783de921c7b071ceec726eb403915ab34c

Client

Fluidd

Browser

Firefox

How to reproduce

cd ~/moonraker
git switch -C some_new_branch
sudo systemctl restart moonraker
cat ~/printer_data/logs/moonraker.log

Additional information

No response

Arksine commented 4 months ago

Can you attach a log reproducing? The line numbers in the stack trace don't match the current repo. The error is indeed occuring because self.git_remote is set to "?", but the state shouldn't be stale. The remote and branch are set in _find_current_branch(). The remote itself is set by querying the configuration based on that branch, or its parsed from the response to git branch if the head is detached. If you switch to a branch that has no upstream remote set in git config then an exception will correctly be raised.

FWIW, the update manager isn't really intended to be used in this manner. Its primary purpose is to provide updates from a remote on a static branch. Manually modifying the repo could certainly confuse it.

kdomanski commented 4 months ago

Wow, thanks for the super-quick response!

Can you attach a log reproducing? The line numbers in the stack trace don't match the current repo.

Sorry, I've been working with a codebase riddled with extra debug log statements. Here's a full log from unchanged code: moonraker.log

Turns out, simply restarting is not sufficient to reproduce from clean state. I can get it to throw the exceptions when I force a refresh, but that alone doesn't save the broken state. Race condition maybe?

The error is indeed occuring because self.git_remote is set to "?", but the state shouldn't be stale. The remote and branch are set in _find_current_branch(). The remote itself is set by querying the configuration based on that branch, or its parsed from the response to git branch if the head is detached. If you switch to a branch that has no upstream remote set in git config then an exception will correctly be raised.

Sure, I get that, but it seems to have worked more cleanly in the past, i.e:

this condition (self.git_remote == '?') would only result in error message
the other checks would be performed
full validation response would be returned to a calling client

Now, no information about the actual problem shows up in Mainsail or fluidd - only the recovery button is highlighted and the info message remains stale:

Screenshot 2024-02-23 174418

Screenshot 2024-02-23 174735

FWIW, the update manager isn't really intended to be used in this manner. Its primary purpose is to provide updates from a remote on a static branch. Manually modifying the repo could certainly confuse it.

It didn't occur to me that I'm doing anything unusual. I just had it running while working on some Klipper code in a branch. I know you only have so much time to work on this, so I'd be happy to fix this regression, unless you say "nah, I'm familiar with the codebase, I already know how to fix it".

It's also worrying that the broken status was somehow saved and prevented a refresh even after switching back to master. Even if you decide that update_manager must not be used with development repos, then there's a potential problem somewhere in there.

Cheers KD

kdomanski commented 4 months ago

BTW I still have one more corrupted repo state in storage, so I backed up the database in case it would help with debugging.

Arksine commented 4 months ago

I don't think this is a bug or a regression. Switching to a new branch that has an untracked remote would have still raised an exception prior to https://github.com/Arksine/moonraker/commit/a7b9e5783de921c7b071ceec726eb403915ab34c. Simply put, the update manager can't fetch the updates from a remote when there is no remote set in the git config. The way to correct the issue is to switch to a branch that is tracked, or push the branch to a remote.

Arksine commented 4 months ago

FWIW, i just verified. I was able to switch to a new branch on a configured repo and and reproduce the error. After switching back to master a refresh correctly restores state.

The behavior from Moonraker is slightly different from https://github.com/Arksine/moonraker/commit/a7b9e5783de921c7b071ceec726eb403915ab34c, but both would have raised exceptions. In the current version the exception is raised when Moonraker attempts to verify the repo. The key branch.<branch_name>.remote does not exist in git config. As a result, Moonraker sets the remote to "?", as its unknown. When Moonraker validates the remote before performing a fetch it raises an exception.

In the linked commit the exception would have occurred here. In this version the call to git config would have returned status code 1, which would have raised a shell command error.

One other thing I forgot to mention, the update manager's status endpoint has a basic "spam" filter that simply returns the current state if a new request is received within 60 seconds of a fulfilled request. Generally speaking refreshing the state isn't something that needs to occur often, and it tends to be CPU intensive (particularly on low resource machines). The filter exists to prevent several frontends/apps from connecting and requesting a status refresh all at once.

That said, Moonraker also registers the POST /machine/update/refresh endpoint that has no such filter, however it can be used to request a refresh for a particular item, ie POST /machine/update/refresh?name=klipper. I don't believe the primary front ends ever implemented it though.

kdomanski commented 4 months ago

Hey, when I wrote that it was introduced in a7b9e5783de921c7b071ceec726eb403915ab34c, I meant that the correct behavior is before it, meaning 35396a5b2a2d4b5781c9d89007f6278b61edcd8d.

But it appears the behavior on startup in 35396a5b2a2d4b5781c9d89007f6278b61edcd8d seems to be the same. What stands out is that in 35396a5b2a2d4b5781c9d89007f6278b61edcd8d the function _update_repo_state() runs report_invalids( ) which sets _is_valid to False and exits gracefully. Meanwhile a7b9e5783de921c7b071ceec726eb403915ab34c _update_repo_state() doesn't do that.

Still, the corruption of the saved state is freaky, but I cannot reproduce it at this time. Has to be a race condition of some sort. 🤔

Arksine commented 4 months ago

Hey, when I wrote that it was introduced in https://github.com/Arksine/moonraker/commit/a7b9e5783de921c7b071ceec726eb403915ab34c, I meant that the correct behavior is before it, meaning https://github.com/Arksine/moonraker/commit/35396a5b2a2d4b5781c9d89007f6278b61edcd8d.

Right, my mistake. As you stated, the behavior is the same.

What stands out is that in https://github.com/Arksine/moonraker/commit/35396a5b2a2d4b5781c9d89007f6278b61edcd8d the function _update_repo_state() runs report_invalids( ) which sets _is_valid to False and exits gracefully. Meanwhile https://github.com/Arksine/moonraker/commit/a7b9e5783de921c7b071ceec726eb403915ab34c _update_repo_state() doesn't do that.

That is true, however an exception in GitRepo.initialize() would still propagate up to refresh().

Still, the corruption of the saved state is freaky, but I cannot reproduce it at this time. Has to be a race condition of some sort.

Are you sure the saved state was corrupted? Mainsail will at times report corrupted when the repo is invalid.

All that said, I did take a deeper look at this, and while it appears to me that its behaving as expected, the condition could be more clear to the user. In addition, the state should be saved on failure so its not reported as valid on a restart. Commit 119f579a44ab569524cbae12852362fc8ece1e68 addresses both of those issues.

kdomanski commented 4 months ago

Are you sure the saved state was corrupted? Mainsail will at times report corrupted when the repo is invalid.

"Corrupted" was the wrong word of me to use. The stored correctly reflected the (previous) invalid state of the repo.

But it could never be updated to the current (manually fixed with git switch master) state, because in refresh_repo_stage the call to _verify_repo() happens BEFORE the call to _find_current_branch().

Anyways, I just pulled your new commit and this is a MASSIVE improvement in the user experience. Thanks a lot! Screenshot 2024-02-24 032639

Arksine / moonraker