Currently experiencing syncing issues to the cloud

bontebok commented 2 days ago

Describe the bug?

A headless server we are running for a build session stopped syncing at 8:33 PM UTC (about an hour ago). A vanilla client started under a different user also fails to sync.

Attached are the logs from the vanilla client saving an empty gridspace world to a group.

To Reproduce

Save an empty gridspace world.

Expected behavior

The sync goes through in a short period of time.

Screenshots

Resonite Version Number

2024.11.19.479

What Platforms does this occur on?

Windows

What headset if any do you use?

Desktop and Headless

Log Files

DESKTOP-M4CKTHS - 2024.11.19.479 - 2024-11-23 16_15_31.log

Additional Context

No response

Reporters

@Rucio

Frooxius commented 2 days ago

Thanks for reporting this!

I've pulled logs and restarted the sync queue worker.

It seems to be processing now, so that should resolve the immediate issue, but I'll have to go through the logs and figure out why it stopped in the first place, so it doesn't happen again.

stiefeljackal commented 2 days ago

I can confirm that items are syncing now.

bontebok commented 2 days ago

The headless server and my client have synced. Thank you, @Frooxius! FYI, the CJ Build Battle is tonight, so there will be a lot of world syncing.

Frooxius commented 2 days ago

If it gets stuck again, poke me and I'll kick it again.

bontebok commented 14 hours ago

No sync issues during the jam, thanks for the quick response!

Frooxius commented 14 hours ago

Thanks for the update!

I want to keep this open, though; I still want to dig into the underlying cause.

bontebok commented 13 hours ago

Sure thing!

Relatedly, if you ever want to talk about the preprocessing routines, I'm curious what contributes to the latency of the cloud's response. I know it performs asset lookups to tell the client whether each asset already exists, but unless it's also performing an R2 check, rehashing, etc., this should be a fairly quick operation at the database level. If you ever dive into this and want another pair of eyes or a rubber duck, feel free to reach out.

Frooxius commented 13 hours ago

Thank you, I appreciate the offer! I'm not sure when I'll be digging into it some more, though it's something I'd like to make more efficient where possible.

To give more context, the preprocessing routines are a fair bit more complex than just checking the existence of assets.

The major part of preprocessing is pre-pinning all the assets on both the per-account and global reference-counting lists, to make sure the user is allowed to upload everything needed to sync a particular record. That way, the sync won't suddenly fail in the middle of uploading the assets.

It essentially "stages" all the changes the sync wants to make, so the actual upload can then proceed "confidently"; once the upload is done, it confirms all the changes that were staged during preprocessing.
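A minimal sketch of the stage / upload / confirm pattern described above, assuming a single-process, in-memory store. `RefStore`, `stage`, `rollback`, `confirm`, and `sync_record` are hypothetical names, not Resonite's cloud API, and the quota check is an assumed example of what "allowed to upload" could mean; a real service would do this transactionally in its database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class RefStore:
    """Hypothetical in-memory stand-in for the per-account and global reference counts."""
    quota_bytes: Dict[str, int]                                   # account -> storage quota
    used_bytes: Dict[str, int] = field(default_factory=dict)      # account -> bytes pinned
    account_refs: Dict[Tuple[str, str], int] = field(default_factory=dict)
    global_refs: Dict[str, int] = field(default_factory=dict)


@dataclass
class Staged:
    account: str
    assets: Dict[str, int]      # asset hash -> size in bytes
    new_bytes: int              # bytes this sync adds to the account's usage
    state: str = "pending"


def stage(store: RefStore, account: str, assets: Dict[str, int]) -> Staged:
    """Pre-pin every asset; refuse before any upload starts if the quota would be exceeded."""
    new_bytes = sum(size for h, size in assets.items()
                    if store.account_refs.get((account, h), 0) == 0)
    if store.used_bytes.get(account, 0) + new_bytes > store.quota_bytes[account]:
        raise PermissionError(f"{account} would exceed its storage quota")
    for h in assets:
        store.account_refs[(account, h)] = store.account_refs.get((account, h), 0) + 1
        store.global_refs[h] = store.global_refs.get(h, 0) + 1
    store.used_bytes[account] = store.used_bytes.get(account, 0) + new_bytes
    return Staged(account, dict(assets), new_bytes)


def _decrement(counts: dict, key) -> None:
    counts[key] -= 1
    if counts[key] == 0:
        del counts[key]


def rollback(store: RefStore, staged: Staged) -> None:
    """Undo the pins taken by stage() if the upload fails."""
    for h in staged.assets:
        _decrement(store.account_refs, (staged.account, h))
        _decrement(store.global_refs, h)
    store.used_bytes[staged.account] -= staged.new_bytes
    staged.state = "rolled_back"


def confirm(staged: Staged) -> None:
    """Make the staged pins permanent once every asset is uploaded."""
    staged.state = "confirmed"


def sync_record(store: RefStore, account: str, assets: Dict[str, int],
                upload: Callable[[str], None]) -> Staged:
    staged = stage(store, account, assets)    # may raise before anything is uploaded
    try:
        for h in assets:
            upload(h)
    except Exception:
        rollback(store, staged)                # release the pins so counts stay consistent
        raise
    confirm(staged)
    return staged
```

The point of staging first is that a quota or permission failure surfaces before any bytes are sent, and an upload failure can be undone cleanly instead of leaving half-pinned assets behind.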

I don't think it's related to this issue, though. I haven't really changed anything in this part recently, and the whole process is wrapped in retry logic, so even if it fails to sync one record, it should keep processing the queue and retry the failed one later. The fact that it just stopped completely and died is a bit odd.
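For illustration, here is a minimal sketch of a queue worker with per-record retry logic of the kind described above, assuming a simple in-process queue and a fixed attempt limit. `Job`, `drain_queue`, `sync_one`, and `max_attempts` are hypothetical names; a real worker would also add a delay or backoff between retries.

```python
import collections
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Job:
    record_id: str
    attempts: int = 0


def drain_queue(record_ids: Iterable[str], sync_one: Callable[[str], None],
                max_attempts: int = 3) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Process every record; a failed record goes to the back of the queue for a later retry."""
    queue = collections.deque(Job(r) for r in record_ids)
    done: List[str] = []
    dead: List[Tuple[str, str]] = []          # records that exhausted their retries
    while queue:
        job = queue.popleft()
        try:
            sync_one(job.record_id)
            done.append(job.record_id)
        except Exception as exc:               # one bad record must not stop the loop
            job.attempts += 1
            if job.attempts >= max_attempts:
                dead.append((job.record_id, repr(exc)))
            else:
                queue.append(job)              # retry after the rest of the queue
    return done, dead
```

Note that the `try` only guards the per-record call. An exception raised outside it, or a `sync_one` call that hangs rather than raising, would still halt progress, which is one general way a worker that has retry logic can stop anyway.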

Frooxius commented 8 hours ago

I've done some investigation and fix-ups based on what I could find, as well as adding some additional diagnostics, alerts and error wrapping.
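The thread doesn't say what the new alerts check. One common approach, sketched here purely as an assumption, is a stall watchdog: the worker records a heartbeat whenever it finishes (or fails) a record, and a separate timer raises an alert when records are queued but no progress has been made for too long. `StallWatchdog`, `threshold_s`, and `alert` are hypothetical names.

```python
import time
from typing import Callable


class StallWatchdog:
    """Hypothetical watchdog: alerts when work is pending but the worker makes no progress."""

    def __init__(self, threshold_s: float, alert: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self.threshold_s = threshold_s
        self.alert = alert
        self.clock = clock
        self.last_progress = clock()

    def beat(self) -> None:
        """Called by the worker after each processed or failed record."""
        self.last_progress = self.clock()

    def check(self, queue_length: int) -> bool:
        """Called periodically by an independent timer, not by the worker itself."""
        idle = self.clock() - self.last_progress
        if queue_length > 0 and idle > self.threshold_s:
            self.alert(f"sync worker stalled: {queue_length} queued, "
                       f"no progress for {idle:.0f}s")
            return True
        return False
```

The check has to run outside the worker, since a worker that has silently died cannot report its own failure; an empty queue with no recent heartbeat is treated as idle rather than stalled.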

I'll monitor the issue and see if it recurs. I'm not 100% sure whether the issues I found in the logs were the actual culprit here.

Yellow-Dog-Man / Resonite-Issues