DanEngelbrecht / golongtail

Command line front end for longtail synchronization tool
MIT License
26 stars 8 forks source link

Incomplete upsync and difficult to understand log warnings #218

Closed AOOSG closed 2 years ago

AOOSG commented 2 years ago

I am uploading an installed UE5 build without PDBs which succeeds without any warnings, and right afterwards I upload the same folder with PDB files included. This time I get lots of warnings which looks like it's going to cause issues when I download the UE5 build with PDBs again.

I've seen this issue for multiple UE5 builds that's been uploaded, always a varying number of warnings for each build uploaded.

Here's some of the warnings when uploading (full log: uploadwarnings.txt):

time="2022-07-14T23:59:30+01:00" level=warning msg="Retrying putBlob" blockHash=14620957585736657187 key=chunks/cae8/0xcae8137585e2e923.lsb
time="2022-07-14T23:59:30+01:00" level=warning msg="Retrying putBlob" blockHash=15656149374840302791 key=chunks/d945/0xd945d0e815cf6cc7.lsb
... many more like this

And here's some of the many errors spammed when trying to download the build:

... many more like this
2022/07/16 02:54:18 {"file": "D:\a\longtail\longtail\src\longtail.c", "func": "WritePartialAssetFromBlocks", "line": "4611", "level": "ERROR", "context": "000001C3CB9EA140", "job_id": "335609794", "is_cancelled": "0", "msg": "BlockRead() failed with 2"}
2022/07/16 02:54:18 {"file": "D:\a\longtail\longtail\src\longtail.c", "func": "WritePartialAssetFromBlocks", "line": "4611", "level": "ERROR", "context": "000001C3CB9EBAC0", "job_id": "411107231", "is_cancelled": "0", "msg": "BlockRead() failed with 2"}
Updating version           100%: |██████████████████████████████████████████████████|: [15s]
2022/07/16 02:54:18 {"file": "D:\a\longtail\longtail\src\longtail.c", "func": "WriteAssets", "line": "5861", "level": "ERROR", "block_store_api": "000001C3CA7E6CF0", "version_storage_api": "000001C3A307D590", "job_api": "000001C3A3079990", "progress_api": "000001C3DB43FBA0", "optional_cancel_api": "0000000000000000", "optional_cancel_token": "0000000000000000", "store_index": "000001C3F005C040", "version_index": "000001C3D31C4040", "version_path": "D:\UE5", "chunk_hash_to_block_index": "000001C3F25F7040", "awl": "000001C3DAD01170", "retain_permssions": "1", "msg": "WriteAssetsFromBlockJob failed with 2"}
... many more like this

The log warnings and errors are a bit inscrutable from an end user point of view, I don't easily see the cause of the warnings. Did it succeed with the upload? Can I keep retrying the upload to fix this? I haven't checked yet, but does the program return a non-zero exit code if it fails, should I try to upsync again in that case?

Is it a GCP or cloud storage issue? Unreliable internet connection (a http status code would be useful)? Bug in the program? Any hints as to what's going on or how to debug it further would be greatly appreciated.

DanEngelbrecht commented 2 years ago

The warning are indicative that storing blocks are failing for some reason, I think it should issue an error if the retries fails (that is the intent anyway) but will have to double-check. Are there any lines with “error” in them in the complete log?

DanEngelbrecht commented 2 years ago

Likely you are getting rate limited and writing to fast, that error gets hidden (due to some other mechanics), but I´m wondering why I cant see any errors in the log from your upload? Did you really provide the full log? If the block uploads fail it should not write the version index so I´m unsure why you are able to start the download after that? Assuming there are some errors in the logs...

DanEngelbrecht commented 2 years ago

Could you try out https://github.com/DanEngelbrecht/golongtail/releases/tag/v0.3.6-pre1 and see if it gives you better log messages/errors?

AOOSG commented 2 years ago

I'm on vacation this week so didn't get a chance to look at it until now.

I built a new version of the installed build after upgrading to 5.0.3 which did not have any warnings during upload. When I tried to download that build version I had no issues.

I'll try v0.3.6-pre1 the next time I make a build and let you know what the log prints.

Full logs from the upload/download with warning and error spam: engine70upload.txt Downloading the engine version with PDBs included: engine70download.txt

DanEngelbrecht commented 2 years ago

There has been a bunch of fixes in 0.3.6. I will close this issue now, feel free to open it again if you get a repro.

AOOSG commented 2 years ago

It's been much better since I opened the issue originally, but it can happen sometimes, usually only a few messages at most. Here's an example from yesterday. I'm still on v0.3.6-pre1 so I'll update to latest:

time="2022-09-25T05:06:01+01:00" level=warning msg="failed to update metadata of stored block at `chunks/96af/0x96afd8b975ce4e49.lsb` in `gs://ue5-installed-build-longtail/store/` (gcsBlobObject.Write: Patch \"https://storage.googleapis.com/storage/v1/b/ue5-installed-build-longtail/o/store%2Fchunks%2F96af%2F0x96afd8b975ce4e49.lsb?alt=json&prettyPrint=false&projection=full\": read tcp 192.168.0.14:54943->172.217.169.48:443: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)" blockHash=10858135517670559305 fname=putStoredBlock key=chunks/96af/0x96afd8b975ce4e49.lsb s="gs://ue5-installed-build-longtail/store/"

And that's probably the reason for the client having a failure when trying to update to this version:

longtail-win32-x64.exe get --source-path gs://ue5-installed-build-longtail/engine-86-pdb-win64.json --target-path D:\UE5 --cache-path D:\LongtailCache --log-level warn --show-stats --mem-trace
2022/09/25 08:44:58 {"file": "D:\a\longtail\longtail\src\longtail.c", "func": "WritePartialAssetFromBlocks", "line": "4712", "level": "ERROR", "context": "00000170EF529CD0", "job_id": "746651354", "is_cancelled": "0", "msg": "job->m_VersionStorageAPI->OpenWriteFile() failed with 13"}
DanEngelbrecht commented 2 years ago

Looks like network connection issues to me. Or something on googles end as the connection times out, this is sadly more or less out of my control. Might be some timeout settings that can be tweaked, but that would be it I guess.