Magnitus- / gogcli

Client to Interact With the API of GOG.com
MIT License

Generation of manifest is falling back to the slow method. The manifest.json file is not updated. #29

Closed johnnysmith65 closed 3 weeks ago

johnnysmith65 commented 2 months ago

Generation of manifest is falling back to slow method for every game. Updated manifest.json is not created. The files manifest-generation-progress.json and manifest-generation-warnings.json are created. Many warnings are output during the attempt to generate the updated manifest. The command used was: gogcli.exe manifest generate --lang=english --os=windows manifest-generation-warnings.json

Great tool by the way. It was working great just a few days ago.

Magnitus- commented 2 months ago

Be careful about posting your manifest file online. It may contain some cd-keys. I should probably manage that in a separate file at some point.

Gog started returning bad metadata on a lot of their files recently. Not sure why. There is a longer workaround method when that happens that takes more time.

I had a problem with the manifest generation stopping in error, because fake download links (that appear as labels in your personal library) started returning a 500 code instead of the expected 403 or 404 code for one of the games. I found another way to identify those files as they have an indicated size of 0 MB, so they are now trimmed before the files' metadata is fetched.
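To illustrate the trimming described above, here is a minimal sketch in Go. It is not gogcli's actual code: the fileEntry type, its fields and the trimPlaceholders function are hypothetical names, and the only detail taken from this thread is that the fake download links show an indicated size of 0 MB.

```go
package main

import "fmt"

// fileEntry is a hypothetical, simplified stand-in for one download
// listed for a game. It is not a type from gogcli.
type fileEntry struct {
	Name          string
	IndicatedSize string // size as displayed in the library, e.g. "0 MB"
}

// trimPlaceholders drops entries with an indicated size of "0 MB", so that
// no metadata request is ever made for them.
func trimPlaceholders(files []fileEntry) []fileEntry {
	kept := make([]fileEntry, 0, len(files))
	for _, f := range files {
		if f.IndicatedSize == "0 MB" {
			continue // label-only entry, nothing to download
		}
		kept = append(kept, f)
	}
	return kept
}

func main() {
	files := []fileEntry{
		{Name: "label_only_entry", IndicatedSize: "0 MB"},
		{Name: "setup_game.exe", IndicatedSize: "512 MB"},
	}
	for _, f := range trimPlaceholders(files) {
		fmt.Println(f.Name)
	}
}
```

Filtering before any request is made means the status code returned for those links (403, 404 or 500) no longer matters.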

But you seem to be experiencing some other issue. I'll try generating a manifest and see.

Btw, if your manifest generation got interrupted because of a transient error, you can run the gogcli.exe manifest generate-resume command and it will resume where it left off.

johnnysmith65 commented 2 months ago

Thanks. Also, it took around 3 hours for manifest-generation-progress.json to be generated, and it used an extreme amount of network bandwidth while doing so (200+ Mbps). I didn't notice this, and it used about 200 GB of data. With that bandwidth usage, I have to wait until there is a fix to gogcli or until I find out what the problem is on my end.

Magnitus- commented 2 months ago

I can tell you what that particular issue is: when gogcli takes the roundabout way because it can't fetch the checksum from gog, the game will be downloaded twice.

It will be downloaded once during the manifest generation to compute the checksum and it will be downloaded again when applying actions to actually download the game.

Downloading the game files twice eliminates various categories of transit error (which are unlikely to cause the same error in the file data twice... if they do, then gog has a bad file on their servers) and offers greater guarantees as to the downloaded file's integrity.

Unfortunately, it consumes a lot of time and bandwidth. Hence the "File metadata was still fetched using much longer workaround method." message. Note that this isn't needed when gog supplies valid metadata to validate download checksums, but these last few days, they've had issues with that it seems.

johnnysmith65 commented 2 months ago

Thanks for the information. With that amount of data usage, the workaround method should be optional via the command line. I did a little debugging and noticed that it seems to be failing on the second redirection in getDownloadFileInfo in download.go. I program, but not in Go, and I don't have any experience coding interactions with websites, so that is about all I could figure out.

Magnitus- commented 2 months ago

It is tempting to say it would be an easy modification to allow installer checksums to be optional, but I'd have to analyze what the implications of generating a manifest with missing checksums would be for the installers. Based on an analysis of the gog api at the time, it was an underlying assumption that checksums would always be present for installers and could be relied on to determine both installer integrity and changes.

For the errors you got earlier, I'm generating a manifest now to see what's what, but I have a lot of games and with the current state of the gog api (with no valid metadata info), it is taking some time with the workaround.

fyi though, generating the entire manifest tends to be uncommon for me (even if it is usually a lot more hassle free than right now). I'll do that maybe once or twice a year just to make sure.

But more often, I just grab the latest updates that gog flags like so:

gogcli storage download manifest -p s3.json -k s3
gogcli update generate
gogcli manifest update --update=updates.json
gogcli storage apply manifest -s -p s3.json -k s3
gogcli storage execute-actions -r 1 -t size -p s3.json -k s3 -n

So even when there is a slowdown like what is happening right now, it tends not to be too bad, because I'm only updating the manifest for a couple of games that gog indicated were either new or updated.

Edit: Thinking about it though, I find it strange that really all files seem to have xml metadata problems now (I don't believe I see any that are ok). I will investigate this in case they changed something in an incompatible way for gogcli.

Magnitus- commented 2 months ago

Ok, they made a change to their download apis. They removed the second redirection from their download protocol.

I did the adjustments on the main branch. The next release should fix the problem and save both a lot of time and bandwidth.

Magnitus- commented 2 months ago

Note that bandwidth-wise, it will still download the extras to compute the checksum, but they are much smaller than the installers.

As I recall, it didn't use to do that (gog doesn't provide checksums on extras, but because the extras are zip files, it is possible to do an integrity check at download time and compute the checksum then as well).

Also, the message I'm seeing that extras have bad metadata files is misleading. They don't have metadata files. Clearly a regression that I'll have to fix in a future release.

johnnysmith65 commented 2 months ago

Thanks. The update fixed the issue. For my particular application, I don't care about the extras and don't want to use my internet data just to get the checksums and file sizes for them. I ended up adding the following hack to the source code. The body length of 1 was needed because if it was set to zero, the file would not be in the final manifest.json (although it would be in the temporary generation one). The url redirect was needed to get the actual filename. I didn't see a command line option that would duplicate my intended program behavior. It would be nice to have a supported option to not try to download (non-existent) metadata files for the extras (and then fall back to downloading the extras to get metadata).

In the file sdk-get-core.go change the function getUrlBodyChecksum to the following:

// NEVER DOWNLOAD THE FILE TO GET THE FILESIZE AND CHECKSUM
func (s Sdk) getUrlBodyChecksum(url string, fnCall string, retriesLeft int64) (BodyChecksumReply, error) {
	reply, err := s.getUrlRedirect(url, "", (s).maxRetries)
	return BodyChecksumReply{
		BodyChecksum: "",
		BodyLength:   int64(1),
		FinalUrl:     reply.RedirectUrl,
		StatusCode:   reply.StatusCode,
		RetriesLeft:  retriesLeft,
	}, err
}
// END OF CHANGES

Magnitus- commented 2 months ago

Thanks for the feedback. My implementation definitely prioritizes ensuring file correctness over bandwidth usage, but I mistakenly took the "unlimited" bandwidth for granted. At the very least, I should find a way to make the assumption clearer when the flag to use the workaround is enabled.

After I fix the extras, it is my hope that overall, few enough files will have bad metadata that this will be a non-issue (I've found that gog has improved the robustness of their file downloads quite a bit and it has been rather nice lately), but if that turns out not to be the case in the future, I'll look more seriously at adding a flag to just skip checksum verification when metadata is bad.

xKevin04 commented 1 month ago

I'm currently creating a manifest too and the metadata situation is pretty bad, it feels like the fallback is being used for every game. I checked the progress json and found these warnings (shortened to a couple of entries):

  "Warnings": [
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_expansion/8143) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_expansion/8133) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_expansion/8153) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_2/8093) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_2/8083) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_2/8103) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/freespace_2/8073) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2233) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2253) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2283) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2243) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2303) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2263) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2273) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2313) -> Error location header url does not have the expected path query parameter",
    "Bad metadata workaround: getDownloadFileInfo(downloadPath=/downloads/fallout_2_classic/2293) -> Error location header url does not have the expected path query parameter"
  ]

Could this be the result of another change on GOG's side that needs to be handled? At the moment, 130 games have been processed and I have 162 warning lines like these, all with the same error. Lots and lots of games to go, I expect the warnings to grow accordingly.

I'm using gogcli v0.24.0.

johnnysmith65 commented 1 month ago

I get the same type of warning messages with the latest v0.24.0 version but the manifest still contains correct information. With the current version extras will be downloaded to obtain correct verified file sizes and checksums. I hacked the source code as mentioned above so that it won't do that. I can live with the extras only having estimated sizes and no checksums. This speeds up the process a lot for extras.

Magnitus- commented 3 weeks ago

Got to wrap some stuff up, but I'll make some time to look at it in the upcoming days.

Magnitus- commented 3 weeks ago

Ok, at first, I assumed it was a regression on my part, but as it turns out, the problem was due to an api change on GOG's end.

They changed the format for one of the urls so that the filename is now retrievable from the url path, rather than the query parameters.

I'll release a fix.
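A minimal sketch of that kind of adjustment, in Go, might look like the following. It is not the actual fix shipped in gogcli: filenameFromLocation is a hypothetical helper, the example URLs are made up, and the only details taken from this thread are that the filename used to be carried in a "path" query parameter and is now in the url path.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// filenameFromLocation extracts a filename from a redirect target. It first
// tries the older layout (a "path" query parameter) and then falls back to
// the last segment of the url path.
func filenameFromLocation(location string) (string, error) {
	u, err := url.Parse(location)
	if err != nil {
		return "", err
	}
	if p := u.Query().Get("path"); p != "" {
		return path.Base(p), nil
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		return "", fmt.Errorf("no filename found in url %q", location)
	}
	return name, nil
}

func main() {
	// Both URLs are invented examples, not real GOG download links.
	for _, loc := range []string{
		"https://cdn.example.com/dl?path=/games/setup_game.exe",
		"https://cdn.example.com/games/setup_game.exe?token=abc",
	} {
		name, err := filenameFromLocation(loc)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```

Accepting both layouts is a design choice of this sketch, so that a lookup does not fail outright if the older format is still served for some files.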

Magnitus- commented 3 weeks ago

fyi, the issue should be resolved in release v0.24.1

xKevin04 commented 3 weeks ago

Perfect, I can confirm the issue is resolved for me. Not a single case that requires the workaround and no warnings either when I generate a manifest with the new version. Extras don't get checksums anymore, but I'm pretty sure this has always been the case so that's alright.

Magnitus- commented 3 weeks ago

> Perfect, I can confirm the issue is resolved for me. Not a single case that requires the workaround and no warnings either when I generate a manifest with the new version. Extras don't get checksums anymore, but I'm pretty sure this has always been the case so that's alright.

Yes, correct. Gog doesn't provide checksums for extras.

However, pretty much all extras are zip files, and that format has its own integrity checking mechanism built in. gogcli leverages that to check the integrity of the extra files when they are downloaded to the storage. It also computes the md5 checksum at that time and populates the manifest with it (as I recall, it just forwards the downloaded bytes both to the storage and to the checksum computation, so there is no significant overhead for that).

Magnitus- commented 3 weeks ago

I will close this issue. If there is anything else, another one can be opened.