ZeroK-RTS / Zero-K-Infrastructure

Website, lobby launcher and server, steam deployment, .NET based tools and other vital parts of Zero-K infrastructure
GNU General Public License v3.0
53 stars 52 forks source link

Improve ChobbyLauncher crash reporting #2978

Closed Mankarse closed 1 month ago

Mankarse commented 6 months ago

The primary purpose of these improvements is to collect more desync data.

See this example issue: https://github.com/Mankarse/Zero-K-Debug/issues/34

I have split the changes into 5 commits, ordered from least to most controversial.

The first three commits are uncontroversial improvements. The remaining two commits might be controversial, because they use GitHub in an unusual way.

1) Updated Octokit.net to the latest version This contains a necessary bugfix. You should apply this, even if you do nothing else with my Pull Request.

2) Improve the algorithm in CrashReportHelper.Truncate The new algorithm is required for the later commits to work. It is still valuable even if they later commits are not used, because it is much faster. The only downside is that its implementation is much larger.

3) Improve parsing of infolog_full.txt Splits infolog by game, and extracts GameID, Desync locations, GameState file names. Includes GameState file names in the Issue description (if they are relevant).

4) Use Issue Labels to link desync issues arising from the same game

Adds "GameStateFor-{gameID}" and "HasDesyncGameState" labels whenever a crash report includes GameState for a desync.

Add "HasLinkedDesyncIssue" label when there is an Issue with a matching "GameStateFor-{gameID}" label.

Adds links to existing Issues to the description of the new Issue.

This will create an unbounded number of Labels (one for each gameID).

Please suggest better names for these Labels, if you have any ideas.

5) Upload files to GitHub This works by creating a "Release" for every 250 issues, and then uploading the files to that release.

The main risk of changes 4 and 5 is that they increase the chance that the ZeroK-RTS/CrashReports will hit usage limits.

I am not aware of any documented limits that will be exceeded, but it is possible that GitHub will not like the unusual way that CrashReports will be being used.

Licho1 commented 6 months ago

Have you tested this on both recent and older infologs? Regarding release upload, it is a bit fishy. Let's think about alternatives. We could store it on our server machine or upload it to blobs which are used by server to store replays.

In github it could also be attached to wiki I think.

Hmm @GoogleFrog do you have some feedback on this?

Otherwise it looks fine to me.

We could also keep it in releases but it means we have to clean that periodically.

Mankarse commented 6 months ago

Thanks for looking at this, and thanks for the feedback.

Testing on older infolog.txt

In reality, the parsing will be run on infolog_full.txt, not infolog.txt. Older infolog_full.txt are not retained, so I performed most of my testing on infolog.txt instead of infolog_full.txt. The earliest infolog.txt that I have is from early 2023. My testing did not reveal any problems related to the age or engine version of the infolog.txt, but I did encounter the occasional problem where the parser would produce slightly incorrect results. These appeared to be fixable, but there is a need to balance correctness in parsing against being excessively dependent on aspects of the infolog format that could change.

I will perform more detailed testing and analysis over the weekend.

File uploading

Over the past year, there were ~38000 issues created in CrashReports. Of these, ~2600 were desync issues.

Going off the infolog/GameState files that I have, the expected (compressed) file sizes are: Situation Size (MB)
Worst case (very large infolog, multiple GameState) 3
Average case (average sized infolog and single GameState) 1
Average case (infolog only, no GameState) 0.2
Multiplying this out, we get yearly growth estimate: Situation Size (GB) Details
All worst case (unrealistic) 114 3MB * 38000
All average case 9.7 1MB 2600 + 0.2MB(38000-2600)
Only upload for desync (Average case) 2.6 1MB * 2600

This immediately rules out using a GitHub wiki (since GitHub wikis are Git repositories). From https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#repository-size-limits

We recommend repositories remain small, ideally less than 1 GB, and less than 5 GB is strongly recommended. ... If your repository excessively impacts our infrastructure, you might receive an email from GitHub Support asking you to take corrective action.

For GitHub Releases, there is no problem. From https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#distributing-large-binaries

We don't limit the total size of the binary files in the release or the bandwidth used to deliver them. However, each individual file must be smaller than 2 GiB.

As for uploading it to the server or to Azure Blobs: the amount of data doesn't prohibit this, as long as the files aren't retained for more than a year or two. If you have a preferred method, I'm willing to implement it (but it does look like a lot more work for me, compared to the GitHub releases version that I have already done).

In the current version, an untruncated infolog_full.txt is uploaded for every issue. The size estimate shows that the total data use could be substantially reduced if files were only uploaded for desync issues. What are your thoughts on this?

GoogleFrog commented 6 months ago

@Licho1 I like the sound of wiki over releases. Otherwise I don't have much feedback as I don't know the technical details of the current system.

Mankarse commented 1 month ago

Closing for now. If there is interest in the future I'll open a new pull request.