XaverianTeamRobotics / CenterstageFTCrobotcontroller

make robot move
https://robotics.xbhs.net/
BSD 3-Clause Clear License

The repository is way too big #524

Open MatthewL246 opened 1 year ago

MatthewL246 commented 1 year ago

Git is very inefficient at storing binary files, and the 70 MB TeamCode apk being uploaded for every commit is causing problems. We should host it somewhere outside the repository, maybe in a release. The repository is over 1 GB, which is over GitHub's size recommendations (and if it ever gets over 5 GB, expect a complaint from GitHub support).

This is slowing down cloning by a lot, and git operations in general. I disabled the apk commit function because the size issue just keeps getting worse. We could use something like git-filter-repo to nuke the apk from the entire history and permanently fix the size issue, but because that rewrites history, everyone would have to delete their clone of the repo and reclone.
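Before rewriting anything, it can help to confirm which files actually dominate the history. The following is a minimal sketch, not something from this repository: `largest_blobs` is a hypothetical helper name, and it assumes it is run from inside a clone. It lists the biggest blobs reachable from any ref, as size in bytes followed by path.

```shell
# Hypothetical helper (not part of this repo): print the N largest blobs
# anywhere in history as "<size in bytes> <path>", biggest first.
largest_blobs() {
  # $1: how many entries to show (defaults to 10)
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |   # keep blobs only, drop commits and trees
    cut -d' ' -f2- |       # drop the object type column
    sort -rn |             # sort numerically by size, descending
    head -n "${1:-10}"
}

# Example: largest_blobs 20
```

If the apk is the problem, every committed version of it should show up near the top of this list.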

MatthewL246 commented 1 year ago

See https://github.com/MatthewL246/FtcRobotController-smol for the output of git-filter-repo removing the apk. The .git directory is only 48 MB without all the junk.

tom-ricci commented 1 year ago

ugh i knew this would happen sorry about that

you can create a new repo and link it to this one via a submodule, and then in the github action push to the submodule repo. that'll save you a few years

i also highly recommend removing the apk from the history because qualcomm had an issue with ftc repo size in the past (that's why the repo we forked from only goes back to sdk version 7) and they'll probably hit the 5gb limit again

(only remove the apk though, don't remove the whole history. im kinda banking on this repo having 2000 commits because im telling faang recruiters about having led this project to get internships lol)

MatthewL246 commented 1 year ago

I don't think there's a point in saving the apk history since we have the source code, so I was thinking I could make the workflow update a pre-release tag that always has the latest apk version (#526).

MatthewL246 commented 1 year ago

Update:

This means our repo was 93% apk. Also, it doesn't make the GitHub API repo size smaller, but clones are so much faster.

Commands used (must be run on Linux, Codespaces works perfectly):

git clone https://github.com/MatthewL246/FtcRobotController-smol.git
cd FtcRobotController-smol
du -hs .git
curl -L https://github.com/newren/git-filter-repo/releases/download/v2.38.0/git-filter-repo-2.38.0.tar.xz -O
tar -xf git-filter-repo-2.38.0.tar.xz
rm git-filter-repo-2.38.0.tar.xz 
mv git-filter-repo-2.38.0/ ..
../git-filter-repo-2.38.0/git-filter-repo --invert-paths --path HelpPage/apk/bin/TeamCode-debug.apk --force
du -hs .git
git config push.autosetupremote true
git remote add origin https://github.com/MatthewL246/FtcRobotController-smol.git
git push --force --all
git push --force --tags

tom-ricci commented 1 year ago

> I don't think there's a point in saving the apk history since we have the source code, so I was thinking I could make the workflow update a pre-release tag that always has the latest apk version (#526).

Workflow artifacts are likely better for this scenario. I'm not completely sure how their size is limited, but it should be possible to delete or mutate artifacts to avoid running into size limit issues. I would recommend against releases because they're useful for tracking library releases, and I know there's at least one library in this repository that has published releases. Also, they're useful for keeping track of doc versions. Each version of the repo warrants new docs, and then you can archive old docs for old versions like I did for 1.0 and 1.1.

MatthewL246 commented 1 year ago

The problem with artifacts is that you need to be logged in to GitHub to download them. I created a script in https://github.com/XaverianTeamRobotics/FtcRobotController/issues/526#issuecomment-1732150026 that could be embedded in the docs website, but it would require someone to generate a personal access token with actions:read permissions for this repo and post it publicly. I do not know exactly what the risks involved in this are - I don't think actions:read allows anyone to do anything malicious to the repo, but I would be worried about someone using it to spam the GitHub API and potentially get the account owner in trouble.

Maybe this should be a job for @lasagnadmin? (I'm not 100% sure how that account works because I wasn't there when it was created, is it a bot or a regular user? And who can generate a personal access token for it? @michaell4438)

Options:

By the way, Actions artifact size limits are not a problem because they are unlimited for public repos, but they are deleted after 90 days. See the docs.

tom-ricci commented 1 year ago

  1. Lasagnadmin is the right account for this, and it's a normal user account. It's the team's connection to Cloudflare for hosting the public site and serverless. Anyone who's worked in the IT office knows the password to both it and our Cloudflare account. If you want, Michael can connect the team CF to your CF (or I can if he doesn't know how). It'll be especially useful for you I think since you do a lot of the team's DevOps.
  2. You don't want to expose a read token. This is very bad.
  3. When I made Lasagnadmin, I made sure to claim xbhs.pages.dev and xbhs.workers.dev so we can use CF pages, functions, and workers. You could make a worker like archive.xbhs.workers.dev, save a read token in Workers KV or R2, and use that to fetch the archive and send it to the user.
  4. Also, are you sure release binaries are mutable? Because release tags are not.

MatthewL246 commented 1 year ago

Thanks, that's all really good to know. Hosting an apk downloader on CF could definitely be a better, safer option - I didn’t realize we were already using CF.

First, though, I'd like to try creating an action for the CI release. I know that release assets are not mutable, but I think a release can be deleted and recreated with the API, and tags can be force-pushed. I think someone else has probably done it already tbh.

I'll test on my fork ofc to prevent master commit spam.
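The tag half of that idea (a tag that always points at the newest build) can be sketched with plain git. This is only an illustration under assumptions: `move_rolling_tag` and the tag name `latest-apk` are hypothetical, and deleting and recreating the release itself is not shown.

```shell
# Hypothetical helper: point a "rolling" tag at the current commit and
# overwrite the same tag on the remote.
move_rolling_tag() {
  # $1: tag name (e.g. latest-apk, a made-up name)
  # $2: remote name (defaults to origin)
  git tag --force "$1" HEAD >/dev/null
  git push --quiet --force "${2:-origin}" "refs/tags/$1"
}

# Example: move_rolling_tag latest-apk origin
```

Anyone who already fetched the old tag keeps their stale copy until they fetch tags with --force, so a tag like this works best when it is only ever consumed through the release page.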

MatthewL246 commented 1 year ago

Alright, I implemented the APK release in #528.

MatthewL246 commented 1 year ago

Update: it sounds like this is going to be shelved until the end of the season in May.

If I'm not available, these are the commands that need to be run in a Codespace in this repository:

git clone https://github.com/XaverianTeamRobotics/FtcRobotController.git
cd FtcRobotController
du -hs .git
curl -L https://github.com/newren/git-filter-repo/releases/download/v2.38.0/git-filter-repo-2.38.0.tar.xz -O
tar -xf git-filter-repo-2.38.0.tar.xz
rm git-filter-repo-2.38.0.tar.xz 
mv git-filter-repo-2.38.0/ ..
../git-filter-repo-2.38.0/git-filter-repo --invert-paths --path HelpPage --force
du -hs .git
git config push.autosetupremote true
git remote add origin https://github.com/XaverianTeamRobotics/FtcRobotController.git
git push --force --all
git push --force --tags

Also, the branch protection rules will need to be temporarily disabled to do this.
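After the rewrite, and before force-pushing, it may be worth checking that the filtered path really is gone from every ref. A minimal sketch, assuming it runs inside the filtered clone; `path_in_history` is a hypothetical helper name, not part of git-filter-repo.

```shell
# Hypothetical helper: exit 0 if any commit reachable from any ref still
# touches the given path, exit 1 if the path is gone from history.
path_in_history() {
  [ -n "$(git log --all -1 --format=%H -- "$1")" ]
}

# Example, using the path from the commands above:
#   if path_in_history HelpPage; then
#     echo "HelpPage is still in history, do not push yet"
#   fi
```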