paul-hansen opened this issue 3 weeks ago
Thanks for reporting!
cc @alsuren Did we hit the rate limit of stats collection?
Yeah, Vercel said we hit our HTTP handler time limit. We could probably talk to Vercel and get that bumped.
Thank you!
That's good to hear!
Seems to be working again, closing. Thanks!
Feel free to reopen if you want to use this issue as a reminder to ask them to bump the limit if you haven't yet or anything.
I just started getting the 402 Payment Required errors again.
Also ran into this problem. A clipping of logs that you may find useful:
...
```
+ ./cargo-binstall -y --force cargo-binstall
INFO resolve: Resolving package: 'cargo-binstall'
WARN Failed to send quickinstall report for package cargo-binstall-1.10.3-x86_64-apple-darwin: Failed to download from remote: could not HEAD https://warehouse-clerk-tmp.vercel.app/api/crate/cargo-binstall-1.10.3-x86_64-apple-darwin.tar.gz: HTTP status client error (402 Payment Required) for url (https://warehouse-clerk-tmp.vercel.app/api/crate/cargo-binstall-1.10.3-x86_64-apple-darwin.tar.gz)
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-bins", repo: "cargo-binstall" }, tag: "v1.10.3" } artifact_name="cargo-binstall-x86_64-apple-darwin.zip"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3}: Received status code 403 Forbidden, will wait for 120s and retry
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-bins", repo: "cargo-binstall" }, tag: "v1.10.3" } artifact_name="cargo-binstall-x86_64-apple-darwin.zip"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3}: Received status code 403 Forbidden, will wait for 120s and retry
WARN resolve: Timeout reached while checking fetcher invalid url: deadline has elapsed
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-bins", repo: "cargo-binstall" }, tag: "v1.10.3" } artifact_name="cargo-binstall-universal-apple-darwin.tbz2"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3}: Received status code 403 Forbidden, will wait for 120s and retry
WARN resolve: Timeout reached while checking fetcher invalid url: deadline has elapsed
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-bins", repo: "cargo-binstall" }, tag: "v1.10.3" } artifact_name="cargo-binstall-universal2-apple-darwin.tbz2"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3}: Received status code 403 Forbidden, will wait for 120s and retry
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-bins", repo: "cargo-binstall" }, tag: "v1.10.3" } artifact_name="cargo-binstall-universal2-apple-darwin.tbz2"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-bins/cargo-binstall/releases/tags/v1.10.3}: Received status code 403 Forbidden, will wait for 120s and retry
WARN resolve: Timeout reached while checking fetcher invalid url: deadline has elapsed
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-bins", repo: "cargo-quickinstall" }, tag: "cargo-binstall-1.10.3" } artifact_name="cargo-binstall-1.10.3-x86_64-apple-darwin.tar.gz"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-bins/cargo-quickinstall/releases/tags/cargo-binstall-1.10.3", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-bins/cargo-quickinstall/releases/tags/cargo-binstall-1.10.3}: Received status code 403 Forbidden, will wait for 120s and retry
WARN resolve: Timeout reached while checking fetcher QuickInstall: deadline has elapsed
WARN The package cargo-binstall v1.10.3 will be installed from source (with cargo)
```
...
(source: https://github.com/kamu-data/kamu-cli/actions/runs/10579077977/job/29310649600?pr=795#step:6:32)
Thanks, this is simply because we are hitting Vercel's rate limit.
It's probably because cargo-binstall now reports availability for every target on the machine.
> Thanks, this is simply because we are hitting Vercel's rate limit. It's probably because cargo-binstall now reports availability for every target on the machine.
It seems like it's causing it to build from source instead of downloading a binary, is this expected? I haven't looked at the code yet; you calling it "reporting" just makes me wonder if it's something we could skip if it fails, while still downloading a binary.
> It seems like it's causing it to build from source instead of downloading a binary, is this expected?
No, it isn't. The message is a bit confusing, but a telemetry failure has no influence on resolution.
In this case it's due to a timeout, because binstall hit the rate limit.
Providing a `GITHUB_TOKEN` would fix it.
Ah, so I'm guessing the long compile times I was seeing were just because there was a new version of cargo-leptos, so there wasn't a binary generated yet. I had assumed the warning was related, like it couldn't let quickinstall know of the new version to build or something.
I'll add a note to the issue description to let users know it's just telemetry and any longer build times are unrelated.
What is worse than this issue is that even building the binary from source does not work either:
```
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-lambda", repo: "cargo-lambda" }, tag: "v1.3.0" } artifact_name="cargo-lambda-v1.3.0.aarch64-apple-darwin.tar.gz"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-lambda/cargo-lambda/releases/tags/v1.3.0", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-lambda/cargo-lambda/releases/tags/v1.3.0}: Received status code 403 Forbidden, will wait for 120s and retry
# waiting 120s....
INFO has_release_artifact{release=GhRelease { repo: GhRepo { owner: "cargo-lambda", repo: "cargo-lambda" }, tag: "v1.3.0" } artifact_name="cargo-lambda-v1.3.0.aarch64-apple-darwin.tar.gz"}:do_send_request{request=Request { method: GET, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/cargo-lambda/cargo-lambda/releases/tags/v1.3.0", query: None, fragment: None }, headers: {"accept": "application/vnd.github+json", "x-github-api-version": "2022-11-28"} } url=https://api.github.com/repos/cargo-lambda/cargo-lambda/releases/tags/v1.3.0}: Received status code 403 Forbidden, will wait for 120s and retry
# waiting a bit more...
WARN resolve: Timeout reached while checking fetcher invalid url: deadline has elapsed
```
In this case you could either provide a `GITHUB_TOKEN` env var (or log in via `gh`, or store credentials via `git`), or use `--maximum-resolution-timeout` (or the env var `BINSTALL_MAXIMUM_RESOLUTION_TIMEOUT`) to reduce the timeout.

Our default timeout is probably a bit too large; we should set it to something smaller, e.g. 30s.
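For anyone hitting this in GitHub Actions, a sketch of what that could look like (the step name and crate are illustrative, and the timeout value format is assumed; check binstall's docs for the exact syntax):

```yaml
- name: Install tools with cargo-binstall
  run: cargo binstall -y cargo-leptos
  env:
    # Authenticated GitHub API requests get a much higher rate limit.
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    # Fail fast instead of retrying for minutes (value format assumed).
    BINSTALL_MAXIMUM_RESOLUTION_TIMEOUT: "30"
```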
@NobodyXu thank you, it worked pretty well after setting a `GITHUB_TOKEN` env var 🙇
Possibly a dumb question, as I'm not super familiar with exactly what kind of metrics are collected via the Vercel app. But since all downloads happen via GitHub releases, download counts can at least be fetched via GitHub's API for each release. This could be done as a scheduled GitHub Action that iterates over all releases and gathers the artifact download counts. Admittedly, with the number of releases you have, it could be quite a slow job to avoid hitting API rate limits.
¯\_(ツ)_/¯ Just throwing it out there in case it might be useful and/or spark some ideas.
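As a rough sketch of that idea (the endpoint and the `assets[].download_count` field are from GitHub's REST API; pagination and rate-limit backoff are left out):

```python
import json
from urllib.request import Request, urlopen


def total_asset_downloads(releases: list) -> dict:
    """Sum per-asset download counts across release objects as returned
    by GitHub's /repos/{owner}/{repo}/releases endpoint."""
    counts = {}
    for release in releases:
        for asset in release.get("assets", []):
            counts[asset["name"]] = counts.get(asset["name"], 0) + asset["download_count"]
    return counts


def fetch_releases(owner: str, repo: str, page: int = 1) -> list:
    """Fetch one page of releases; a real cronjob would paginate and
    back off on 403s."""
    req = Request(
        f"https://api.github.com/repos/{owner}/{repo}/releases?per_page=100&page={page}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


# Example usage (hits the network, so not run here):
# counts = total_asset_downloads(fetch_releases("cargo-bins", "cargo-quickinstall"))
```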
Hmmm, that's an interesting idea.
What quickinstall needs, though, is to find the software that we aren't providing yet, and then build and provide pre-built binaries for it.
We have a script for fetching popular crates from lib.rs, but it's not used in CI: https://github.com/cargo-bins/cargo-quickinstall/blob/main/get-popular-crates.sh
I suppose we could use that instead.
Turns out that we do fetch popular crates from https://lib.rs
Sorry for the delay in looking at this.
Looks like they paused my account a couple of weeks ago for repeatedly going over the free tier allowance. I assume this is because of a change in how spammy cargo-binstall is with its stats?
I have sent $20 in vercel's direction to unblock things. The price appears to be per-user-per-month rather than usage-related. This means that if I want to let anyone else help with the ops side of a stats server on vercel, it would be another $20/month per user? This does not fill me with joy.
I will send a link to this thread to their support people and see whether they have an open source tier or something.
In the next month, I think we need to do at least one of the following:
1) get our stats volumes down to reasonable levels again
i) could we delay reporting until we have a cache miss (and do it while we're compiling the crate)?
2) find some other hosting that allows multiple admins for debugging, or get vercel to fund us for that.
i) if we're sticking with vercel, we should probably change the URL to something more on-brand?
ii) I heard a rumour that cloudflare/fastly/... have open source hosting tiers, but I can't find the info right now.
3) make the stats reporting protocol and gathering code more maintainable
i) I've been wanting to rewrite all of the cronjob code using `#!/usr/bin/env -S cargo +nightly -Zscript` or something for a while now
ii) We should really be using `?query=params` or something rather than the tarball name for the http request, so we can encode more information (including sending multiple architectures in the same http request?)
iii) We probably shouldn't be using redis for stats storage because encoding multiple facets in it is a right pain. The influxdb setup that I have been playing about with is pretty good for this
(for maintainers: https://eu-central-1-1.aws.cloud2.influxdata.com/orgs/69235d4f38c3e042/dashboards/0d9b2d5b2b13c000?lower=now%28%29+-+30d - reply here with your email address if you want access)
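The query-param reporting idea in 3.ii could be sketched like this (the `/api/report` path and parameter names are hypothetical, made up for illustration, not an existing endpoint):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_report_url(base: str, crate: str, version: str, targets: list) -> str:
    """Encode crate/version once and pack multiple targets into a single
    request, instead of one HEAD per tarball name."""
    query = urlencode({"crate": crate, "version": version, "targets": ",".join(targets)})
    return f"{base}/api/report?{query}"


def parse_report_url(url: str) -> dict:
    """Server-side counterpart: recover the report fields from the query string."""
    qs = parse_qs(urlsplit(url).query)
    return {
        "crate": qs["crate"][0],
        "version": qs["version"][0],
        "targets": qs["targets"][0].split(","),
    }
```

One request could then cover every target detected on the machine, rather than one request per target.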
> i) could we delay reporting until we have a cache miss (and do it while we're compiling the crate)?
That's definitely doable in cargo-binstall
For the stats reporting part, I'm also working on using the crates.io daily DB snapshot.
I'm currently trying to put together a Python script, using polars to do it.
Since it's all CSV and we only care about the top n popular binary crates, it should definitely be doable.
Using data from https://static.crates.io/db-dump.tar.gz
I was able to write a python script for getting top 2000 popular binary crates:
```python
# execute this in 20xx-xx-xx-xxxxxx/data/
import polars as pl

(
    pl.scan_csv("crate_downloads.csv")
    .join(pl.scan_csv("crates.csv").select("id", "name"), left_on="crate_id", right_on="id")
    .join(pl.scan_csv("default_versions.csv"), on="crate_id")
    .join(
        pl.scan_csv("versions.csv").select("id", "crate_id", "yanked", "bin_names"),
        left_on=("crate_id", "version_id"),
        right_on=("crate_id", "id"),
    )
    .filter(pl.col("bin_names") != "{}", pl.col("yanked") == "f")
    .sort(by="downloads", descending=True)
    .select("name")
    .head(2000)
    .collect(streaming=True)
)
```
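Fetching the dump itself could look like this (a sketch; the dump is large, so a real job would cache it and only extract the four CSVs the script above needs):

```python
import tarfile
import urllib.request

# The only dump members the popularity query needs.
NEEDED = {"crates.csv", "crate_downloads.csv", "default_versions.csv", "versions.csv"}


def wanted(member_name: str) -> bool:
    """Dump members look like '20xx-xx-xx-xxxxxx/data/crates.csv'."""
    return member_name.rsplit("/", 1)[-1] in NEEDED


def extract_needed(archive_path: str, dest: str = ".") -> list:
    """Extract only the needed CSVs, returning their member names."""
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if wanted(member.name):
                tar.extract(member, path=dest)
                extracted.append(member.name)
    return extracted


# Example usage (downloads several hundred MB, so not run here):
# urllib.request.urlretrieve("https://static.crates.io/db-dump.tar.gz", "db-dump.tar.gz")
# print(extract_needed("db-dump.tar.gz"))
```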
It can be combined with a `git pull --tags`, a check of whether each crate is already built, the crate exclusion list for specific targets, and the randomised-pick part.
I think it'd be a pretty good replacement for the existing telemetry?
cc @alsuren how does that look to you?
It's true that crates.io does not collect target info, so perhaps we should just build it for every target we support.
Looks good. I will try rewriting a bunch of this bash nonsense in Python rather than Rust. The crates.io dump is a bit huge, so I will make a weekly cronjob to dump the Sunday snapshot somewhere.
How do we feel about using uv for managing our Python environment? I have had some success using it in another project of mine: https://github.com/alsuren/sixdofone/pull/8/files
> The crates.io dump is a bit huge, so I will make a weekly cronjob to dump the Sunday snapshot somewhere.

I think a daily cronjob makes more sense?
The crates.io dump is updated every day; with it we could avoid hitting the crates.io API so often.
> How do we feel about using uv for managing our Python environment? I have had some success using it in another project of mine: https://github.com/alsuren/sixdofone/pull/8/files

Using uv makes sense to me, though I'd like dependabot to be enabled for it, and to have some CI to ensure it works.
Edit: This warning is just regarding telemetry, and any longer build times are unrelated. For me it was building from source because a new version of the crate had just been released.
Getting this warning just now in our CI
It then builds it from source. It worked earlier today without this error.