Open razvanazamfirei opened 1 year ago
I think this is likely due to https://github.com/Homebrew/brew/pull/15099 still being open.
Nevermind, these are using :page_match
, so doesn't seem related to the HEAD request issue.
Increasing number of casks are failing livecheck audits due to CDNs blocking the requests.
Does using a different user agent actually fix that though? Livecheck already uses two different user agents (Homebrew and fake Safari).
This is a planned feature that I already have implemented locally but it depends on the "options" feature and I need to create a PR for that first (that work is also done but there may be some discussion about the shape of it). I'm working my way towards those PRs but I'm currently working on the GitHub strategy PRs and I also need to wrap up that HeaderMatch
PR that Markus mentioned.
When I try to add the livecheck url for the cask hookmark
, it seems that the livecheck command is working good. But the audit is failure, since the livecheck URL is not reachable by error code 403.
After digging an hour, finally noticed that the reason why those two commands are working in differ is the existence of the user-agent header. https://github.com/Homebrew/brew/blob/master/Library/Homebrew/cask/audit.rb#L810
Is this feature development ongoing? or should I drop my changes?
Oh... two of the examples are hookmark
and this is the problematic cask! I visited again this package haha.
Is this feature development ongoing? or should I drop my changes?
@daeho-ro Oddly enough, I was working on this again yesterday. I had a working implementation when I last left it but I'm reworking the changes to the livecheck
block DSL before creating a PR. Depending on how I approach it, I may have to make some changes to Utils::Curl
beforehand but I'm working on getting user agent support done soon (while also laying the groundwork for other needed curl-related features).
Allow for
user_agent
to be specified forlivecheck
curl requests. Livecheck already uses two different user agents (Homebrew and fake Safari).
The more I think about this: the more I think we shouldn't be changing/faking our user-agents at all, let alone let them be customised per-formula.
Ultimately our user agents being blocked is a response by the upstream project to our requests. We should respect that and do something different rather than just workaround it.
The more I think about this: the more I think we shouldn't be changing/faking our user-agents at all, let alone let them be customised per-formula.
Ultimately our user agents being blocked is a response by the upstream project to our requests. We should respect that and do something different rather than just workaround it.
I agree that we shouldn't use a different user agent if a server specifically blocks requests because HOMEBREW_USER_AGENT_CURL
includes "Homebrew", as that's a clear signal about upstream's wishes. However, if we're simply blocked because our user agent isn't something common/typical (e.g., a browser) or shares some characteristics with bots (e.g., including "curl" text), that may not be as strong of a signal in terms of upstream's intentions.
This only affects casks at the moment, likely because commercial/closed-source software websites can be more aggressive about blocking atypical requests. We've had some livecheck
blocks that work locally but fail on CI because the combination of the server's IP address and a non-browser user agent is enough to trigger server-side anti-bot measures (this can also happen locally if you use a VPN). In some of those cases, using a :browser
user agent allows the check to work fine on CI.
livecheck has had logic to fall back to the :browser
user agent if :default
fails since mid-2021, so specifying the user agent in a livecheck
block is simply about efficiency (avoiding unnecessary requests that we know will fail) and clarity. [My plan has always been to remove the automatic user agent retry logic in the Livecheck::Strategy
methods once we can specify a user agent in livecheck
blocks, as the former was just a stop-gap measure until the latter is available.]
In my current implementation, user agent support is simply part of being able to specify #curl_args
arguments in a livecheck
block, as that would allow us to handle other weird situations (though if that's too much, I could take a more limited approach). If we're against swapping user agents in general, we would have to remove existing user agent-switching logic throughout brew, remove packages as soon as they're unreachable with our user agent (there are currently 40 casks like this), and disallow new packages based on this criterion. As mentioned, some requests work locally but fail on CI, so it seems like a shame to drop working packages for that reason.
365d install counts for existing casks specifying a user_agent
for the cask url
are as follows:
adobe-acrobat-pro: 2479 adrive: 2943 attachecase: 89 chalk: 109 clickcharts: 37 coinomi-wallet: 53 dehelper: 57 disk-inventory-x: 3230 elgato-wave-link: 383 eudic: 1143 font-andagii: 18 font-constructium: 16 font-fairfax: 14 frhelper: 33 hookmark: 280 hopper-disassembler: 36 ithoughtsx: 85 jt-bridge: 8 jump: 228 keycue: 214 latexit: 1837 ludwig: 28 neteasemusic: 2649 outguess: 395 pages-data-merge: 55 paraview: 1854 photozoom-pro: 10 pikopixel: 52 popchar: 54 subsurface: 90 tableau: 1169 tableau-prep: 142 tableau-public: 265 tableau-reader: 52 todesk: 2029 typinator: 172 unite-phone: 5 yandex: 1483 youdaonote: 281
However, if we're simply blocked because our user agent isn't something common/typical (e.g., a browser) or shares some characteristics with bots (e.g., including "curl" text), that may not be as strong of a signal in terms of upstream's intentions.
Agreed 👍🏻
remove packages as soon as they're unreachable with our user agent (there are currently 40 casks like this), and disallow new packages based on this criterion
I'd agree with this 👍🏻
As mentioned, some requests work locally but fail on CI, so it seems like a shame to drop working packages for that reason.
I could see us allowing these to remain if so.
Verification
brew install wget
. If they do, open an issue at https://github.com/Homebrew/homebrew-core/issues/new/choose instead.Provide a detailed description of the proposed feature
Allow for
user_agent
to be specified forlivecheck
curl requests. In essence, it would reuse the existing implementation from theurl
stanza and apply it to thelivecheck
url if needed.What is the motivation for the feature?
Increasing number of casks are failing livecheck audits due to CDNs blocking the requests. This would add the flexibility to use
user_agent
if needed. See https://github.com/Homebrew/homebrew-cask/pull/146350 https://github.com/Homebrew/homebrew-cask/pull/146223 https://github.com/Homebrew/homebrew-cask/pull/146330 for recent failures.How will the feature be relevant to at least 90% of Homebrew users?
Maintainers will spend less time troubleshooting livechecks which will increase the time they spend on other user-facing projects.
What alternatives to the feature have been considered?
Not implementing it and just skipping CI when the issue comes up.