Support download strategies for livecheck urls

gibfahn commented 3 years ago

Provide a detailed description of the proposed feature

There are currently a wide variety of download strategies (that can be extended by users in taps) for downloading resources, but there isn't a way to do the same for livecheck URLs.

Download strategies: https://github.com/Homebrew/brew/blob/master/Library/Homebrew/download_strategy.rb

I'm imagining that one would be able to do:

  livecheck do
    url "https://example.com/foo/bar", using: CurlDownloadStrategy
    regex "..."
  end

in the same way you can currently set the strategy in the main block.

This might be somewhat confusing given that there is already a livecheck strategy, but the difference between a livecheck strategy and a download strategy has to be understood anyway by homebrew authors, so hopefully that woulldn't be too painful.

What is the motivation for the feature?

This would allow authenticating to sites that required auth to fetch the latest version.

How will the feature be relevant to at least 90% of Homebrew users?

It would only be relevant for those using custom taps where auth is required, say for example you needed to parse a JSON file to get the latest version:

  livecheck do
    url 'https://myorg.com/tools/mytool/releases.json'
    strategy :page_match do |page|
      JSON.parse(page)['tool']['releases']
          .select { |release| release['latest'] == true }
          .map { |release| release['version'] }
    end
  end

What alternatives to the feature have been considered?

I'm not aware of an alternative, other than just not using the livecheck feature at all, which is a pity as it works very well.

gibfahn commented 3 years ago

If the answer is "please raise a PR" then that's fine by me.

carlocab commented 3 years ago

I think @samford is working on extensions to livecheck that might enable (or, at least, simplify the implementation of) what you're looking for here.

samford commented 3 years ago

If the goal is to allow for requests that involve authentication, this should be possible after 1) livecheck is migrated to curl internally (#10834 is in the final stages of review before merging) and 2) I create a PR that modifies the livecheck block DSL to allow for configuration options (passing them into strategies, Strategy#page_content/Strategy#page_headers, #curl_args #curl_with_workarounds, etc.). I already have the options setup implemented locally but I have some open PRs to finish before I create a PR for it.

With my current approach to options, you'll be able to pass options to curl, so you could use that to handle basic HTTP authentication (e.g., --user username:password), creating a POST request, etc. Would that be sufficient to support your intended use case?

That said, I'm not sure how you would approach this in a way that would keep secrets safe without the livecheck block being dependent on the local environment (e.g., referencing environment variables in the livecheck block). It's not an issue if this would just be for a personal tap but I figured it's worth mentioning, at least.

Just as an aside, I'm also planning to create a PR in the future to add Json and Xml strategies that handle parsing internally and pass the parsed object into the strategy block. This isn't really notable but it would save us from having to do the parsing in a strategy block (or create complicated regexes that match a specific field's value). Not really a "must have" but it was some low-hanging fruit that I've wanted to pick for a while. Same as above, it'll be a little bit before I create a PR for this work.

samford commented 3 years ago

It may also be worth mentioning that you can add a /livecheck/strategy directory to your tap to add your own strategies (or modify built-in ones): https://github.com/Homebrew/brew/blob/master/Library/Homebrew/livecheck/livecheck.rb#L79-L82

If you would benefit from having a strategy that makes sense within the context of your tap (e.g., you have a number of formulae with livecheck blocks that do the same thing) but it wouldn't be something we would include in livecheck itself (i.e., not applicable to first-party Homebrew taps), this is the way to do it.

For what it's worth, I left this feature undocumented (outside of a vague comment in the code) as there has been some pushback in the past about even allowing this in the first place (https://github.com/Homebrew/brew/pull/7937#discussion_r555559088). In my view, past experience has demonstrated its value, even if it's not something that we're going to invest much time/effort in supporting (i.e., we won't significantly modify livecheck to support something weird in a tap if it doesn't also apply to our first-party taps).

gibfahn commented 3 years ago

With my current approach to options, you'll be able to pass options to curl, so you could use that to handle basic HTTP authentication (e.g., --user username:password), creating a POST request, etc. Would that be sufficient to support your intended use case?

For the use-cases I've seen so far yes, being able to pass --header "Authorization: basic $(some command)" or --netrc to curl would be enough.

That said, I'm not sure how you would approach this in a way that would keep secrets safe without the livecheck block being dependent on the local environment (e.g., referencing environment variables in the livecheck block). It's not an issue if this would just be for a personal tap but I figured it's worth mentioning, at least.

Agreed, for more general use we'd probably want to integrate with the macOS keychain (previous discussion in https://github.com/Homebrew/brew/pull/11091#issuecomment-817772252), although the ~/.netrc is also a fairly standard thing.

I'm also planning to create a PR in the future to add Json and Xml strategies that handle parsing internally and pass the parsed object into the strategy block. This isn't really notable but it would save us from having to do the parsing in a strategy block (or create complicated regexes that match a specific field's value).

That would be awesome, the JSON parsing in my original comment in this issue is from a real formula I added. It isn't that much work to do (at least for JSON), but I was surprised to see so many brittle-looking regexes in the core formula that were parsing JSON when we have proper JSON libraries available.

gibfahn commented 3 years ago

It may also be worth mentioning that you can add a /livecheck/strategy directory to your tap to add your own strategies (or modify built-in ones): https://github.com/Homebrew/brew/blob/master/Library/Homebrew/livecheck/livecheck.rb#L79-L82

Oh wow, yeah this would make things a lot easier. Does it work for casks too? Last time I tried to do something similar (library in a tap) it required some complexity for casks because they copy the cask formula file into the caskroom so that you can brew uninstall it later, meaning that the relative import didn't work.

For what it's worth, I left this feature undocumented (outside of a vague comment in the code) as there has been some pushback in the past about even allowing this in the first place (#7937 (comment)). In my view, past experience has demonstrated its value, even if it's not something that we're going to invest much time/effort in supporting

Definitely agree with the value of having this per-tap not per-formula. Making it so that private taps can provide standard download methods without having to copy-paste things into every formula (or fork brew) is really helpful.

Is there any chance something similar is enabled for DownloadStrategies (for brew fetch rather than brew livecheck)? Having that as well would solve several problems for some taps I maintain. Again passing options to curl would be sufficient.

gibfahn commented 3 years ago

It may also be worth mentioning that you can add a /livecheck/strategy directory to your tap to add your own strategies (or modify built-in ones): https://github.com/Homebrew/brew/blob/master/Library/Homebrew/livecheck/livecheck.rb#L79-L82

Tried this out, and realised it doesn't work if you directly run brew livecheck ./Formula/myformula.rb, which is how I normally do the livechecking (from the root of the repo), vs doing it from an explicit tap.

samford commented 3 years ago

I was surprised to see so many brittle-looking regexes in the core formula that were parsing JSON when we have proper JSON libraries available

Being able to properly parse JSON requires a strategy block and these regexes are from before strategy blocks were implemented. They're not pretty but the important thing is that they provide a correct result (though they're naturally less reliable than properly parsing JSON).

That said, I plan to update related livecheck blocks (those matching JSON with a regex or parsing in a strategy block) after the Json strategy is available, so I don't have to go through them all twice.

Does it work for casks too?

I don't have much familiarity with casks, so all I can say is that if you put a cask file with a livecheck block in a /Casks directory in a tap, it should work with brew livecheck. I'm honestly not sure if having casks in a third-party tap is a use case that we explicitly support, though.

Is there any chance something similar is enabled for DownloadStrategies (for brew fetch rather than brew livecheck)? Having that as well would solve several problems for some taps I maintain. Again passing options to curl would be sufficient.

"Something similar" as in defining your own download strategies in a third-party tap? If that's what you mean, I don't think so.

The livecheck setup works because the strategy files are kept in a livecheck/strategy subdirectory (not the root directory of the tap) and I created additional logic to ensure those files are loaded when appropriate. If I remember correctly, Ruby files in the root directory of a tap are treated as formulae, so I'm not sure you could have a download_strategy.rb file that adds additional classes that inherit from AbstractDownloadStrategy.

It may be technically possible if we made changes to download_strategy.rb to mimic the livecheck setup (i.e., strategies in a subdirectory, logic to load strategies from a subdirectory in a tap, etc.). However, I wouldn't count on a PR for this being merged (it's not my decision). If this is something you're interested in, it would probably be best to create an issue to discuss it first, so you don't sink time into a PR only to have it be rejected.

Tried this out, and realised it doesn't work if you directly run brew livecheck ./Formula/myformula.rb, which is how I normally do the livechecking (from the root of the repo), vs doing it from an explicit tap.

The directory you're in must not be in Homebrew/Library/Taps. Formula#tap uses Tap#from_path to understand what tap a file is in. This method uses HOMEBREW_TAP_PATH_REGEX internally, which uses HOMEBREW_TAP_DIR_REGEX as a base, which only identifies a tap if it's in the Taps directory. Any tap strategies in the same tap where myformula lives won't be loaded in this scenario because brew livecheck can't understand what tap myformula is in (i.e., Formula#tap will return nil).

As such, if you want to use tap strategies, your tap will need to exist in Homebrew/Library/Taps in some form. If you don't want to keep the directory in this location, a symlink is sufficient. For example, I have a local samford/homebrew-test tap that I symlink into the Taps directory and that works fine. Once this is done, Tap#from_path should identify the tap correctly even if you run brew livecheck ./Formula/myformula.rb (not that I condone this 😉).

When your tap is tapped in Homebrew, you can also use the fully qualified name including the preceding tap name, like brew livecheck gibfahn/tap/myformula. If a myformula formula/cask isn't present in any other taps, you can simply run brew livecheck myformula. If you want to check all the formulae/casks in a tap, you can run brew livecheck --tap gibfahn/tap. If you want to selectively check only the formulae or casks in the tap, you can use brew livecheck --formulae --tap gibfahn/tap or brew livecheck --casks --tap gibfahn/tap.

gibfahn commented 3 years ago

That said, I plan to update related livecheck blocks (those matching JSON with a regex or parsing in a strategy block) after the Json strategy is available, so I don't have to go through them all twice.

Makes sense.

I'm honestly not sure if having casks in a third-party tap is a use case that we explicitly support, though.

It certainly works, and I've used it a bunch, so I hope so 😁

If this is something you're interested in, it would probably be best to create an issue to discuss it first, so you don't sink time into a PR only to have it be rejected.

Good point, although hopefully it's the same few lines as in the livecheck, so not too much implementation work.

Once this is done, Tap#from_path should identify the tap correctly even if you run brew livecheck ./Formula/myformula.rb (not that I condone this 😉).

The reason I do this is that I have a tap update script that runs brew livecheck --json Casks/* Formula/*, updates the url blocks, runs brew fetch, and then updates the sha256 blocks and raises a PR. I normally try to keep my development checkouts of taps separate to the ones in my Brewfile, so I don't normally use the tapped repo.

The symlink is easy enough to do though, so will probably just add that to the script, thanks for the pointer!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

gibfahn commented 3 years ago

I'm still interested in this.

MikeMcQuaid commented 3 years ago

Going to let @samford make the call whether we leave this open as help wanted or close it out. Given it's gone stale: it does look like it's unlikely to happen without a PR from you @gibfahn (which I recommend you don't work on until @samford gives the idea a 👍🏻 to avoid it being not merged).

samford commented 3 years ago

This issue covers a lot of ground, so I'll summarize the outstanding requested brew features from my perspective:

Passing options to curl from a livecheck block, allowing authentication via --user, --netrc, --header options: This work has been done for a while but I just need to find the time to wrap it up into a PR. I've been pulled into other directions lately (addressing livecheck-related issues and PRs) but this is at the top of my list for when I get a chance, as there's been a lot of demand for it recently.
Allowing a tap to provide a DownloadStrategy, similar to how Livecheck::Strategy can be extended: I don't intend to work on this (I have my hands very full with livecheck), so this idea should be "help wanted" unless @gibfahn wants to take it on.
- If we want to make this "help wanted", a separate issue should be created for it, as I feel like it's buried here.
- Inspiration can be taken from Livecheck#load_other_tap_strategies and I can explain how it works, so someone could get an idea of what may need to change about download_strategy.rb to enable this feature.
- For what it's worth, I don't feel that I'm the person to judge whether this particular idea should/shouldn't be merged into brew (this part is unrelated to livecheck) and I'll leave feedback on it to others.

Based on the above discussion, I believe the original issue description was primarily a request to be able to make authenticated requests with curl in livecheck and this will be enabled by the forthcoming configuration options feature. Integrating livecheck and download strategies isn't something that makes sense to me at the moment (especially as part of the livecheck block DSL), so this issue should be closed and a new one should be opened for the unrelated DownloadStrategy part of this (if that's still desired).

@gibfahn Have I missed (or misinterpreted) anything or does this cover it?

gibfahn commented 3 years ago

Thanks for the summary @samford . I agree that passing options to curl solves the issue in the majority of cases, so definitely waiting for you to get time to raise a PR for those changes (no huge rush on my end, currently have a workaround) is reasonable.

The case that I'm thinking of that probably wouldn't fit this pattern is when you need to run a function to get an auth token, e.g. you want to run the equivalent of curl --header Authorization: Bearer $(get-auth foo.apple.com) foo.apple.com/bar/livecheck, where get-auth may need to do its own curl requests. Maybe that's possible if you can pass a function to the header argument though.

Certainly if you're not working on the latter, help wanted seems reasonable, and I will attempt to get time to get to this when I can.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

MikeMcQuaid commented 3 years ago

Passing on this for now, will review a PR.

gibfahn commented 3 years ago

Passing on this for now, will review a PR.

Fair, but I thought @samford was currently working on a PR (but that there were other pieces that we needed first). Happy to close the issue anyway.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Homebrew / brew