[pub] do a smallest update heuristic for security updates

jonasfj commented 2 years ago

When there is a security update to a package dependabot should be able to make a smallest possible update that gets the security fix.

For pub we'll probably need to apply a heuristic, @sigurdm we should continue the discussion on this.

jeffwidman commented 1 year ago

@jonasfj @sigurdm 👋 I'm planning to wire up security updates for Pub in the next few weeks.

Do you have interest/time to create a PR adding support for the smallest update / minimal version required to bump to a non-vulnerable version?

My focus will primarily be on wiring up the plumbing so that security alerts can trigger Dependabot PR's, so it's doubtful I'll have time to add a new version strategy (esp as I'm not super familiar with the pub ecosystem).

If you can't get to it, no problem, I can just default to using the latest version strategy for now. But thought I'd check, as it'd be more efficient to wire it all up correctly from the get go rather than have to circle back to update it later.

jonasfj commented 1 year ago

Hmm, I'll chat with sigurdm@ when he is back tomorrow.

I think we had a plan to do sometime. But there is lots of balls in the air -- as always. And its been some time. I don't recall if we figured out how we were supposed to integrate/expose a new upgrade strategy.

@sigurdm as I recall we were thinking of increasing the lower bound constraint in-memory and then doing a downgrade solve.. or something like that. Because we want a newer version than the advisory affects, and we don't want older versions of anything we have. But otherwise we're not interested in upgrading anything beyond that -- hence, a downgrade solve with those lower bound constraints.

sigurdm commented 1 year ago

@sigurdm as I recall we were thinking of increasing the lower bound constraint in-memory and then doing a downgrade solve..

Yeah I suggested that - but after we discussed a bit I think we found out that won't work if there are more affected versions coming after the current.

I think we really need a way for dependabot to communicate all the affected versions to pub dependency_services such that we can forbid all of them in a solve.

jeffwidman commented 1 year ago

That should be simple enough... at the point that dependency_services are called, the full array of security_advisories is available... here's a quick example of what's available using a fake advisory I injected into the dry-run script:

SECURITY_ADVISORIES='[{"dependency-name":"retry","patched-versions":[],"unaffected-versions":[],"affected-versions":["<= 3.0.2"]}]' bin/dry-run.rb pub test_org/test_repo --cache=files --dep=retry

results in:

(ruby) security_advisories
[#<Dependabot::SecurityAdvisory:0x00007f9d3e099150
  @dependency_name="retry",
  @package_manager="pub",
  @safe_versions=[],
  @vulnerable_version_strings=["<= 3.0.2"],
  @vulnerable_versions=[Gem::Requirement.new(["<= 3.0.2"])]>]

jeffwidman commented 1 year ago

Before we go down this path too far, I did have one question: I'm still coming up to speed on the codebase, but it looks like in most other ecosystems we rely on the native helper to return a list of possible versions that may match, and then we do post-processing to filter out the git versions, ignored versions, vulnerable versions etc in Ruby. Versus here in Pub if I read the code correctly the native helper returns a single matching version, which then gets dropped if it's an ignored version. But what about the scenario where a user is on 3.1.0, ignores 3.2 and latest is [3.1.1, 3.2]? I assume the pub helper will return 3.2, which we'll filter out as ignored... but we won't bump the user up to 3.1.1, right?

The reason I ask is if we start to go into the realm of passing data to the native helper, should we also be passing in ignored versions?

FWIW, going forward I'd like to see us move toward the model across all our ecosystems of exposing a list of ignored/vulnerable versions to the native helper and letting them handle the resolving internally.

jonasfj commented 1 year ago

But what about the scenario where a user is on 3.1.0, ignores 3.2 and latest is [3.1.1, 3.2]? I assume the pub helper will return 3.2, which we'll filter out as ignored... but we won't bump the user up to 3.1.1, right?

Correct, this is a known limitation.

Versus here in Pub if I read the code correctly the native helper returns a single matching version....

For most eco-systems the entire resolution logic is reimplemented in ruby. This is hard to do correctly. In the case of pub it requires writing a CDCL solver (an implementation of PubGrub). It's a wonderful algorithm, but making an implementation is non-trivial. And rooting out all the bugs would take a long time.

For this reason we opted to write dependency-services kind of thing, see: https://github.com/dependabot/dependabot-core/tree/main/pub#dependabot-pub

The idea is that we implement a CLI interface which dependabot then uses. We ended up fighting the abstractions in dependabot a lot -- so there is a few hacks like caching files... but it works fairly well. When we did this the discussion was that many dependabot could eventually publish a specification for the CLI interface, such that each eco-system just implements the CLI interface in their native language -- ideally reusing the logic they have in their package manager.

I could even imagine package managers implementing such a specification, similar to how many compilers ship an LSP implementation today.

sigurdm commented 1 year ago

Correct, this is a known limitation.

And to fix it we would have to pass the ignored versions to the native helper, such that they can be ignored in the solve.

Just like we would have to pass the vulnerable versions to the solver to be able to avoid them.

One complication here is how the ignored versions are represented as ranges in dependabot (https://github.com/dependabot/dependabot-core/blob/main/common/lib/dependabot/security_advisory.rb#L115) - where we would much prefer it to be a list of single affected versions, simplifying the communication between the processes. (We'd prefer dependabot to handle versions as opaque strings, such that the ecosystem specific details won't have to be re-implemented in the ruby-side).

Sure there are ways around this, I'm just not sure what is the best way.

jeffwidman commented 1 year ago

Ah good point, I didn't think of the version handling bit... Yeah, the version handling is a known pain point... I know within version updates we'd like to move away from coercing everything to the Ruby Version gem.

However, it might not be so straightforward... ie, internally this may require changing not only dependabot-core, but possibly also the entire advisory database and pipeline... and that'd be a much larger lift. And there may be reasons I'm unaware of that those teams want to keep it in ruby-parsable-versions, such as for data analysis, IDK...

Overall the idea of not re-implementing package managers in Ruby is a consistent theme I've heard folks internally continuing to noodle on. https://github.com/dependabot/cli was starting to explore it a bit, but this past few months we've been heads down on various internal infra / product things (like auto-disabling forks etc) and the package manager / dependabot-core interface hasn't changed much.

But it's a known pain point, and fairly separate from our other work, so it's not blocked on our other technical work, just lacking the engineering time. However, I know a number of folks on the team have a high degree of interest in it, so I'd expect us to start to make progress on it in the not too distant future.

sigurdm commented 1 year ago

But it's a known pain point, and fairly separate from our other work, so it's not blocked on our other technical work, just lacking the engineering time. However, I know a number of folks on the team have a high degree of interest in it, so I'd expect us to start to make progress on it in the not too distant future.

Any news here?

I'd like to get started on implementing the smallest update heuristic in the native helper.

But would like to know how the vulnerable versions will be encoded.

sigurdm commented 1 year ago

I started prototyping this, and saw one potential issue here.

We do the update-check for all dependencies at once - that means we need to know the vulnerable versions of all the dependencies of a package up-front.

Not sure how well that works with the current dependabot pipeline...

jeffwidman commented 1 year ago

Thanks for the nudge. I'm going to start a conversation with a few folks internally on ways to solve this... as you noted, it'd be very helpful if we could figure out a better way to handle passing versions around than coercing them to/from ruby-equivalent versions. Tough part will be that it'll probably be a multi-team effort, but hopefully we can at least figure out a blueprint forward, even if actual implementation may take a while.

mctofu commented 1 year ago

I started prototyping this, and saw one potential issue here.

We do the update-check for all dependencies at once - that means we need to know the vulnerable versions of all the dependencies of a package up-front.

Not sure how well that works with the current dependabot pipeline...

@sigurdm For a security update the job will be filtered to only the vulnerable dependency so we won't be attempting to check for updates to any other dependency.

sigurdm commented 1 year ago

@sigurdm For a security update the job will be filtered to only the vulnerable dependency so we won't be attempting to check for updates to any other dependency.

Ok, that makes sense - will think about this!

jeffwidman commented 1 year ago

FWIW, I have not forgotten about this. I've had a few internal conversations, and there's a lot of places we use versions, and for some usages we perhaps could treat as opaque strings, and others would be far more difficult. There's definite interest in improving things, but no one has taken the time to tease it all apart and see what's possible and what's not... it's something I'm interested in working on, but the short answer is don't expect changes in the near term re: how we pass around version strings. 😢

sigurdm commented 1 year ago

Thanks for the heads-up. I will work on a solution that takes in a list of restricted version constraints and assumes they can be parsed as pub version constraints. That might be ok.

sigurdm commented 1 year ago

Looking into integrating this in dependabot-core, and have a few further questions.

I realize that not only do we have a set of somewhat opaque requirements, we have two sets of those (vulnerable and safe)

Could a method be provide a method that took all vulnerable requirements and subtracted all safe requirements (possibly cutting holes in the ranges)? Those could then be sent as disallowed constraints to the pub helper. Sending two lists seems redundant. But I am a bit intimidated by the Gem::Requirement API.

Another possibility would be for dependabot to fetch the list of all versions and do the filtering, and send the versions to the helper. That would work, but now both the helper and dependency services would access pub.dev
Can we assume that all vulnerabilities for pub are pertaining to the https://pub.dev host or is the repository url for a security vulnerability available somewhere?
I don't understand the difference between https://github.com/dependabot/dependabot-core/blob/7d0ff2b39fd344601b33ff160895da321153ce5e/common/lib/dependabot/security_advisory.rb#L23 and https://github.com/dependabot/dependabot-core/blob/7d0ff2b39fd344601b33ff160895da321153ce5e/common/lib/dependabot/security_advisory.rb#L76 do they do the same thing?
Can we expect the ranges of a security vulnerability to always be parsed with the Pub::Requirement class?

dependabot / dependabot-core

[pub] do a smallest update heuristic for security updates #5391