cargo: Generate a Cargo.lock if one wasn't found

nanonyme commented 1 year ago

Spec from https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/42

Will behave like this:

track: Cargo.lock will be generated by the host's cargo IF (generate_lock is true) OR (generate_lock is 'if-missing' and Cargo.lock doesn't exist). If generate_lock is false and Cargo.lock doesn't exist, raise an error. Then the existing logic for tracking cargo sources will happen. The ref will now encode everything necessary to reconstruct the vendored libraries, so the generated Cargo.lock is discarded
fetch: The sources are fetched according to the ref, as they are now
stage: The sources are staged according to the ref, as they are now
Buildstream will default to generate_lock: false, and projects can (if they wish) globally override that default to generate_lock: if-missing to make the cargo plugin work in all scenarios. Then generate_lock: true can be used if Cargo.lock needs to be overwritten/ignored for whatever reason (i.e. a hypothetical repo that hasn't been updated in a while and Cargo.lock specifies dependencies w/ known security issues, so you want to force it to use newer libraries)
If a project doesn't want to grow a dependency on cargo from the host to track, it can keep generate_lock: false as the default, and then maintain its own Cargo.lock files & stage them before the cargo source:
```
sources:
```
kind: git_tag url: whatever:foobar.git
kind: local path: files/cargo/foobar/Cargo.lock
kind: cargo generate_lock: false # <-- from project defaults

nanonyme commented 1 year ago

Another option is to create separate plugin cargo-lock-generator. Then you have

git
cargo-lock-generator
cargo

and cargo can consume lock as normal. cargo-lock-generator only generates cargo.lock when tracking but otherwise does nothing. It shall depend on previous source both for track and fetch.

nanonyme commented 1 year ago

@gtristan any opinions?

nanonyme commented 1 year ago

Creating new plugin does not change API or cache keys. Adding functionality to existing plugin does not necessarily need cache key change since the new functionality kicks in only with track.

AdrianVovk commented 1 year ago

cargo-lock-generator only generates cargo.lock when tracking but otherwise does nothing

This is rather ugly, in my experience.

Here's how it ends up working out:

if we generate the Cargo.lock file only during tracking, this ends up w/ a broken project for people who haven't tracked that element. Basically anyone who wants to be able to just clone & build will be unsuccessful because, simply, the Cargo.lock file doesn't exist!
if we generate the Cargo.lock file during both tracking and fetching, the problem is that we cannot reliably generate an identical Cargo.lock file. Basically this would turn the fetching step into tracking, which breaks Buildstream again in a different way
The only solution is to store the Cargo.lock file in the project ref. This is very ugly

Building this into the cargo plugin is optimal because it alleviates the need for the Cargo.lock file if not tracking

Edit: Actually I suppose that Cargo.lock only existing during/after a track can be fine if the cargo plugin is willing to handle that. I seem to recall it blowing up for whatever reason.

nanonyme commented 1 year ago

I don't see any reason why cargo plugin would need Cargo.lock after tracking. It generates things to download based on it during tracking and records them in ref.

AdrianVovk commented 1 year ago

An issue with the separate plugin: We have to actually store the Cargo.lock file and then stage it, so that the cargo plugin can pick it up and use it. But this means that later, during the element's build, we actually place a Cargo.lock file into the sources. On a separate build machine, we would instead build with the Cargo.lock file missing (since it isn't really necessary). However, this behavior introduces a different build environment without changing the cache keys!

nanonyme commented 1 year ago

IMHO any generated cargo lock must be the ref of said plugin.

AdrianVovk commented 1 year ago

So you mean that the generated cargo.lock file should be base64 encoded and then stored in the ref?

That generates a massive ref. I don't mind since it lives in project.refs, but it it ends up being necessary in freedesktop-sdk you'll end up with loads of chunk in your element

nanonyme commented 1 year ago

I don't think it's that big a price to pay for determinism to store this information in ref. Also, freedesktop-sdk is not really needing this functionality in the first place as all the projects we use have Cargo.lock. This is more aimed for various downstreams and developers.

AdrianVovk commented 1 year ago

store this information in ref

It's already stored in the ref, at track time, by the cargo plugin.

Cargo.lock isn't necessary during the actual build, is it? I'm actually not sure now...

for determinism

Simply not including a Cargo.lock file isn't less deterministic than storing it in project.refs

Also, freedesktop-sdk is not really needing this functionality

For now...

nanonyme commented 1 year ago

Oh, never mind, I misread your point. Ok, I guess I'm convinced with adding it to cargo plugin itself as long as this does not add dependency by default. I don't think you even need to change add generate_lock into cache keys since it's for tracking only here so this should be doable as a mostly backwards-compat change.

nanonyme commented 1 year ago

@AdrianVovk freedesktop-sdk has been cooperating with any upstreams not shipping Cargo.lock and convincing them to do that. We plan to keep this strategy. Not shipping Cargo.lock with releases is a basically a bug.

AdrianVovk commented 1 year ago

freedesktop-sdk has been cooperating with any upstreams not shipping Cargo.lock and convincing them to do that

Some upstreams feel strongly about not shipping Cargo.lock... So unfortunately there's no escaping the need to generate it sometimes

i.e. https://github.com/systemd/zram-generator/issues/65

nanonyme commented 1 year ago

Yeah, I guess we live in an imperfect world where people sometimes do not follow best practices. Anyway, point stands. What that project is doing is not a general thing for people to do. I am not saying that Cargo.lock should always exist, it is very common for it not to. However, when generating tags, it usually does. You can even have same commit retagged multiple times with different Cargo.lock files if you so want. Once you have tested that this combination actually builds.

nanonyme commented 1 year ago

* if we generate the Cargo.lock file only during tracking, this ends up w/ a broken project for people who haven't tracked that element. Basically anyone who wants to be able to just clone & build will be unsuccessful because, simply, the Cargo.lock file doesn't exist!

This is essentially the thing which made me believe you claimed Cargo.lock needs to exist during build. It is perfectly fine to expect an element has to be tracked at least once and outputs committed into git for it to be buildable. However, refs stored in git must be sufficient for instrumenting source fetching and building.

nanonyme commented 1 year ago

In any case, if you would like to proceed with this, I would suggesting making proposal in the form of a PR. I would really love to have tests added for cargo plugin as well in the same go, it currently has none making any changes to it very dangerous.

AdrianVovk commented 1 year ago

It is perfectly fine to expect an element has to be tracked at least once and outputs committed into git for it to be buildable. However, refs stored in git must be sufficient for instrumenting source fetching and building.

I see how my working could have been confusing. I meant that if you just generate the Cargo.lock file (and store it in the mirror dir) during tracking (and commit the resulting project.refs file), someone on the other side wouldn't be able to build since they wouldn't have a Cargo.lock file to stage. As you say, refs stored in git must be sufficient to fetch and build.

I would suggesting making proposal in the form of a PR

:+1:

I would really love to have tests added for cargo plugin as well in the same go, it currently has none making any changes to it very dangerous.

What do you expect such a test to look like? A project that junctions off freedesktop-sdk to get a working cargo, then tracks, compiles, and runs some hello-world rust project?

How do we find out if the cache key changed? Hard-code it into the test and have to manually update it w/ major buildstream updates?

nanonyme commented 1 year ago

It is perfectly fine to expect an element has to be tracked at least once and outputs committed into git for it to be buildable. However, refs stored in git must be sufficient for instrumenting source fetching and building.

I see how my working could have been confusing. I meant that if you just generate the Cargo.lock file (and store it in the mirror dir) during tracking (and commit the resulting project.refs file), someone on the other side wouldn't be able to build since they wouldn't have a Cargo.lock file to stage. As you say, refs stored in git must be sufficient to fetch and build.

I don't think I still get it. Do you or do you not need a Cargo.lock in build sandbox to build the project?

gtristan commented 1 year ago

I don't think I still get it. Do you or do you not need a Cargo.lock in build sandbox to build the project?

This is indeed the question.

Also the question is whether some projects require the Cargo.lock at build time while others do not, and whether upstream will potentially change one day such as to start requiring the Cargo.lock at build time.

The uncertainty around these details leads me to intuit that it would be safer and simpler to just have a separate cargo-generate-lock plugin which serializes the Cargo.lock data into its ref.

As a side note, we would probably want such a lock generating plugin to only call utils.get_host_tool() directly in Source.track() rather than in Plugin.preflight(), this will fail a bit later than usual when lacking host cargo but will not require host cargo to build unless you run bst source track.

AdrianVovk commented 1 year ago

Implemented here: https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/239

apache / buildstream-plugins

cargo: Generate a Cargo.lock if one wasn't found #44