MarcoIeni / release-plz

Publish Rust crates from CI with a Release PR.
https://release-plz.ieni.dev
Apache License 2.0
721 stars 66 forks source link

Git Only feature #1497

Open ozgunozerk opened 1 month ago

ozgunozerk commented 1 month ago

Aims to fix #1144.

I'll edit the PR explanation once the discussions below are finalized.

ozgunozerk commented 1 month ago

Here is the summary of changes:

Background info on repo (to double check my understanding):

Approach:

That's it!

P.S. even for the case there are multiple packages, there shouldn't be a problem. Because, it is already working with multiple packages without the git-only option. From what I understood, it fetches the workspace cargo.toml and then processes each package. Why it should be different in git-only mode :)

The changes I propose will also fetch the workspace cargo.toml file if that's the case, so it should be good to go imo.

@MarcoIeni please let me know if I missed anything.

Also, we can discuss how release command should be modified later after we settle the first part.

Remaining work:

MarcoIeni commented 1 month ago

release command is just for releasing

Yes, it "releases" the current state of the repo, i.e. unpublished crates, running cargo publish, creating tags, GitHub release

The rest of the background info looks correct 👍

The approach looks correct, too!

I will review this PR this soon!

release-plz release

we can discuss how release command should be modified later after we settle the first part.

Yes, probably in another PR we want to avoid cargo publish if git-only is true. Plus, to check if a crate is already published, we look at git tags, instead of looking in the cargo registry.

It would be great to have some integration tests. You can find tests here

One test I would write is:

MarcoIeni commented 1 month ago

I added the new line to the schema in https://github.com/MarcoIeni/release-plz/pull/1500

so that your tests don't fail 👍

ozgunozerk commented 1 month ago

Draft implementation is complete. I'll take a look at the tests and try to write the test you mentioned as the next step. I might be busier than usual this week, but I'm definitely committed to finish this feature. If you @MarcoIeni or someone else wants to chip in to make the process faster, feel free to do so!

ozgunozerk commented 1 month ago

For the scope of this PR (update and release-pr functionalities), I think there is only 1 thing left: get_registry_packages(). After finalizing that, I'll start on writing tests.

I could use some clarifications on the behavior of this function:

First point:

If these assumptions above are true, I think we can leave the registry_manifest as None if there are no valid tags present in the repo.

Second point:

If above understanding is correct, then I believe the code should be changed as the following:

  1. this function will take both git-only parameter and the optional tag parameter
  2. if git-only
  3. create a temp directory as usual
  4. check out to given tag
  5. apply the registry_manifest path to the new temp directory that we checked out to provided version

No change should be required from what I can tell.

When you have the time, I'd appreciate a sanity check on these @MarcoIeni

MarcoIeni commented 1 month ago

First point

Your assumptions seem correct to me. However, I'm not sure about the following:

I think we can leave the registry_manifest as None if there are no valid tags present in the repo.

If registry_manifest is None in the get_registry_packages function, release-plz will look to crates.io to find the package, right? Which is what we want to avoid. Maybe I misunderstood 🤔

Second point

Everything looks good here. More specifically, you are right that the registry_manifest and the local_manifest should be different. Copying to a temporary directory is what I had in mind. 👍 The task list looks correct. 👍 The only thing is that from a quick look at the code, it doesn't seem right that the get_registry_packages takes the "tag" as input, because get_registry_packages should work for a workspace, too, so there can be multiple tags (one per package). Again, I might be misunderstanding your explanation 👍

ozgunozerk commented 4 weeks ago

Ok good that we are discussing this, because maybe I misunderstood something.

First point

If registry_manifest is None in the get_registry_packages function, release-plz will look to crates.io to find the package, right? Which is what we want to avoid.

You are right, and I think we already got it covered. But I probably did not clearly explain my ideas. My understanding was:

This means, I can write the below logic for get_registry_packages() function:


Second point

This is where I'm still confused.

because get_registry_packages should work for a workspace, too, so there can be multiple tags (one per package)

Unfortunately, this doesn't make sense to me. I'm probably missing some details for the multi-package scenario.

When you have the time, if you can see my comment, maybe we can continue this discussion there.

MarcoIeni commented 3 weeks ago

will return an empty object when the package is not yet released on crates.io

Yes, if your project doesn't have naming conflicts with projects present in crates.io! For example, if your project is called rand, then release-plz will download the rand crate. Imo if git-only is enabled, people should be able to call their crates with names that conflict with something in crates.io and release-plz should work anyway (i.e. we shouldn't download crates from crates.io if people use the git-only option). The logic you described covers this case imo 👍

I replied to that comment (sorry, I missed it!). Happy to provide more help if needed 👍

ozgunozerk commented 3 weeks ago

Hi @MarcoIeni, was a busy week so I'm taking a look just right now.

Btw, thanks a lot for your responsiveness, great communication!

The code changes are trivial, however the biggest problem is not changing the code for this PR. Whilst I was changing the code, I found the root of my confusion. I'll explain it below:

this is crates.io scenario, with multiple packages -> a.k.a main branch

coming back to git-only feature, with multiple packages

Discussion:

I agree that we should compare each package with their own previous version, but I'm puzzled on how does it even work with multiple packages in main branch. My confusion arises from this:

In short: since I assumed everything works for crates.io case, I thought I can follow the same approach, and provide a single registry_manifest. And that registry_manifest should be the latest tag of the repo (corresponds to the latest release in crates.io).

I hope this long convos are not bothering you. I'm an outsider to this repo, and I don't want to code my solution before we are on the same page.

Appreciated ^^

MarcoIeni commented 3 weeks ago

Btw, thanks a lot for your responsiveness, great communication!

Thank you for spending time on this issue 😄

we have a single registry_manifest file even though we have multiple publishable packages

Yes, because the manifest can be a "workspace manifest". For example, take a look at the Cargo.toml in the root of this repository.

hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.

Yes 👍

monorepos are not publishable in crates.io IIRC, but their individual packages can be published individually.

Yes 👍

I hope this long convos are not bothering you. I'm an outsider to this repo, and I don't want to code my solution before we are on the same page.

No problem, this issue is very difficult, it's normal that you ask doubts!

how can we achieve the correct behavior by providing only a single registry_manifest path?

I thought about the following algorithm, but maybe I'm missing something:

  1. read packages from the main branch by providing the registry_manifest file.
  2. Determine the latest tag of each package
  3. When comparing each package, checkout to the latest tag of that package
ozgunozerk commented 2 weeks ago

I thought about the following algorithm, but maybe I'm missing something:

  1. read packages from the main branch by providing the registry_manifest file.
  2. Determine the latest tag of each package
  3. When comparing each package, checkout to the latest tag of that package

Great! This resolves my biggest confusion.

I was and am planning to provide a single cargo.toml file as the registry_manifest. Then in our discussions, you asked if the function get_the_latest_repo_tag is able to retrieve all the latest tags for each package. Then I was confused, because I thought we should provide a single cargo.toml file as we discussed:

hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.

Yes 👍

But now I realized you asked if the function is able to retrieve latest tags for all packages NOT for the registry_manifest file to be fetched from GitHub for the comparison. These latest_tags are to be used later for the individual package comparison. Now it makes sense 👍

This whole discussion is basically about me being lazy by not looking at the code and trying to understand how this project works for the multiple packages case 😅

Coding the solution is pretty easy after understanding all these.

I'm on vacation for the last week, hence the late reply. I'll take a look at the codebase once my vacation is over.

MarcoIeni commented 2 weeks ago

Great, enjoy your vacation :D

ozgunozerk commented 2 weeks ago

registry_manifest

For the sake of brevity, I’ll refer to the case: monorepo with registry_manifest provided (current main-branch) as main-multiple.

I took a look at the code, and I think your suggestions & questions has discrepancies with the current main-multiple (or even more likely, I’m missing some details).

I was trying to replicate the main-multiple case for the git-only feature, since they should be roughly identical in terms of business logic.

Let’s focus on the get_registry_packages() function, because after that, everything should be identical afaik.

get_registry_packages():

  1. takes the inputs:
    1. registry_manifest → this is the main discussion of this PR, I’ll come back to it.
    2. publishable packages of local project → should require no change for git-only
    3. registry → we shouldn’t need one for git-only case, and as expected, it is also not used in main-multiple case, so I believe we can ignore this one as well for the sake of this discussion
  2. if registry_manifest is provided:
    1. finds the publishable packages by parsing the registry_manifest
      1. in the case of main-multiple, the directories of registry_manifest and the local_manifest are already different. So in git-only case, we have to create an additional temporary directory and check out to given tag/commit prior to calling get_registry_packages(), so that the directory of registry_manifest can be different from local_manifest, and everything can work as usual.

And that’s it for the release-pr phase I guess! The rest should be related to release part, for which, I’ll probably open another PR.

Coming back to our previous discussion, you asked the following:

Does the implementation of this PR recognize the right tag for the right package?

My initial approach has not changed after I read the codebase tbh. The right tag for the right package should be already detected by the existing code. Else, how will main-multiple work correctly in the first place?

So, I think, I should only find the correct cargo.toml to be provided as the registry_manifest, and that’s it. This PR should not be responsible from detecting the right tag for each package.

TL;DR

I was focused on finding the latest released cargo.toml, hence, finding the latest tag only. Your focus was on detecting each latest tag for the each package separately. I think this is the crux of the discussion.


Potential bug in main-multiple

Why I think main-multiple might not be working correctly:

in tag b-v1.0.0, the version of package a is also v1.0.0, so nothing changed about a.

I'm not sure about this, maybe people released b, but they didn't release a yet. But this new tag contains unreleased changes of a.

Now, I think using the latest tag’s cargo.toml might be an erroneous approach. But immediately after that, a new question arises:

If you prefer to have a short sync/call, I'm all for it. I really like the utility of this crate, and after finishing this PR, I was thinking about submitting other PR's to improve the documentation and possible refactorings that I think you might appreciate.

Best!

MarcoIeni commented 2 weeks ago

Let's have a call, yes. We can even work on this together with pair programming 👍 I sent you a DM on linkedin :)

MarcoIeni commented 2 weeks ago

Summary of our call

get_registry_packages

For the git-only feature, IN THEORY we can't rely on the get_registry_packages function in its current state because when we run cargo metadata (by calling cargo_utils::get_manifest_metadata) we might get unpublished results from certain crates. Instead this function should parse the packages as they were published.

For example, one could:

Now when we run cargo metadata on the latest tag of the repo (crate_b), we parse the Cargo.toml of crate_a in the wrong way.

The solution to this is to git checkout latest_tag_of_each_crate before running cargo metadata.

To avoid performance issues, we could run cargo_metadata inside the directory of the crate, so that we don't parse the entire workspace every time, but this can also be a TODO in the code for this PR 👍

diff algorithm

In the diff algorithm, we need to git checkout to the latest version of each package in the registry_packages temporary directory. Right now the temporary directory field is unused. Imo we should use it to run git checkout latest_tag before doing the diff (around here)