Open ozgunozerk opened 1 month ago
Here is the summary of changes:
release
command is just for releasing, it doesn't generate changelog nor bumping versions. However, it can be used together with release_pr
to automate the whole processupdate
and release_pr
commands calls update()
function internally to generate the changelog and bumping the versionsupdate()
function calls next_versions()
, and this is the most critical part imo. Due to:
next_versions()
fetches the "to be compared" packages from either registry (crates.io by default) or local file systemregistry_packages
) against local_packages
git-only
feature, we basically want to fetch the registry_packages
from local file system (details will be discussed below)registry_packages
, all should be good, because rest of the business logic does not need to know whether packages came from git or not. Rest of the business logic will just generate the changelog and bump versions based on the content of the packages, not based on where they come from.next_versions()
function -> if git-only
flag is present, and registry_manifest
is not provided:
registry_manifest
That's it!
P.S. even for the case there are multiple packages, there shouldn't be a problem. Because, it is already working with multiple packages without the git-only
option. From what I understood, it fetches the workspace cargo.toml and then processes each package. Why it should be different in git-only
mode :)
The changes I propose will also fetch the workspace cargo.toml file if that's the case, so it should be good to go imo.
@MarcoIeni please let me know if I missed anything.
Also, we can discuss how release
command should be modified later after we settle the first part.
Remaining work:
release
command itself has not been touched. I haven't checked it yet, so I don't know whether it will require modifications (I assume it will).release command is just for releasing
Yes, it "releases" the current state of the repo, i.e. unpublished crates, running cargo publish
, creating tags, GitHub release
The rest of the background info looks correct 👍
The approach looks correct, too!
I will review this PR this soon!
release-plz release
we can discuss how release command should be modified later after we settle the first part.
Yes, probably in another PR we want to avoid cargo publish
if git-only
is true. Plus, to check if a crate is already published, we look at git tags, instead of looking in the cargo registry.
It would be great to have some integration tests. You can find tests here
One test I would write is:
git_only
optionrelease-plz release-pr
release-plz release
release-plz release-pr
againI added the new line to the schema in https://github.com/MarcoIeni/release-plz/pull/1500
so that your tests don't fail 👍
Draft implementation is complete. I'll take a look at the tests and try to write the test you mentioned as the next step. I might be busier than usual this week, but I'm definitely committed to finish this feature. If you @MarcoIeni or someone else wants to chip in to make the process faster, feel free to do so!
For the scope of this PR (update
and release-pr
functionalities), I think there is only 1 thing left: get_registry_packages()
. After finalizing that, I'll start on writing tests.
I could use some clarifications on the behavior of this function:
git-only
, and assume if there is no released version of the package. Then, this function will still return a PackagesCollection
object, however it will be effectively empty, right? Due to no package could be downloaded from registry...
PackagesCollection
object. If these assumptions above are true, I think we can leave the registry_manifest
as None
if there are no valid tags present in the repo.
registry_manifest
is provided to this function, then it tries to read from local (already downloaded) files.
git-only
case, our files are already downloaded locally. So, it make sense to use the registry_manifest
registry_manifest
and the local_manifest
will be pointing towards the same path. Since we are comparing a version of the repo with another version of the repo, the path to cargo.toml
will be the same.registry_manifest
provided should be different than the local_manifest
for this function to work properly. If above understanding is correct, then I believe the code should be changed as the following:
git-only
parameter and the optional tag
parametergit-only
registry_manifest
path to the new temp directory that we checked out to provided versionNo change should be required from what I can tell.
When you have the time, I'd appreciate a sanity check on these @MarcoIeni
Your assumptions seem correct to me. However, I'm not sure about the following:
I think we can leave the registry_manifest as None if there are no valid tags present in the repo.
If registry_manifest is None in the get_registry_packages
function, release-plz will look to crates.io to find the package, right? Which is what we want to avoid.
Maybe I misunderstood 🤔
Everything looks good here. More specifically, you are right that the registry_manifest
and the local_manifest
should be different. Copying to a temporary directory is what I had in mind. 👍
The task list looks correct. 👍
The only thing is that from a quick look at the code, it doesn't seem right that the get_registry_packages
takes the "tag" as input, because get_registry_packages should work for a workspace, too, so there can be multiple tags (one per package). Again, I might be misunderstanding your explanation 👍
Ok good that we are discussing this, because maybe I misunderstood something.
If
registry_manifest
isNone
in theget_registry_packages
function, release-plz will look to crates.io to find the package, right? Which is what we want to avoid.
You are right, and I think we already got it covered. But I probably did not clearly explain my ideas. My understanding was:
get_registry_packages()
will return an empty object when the package is not yet released on crates.io
.
This means, I can write the below logic for get_registry_packages()
function:
tag
present, and git-only
is true
:
get_registry_packages()
by returning the empty object (as in the case of the not yet released package in crates.io).PackagesCollection
object when no tag is present in GitHub).This is where I'm still confused.
because get_registry_packages should work for a workspace, too, so there can be multiple tags (one per package)
Unfortunately, this doesn't make sense to me. I'm probably missing some details for the multi-package scenario.
When you have the time, if you can see my comment, maybe we can continue this discussion there.
will return an empty object when the package is not yet released on crates.io
Yes, if your project doesn't have naming conflicts with projects present in crates.io! For example, if your project is called rand
, then release-plz will download the rand
crate.
Imo if git-only is enabled, people should be able to call their crates with names that conflict with something in crates.io and release-plz should work anyway (i.e. we shouldn't download crates from crates.io if people use the git-only option). The logic you described covers this case imo 👍
I replied to that comment (sorry, I missed it!). Happy to provide more help if needed 👍
Hi @MarcoIeni, was a busy week so I'm taking a look just right now.
Btw, thanks a lot for your responsiveness, great communication!
The code changes are trivial, however the biggest problem is not changing the code for this PR. Whilst I was changing the code, I found the root of my confusion. I'll explain it below:
crates.io
scenario, with multiple packages -> a.k.a main
branchregistry_manifest
file even though we have multiple publishable packagesnext_versions()
is just taking the InputRequest
and then returns PackagesUpdate
. So, the transition from a single Cargo.toml
file to multiple packages is happening in next_versions()
.
get_registry_packages()
function, which takes input.registry_manifest
as the input, and outputting PackagesCollection
get_registry_packages()
, if we have provided a registry_manifest
path, we extract all the publishable packages with publishable_packages_from_manifest()
method. This is the first concrete step for handling multiple packages from what I've seen.git-only
feature, with multiple packagesregistry_manifest
(as the workspace cargo.toml
I assume) in the crates.io scenario, we can do the same for the git-only
option. registry_manifest
, instead of multiple cargo.toml
s for each package.I agree that we should compare each package with their own previous version, but I'm puzzled on how does it even work with multiple packages in main
branch. My confusion arises from this:
registry_manifest
path? In short: since I assumed everything works for crates.io
case, I thought I can follow the same approach, and provide a single registry_manifest
. And that registry_manifest
should be the latest tag of the repo (corresponds to the latest release in crates.io).
I hope this long convos are not bothering you. I'm an outsider to this repo, and I don't want to code my solution before we are on the same page.
Appreciated ^^
Btw, thanks a lot for your responsiveness, great communication!
Thank you for spending time on this issue 😄
we have a single registry_manifest file even though we have multiple publishable packages
Yes, because the manifest can be a "workspace manifest". For example, take a look at the Cargo.toml in the root of this repository.
hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.
Yes 👍
monorepos are not publishable in crates.io IIRC, but their individual packages can be published individually.
Yes 👍
I hope this long convos are not bothering you. I'm an outsider to this repo, and I don't want to code my solution before we are on the same page.
No problem, this issue is very difficult, it's normal that you ask doubts!
how can we achieve the correct behavior by providing only a single registry_manifest path?
I thought about the following algorithm, but maybe I'm missing something:
main
branch by providing the registry_manifest
file.checkout
to the latest tag of that packageI thought about the following algorithm, but maybe I'm missing something:
- read packages from the main branch by providing the registry_manifest file.
- Determine the latest tag of each package
- When comparing each package, checkout to the latest tag of that package
Great! This resolves my biggest confusion.
I was and am planning to provide a single cargo.toml
file as the registry_manifest
. Then in our discussions, you asked if the function get_the_latest_repo_tag
is able to retrieve all the latest tags for each package. Then I was confused, because I thought we should provide a single cargo.toml
file as we discussed:
hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.
Yes 👍
But now I realized you asked if the function is able to retrieve latest tags for all packages NOT for the registry_manifest
file to be fetched from GitHub for the comparison. These latest_tags
are to be used later for the individual package comparison. Now it makes sense 👍
This whole discussion is basically about me being lazy by not looking at the code and trying to understand how this project works for the multiple packages case 😅
Coding the solution is pretty easy after understanding all these.
I'm on vacation for the last week, hence the late reply. I'll take a look at the codebase once my vacation is over.
Great, enjoy your vacation :D
registry_manifest
For the sake of brevity, I’ll refer to the case: monorepo with registry_manifest
provided (current main-branch) as main-multiple
.
I took a look at the code, and I think your suggestions & questions has discrepancies with the current main-multiple
(or even more likely, I’m missing some details).
I was trying to replicate the main-multiple
case for the git-only
feature, since they should be roughly identical in terms of business logic.
Let’s focus on the get_registry_packages()
function, because after that, everything should be identical afaik.
get_registry_packages()
:
registry_manifest
→ this is the main discussion of this PR, I’ll come back to it.git-only
registry
→ we shouldn’t need one for git-only
case, and as expected, it is also not used in main-multiple
case, so I believe we can ignore this one as well for the sake of this discussionregistry_manifest
is provided:
registry_manifest
main-multiple
, the directories of registry_manifest
and the local_manifest
are already different. So in git-only
case, we have to create an additional temporary directory and check out to given tag/commit
prior to calling get_registry_packages()
, so that the directory of registry_manifest
can be different from local_manifest
, and everything can work as usual.And that’s it for the release-pr
phase I guess! The rest should be related to release
part, for which, I’ll probably open another PR.
Coming back to our previous discussion, you asked the following:
Does the implementation of this PR recognize the right tag for the right package?
My initial approach has not changed after I read the codebase tbh. The right tag for the right package should be already detected by the existing code. Else, how will main-multiple
work correctly in the first place?
So, I think, I should only find the correct cargo.toml
to be provided as the registry_manifest
, and that’s it. This PR should not be responsible from detecting the right tag for each package.
I was focused on finding the latest released cargo.toml
, hence, finding the latest tag only. Your focus was on detecting each latest tag for the each package separately. I think this is the crux of the discussion.
main-multiple
Why I think main-multiple
might not be working correctly:
cargo.toml
file of that commit as the registry_manifest
. However, your argument convinced me:in tag b-v1.0.0, the version of package a is also v1.0.0, so nothing changed about a.
I'm not sure about this, maybe people released
b
, but they didn't releasea
yet. But this new tag contains unreleased changes ofa
.
Now, I think using the latest tag’s cargo.toml
might be an erroneous approach. But immediately after that, a new question arises:
How does it even work for the main-multiple
?
we are also providing a single cargo.toml
file in main-multiple
case. And I think your counter-argument is still valid for main-multiple
,
I'm not sure about this, maybe people released
b
, but they didn't releasea
yet. But this new tag contains unreleased changes ofa
.
main-multiple
might not be working correctly 😅If you prefer to have a short sync/call, I'm all for it. I really like the utility of this crate, and after finishing this PR, I was thinking about submitting other PR's to improve the documentation and possible refactorings that I think you might appreciate.
Best!
Let's have a call, yes. We can even work on this together with pair programming 👍 I sent you a DM on linkedin :)
For the git-only feature, IN THEORY we can't rely on the get_registry_packages
function in its current state because when we run cargo metadata
(by calling cargo_utils::get_manifest_metadata
) we might get unpublished results from certain crates. Instead this function should parse the packages as they were published.
For example, one could:
readme
field of crate_a Cargo.toml Now when we run cargo metadata
on the latest tag of the repo (crate_b), we parse the Cargo.toml of crate_a in the wrong way.
The solution to this is to git checkout latest_tag_of_each_crate
before running cargo metadata
.
To avoid performance issues, we could run cargo_metadata
inside the directory of the crate, so that we don't parse the entire workspace every time, but this can also be a TODO in the code for this PR 👍
In the diff algorithm, we need to git checkout
to the latest version of each package in the registry_packages
temporary directory.
Right now the temporary directory field is unused. Imo we should use it to run git checkout latest_tag
before doing the diff (around here)
Aims to fix #1144.
I'll edit the PR explanation once the discussions below are finalized.