Open emkornfield opened 2 weeks ago
I'm not an expert in python. Just want to ask a general question on whether we should enforce using the merge script for parquet-java (or other parquet-xxx projects). It has been broken for a long time.
@gszadovszky @julienledem Thoughts?
I think the main thing the script does (which I haven't fully tested) is allow for easy merging of backports, and JIRA closing (which given current ML thread might not be useful for super long).
There is a description about it in the dev/README. It seems this is for the time when we were not able to simply push the PRs from github. Github was a read only mirror for a time being and we used the apache git repo to push the commits into.
I don't remember using this script. I think we do not need it anymore.
For reference there is a similar script in Arrow which I think probably has similar origins. The additional benefit even for github, is it looks like it tags issues with appropriate fix versions. Not sure if parquet-java has other ways of tracking this?
There is a similar script in Apache ORC: https://github.com/apache/orc/blob/main/dev/merge_orc_pr.py. It can help keep the PR description, update JIRA state and backport commit to other branches in a single shot. Perhaps we do not want to enforce using the script but it might help in the above cases.
I haven't used this script and would prefer to see if we can have a more Github-oriented workflow. Rok just raised a vote to migrate the issues from Jira to GitHub.
@Fokko in your experience how have issues been tagged with fixes for GH oriented workflows?
@Fokko in your experience how have issues been tagged with fixes for GH oriented workflows?
@Fokko ping, want to understand what you where thinking here?
@emkornfield Sorry for the late reply here, the mailbox is a bit swamped, thanks for the ping.
For Iceberg we make these Github releases (example for 1.5.0). Github will generate this list for you, and then the release manager groups and curates this a bit to make it easier to read. Merging would just go through the Github UI since we don't need to keep track of things in Jira, and this way we don't need to use the script.
@Fokko Thanks, I still think the Arrow issues which use the tool might keep better records.
For instance if you look at a recently resolved issue in Arrow you see it is tagged with a fix version (I believe this is due to using a script). I don't see something similar in a recent Iceberg. I might be missing something but I think the Arrow approach is nicer as it allows for me to directly find an issue and determine which release I might need to use to make use of it.
I also think being able to easily backport fixes to other branches via one step process is nicer even if it doesn't use the UI (but this is likely a rarer use-case).
Thoughts?
In Iceberg we create backports by hand. We checkout the branch, and backport using cherry-picking the commits, and create a PR. Can I ask how you resolve merge conflicts with older versions?
In Iceberg we create backports by hand. We checkout the branch, and backport using cherry-picking the commits, and create a PR. Can I ask how you resolve merge conflicts with older versions?
In the merge script of Apache ORC, it asks which branch to cherry-pick the commit and tries to do that automatically. If it succeeds, then we are done. Or when it fails, it prints error message and asks the developer to resolve the cherry-pick conflict manually and then come back to the script by answering Y
to the question it asked to proceed: https://github.com/apache/orc/blob/main/dev/merge_orc_pr.py#L128-L136
BTW, as I have asked in https://github.com/apache/parquet-java/issues/2932, it would be good to automatically close the Github issue related to the PR.
My biggest concern is properly tagging issues to milestones. I think the main question we should answer here is do we want a script. If we do, I think we should check this in and I can cherry-pick the necessary features from some of the other merge scripts (we should also ideally consolidate with the script in parquet-format
with maybe convenience wrappers for running both)
It is possible JIRA updates still have some work here, as it errored out while trying to do those.
Make sure you have checked all steps below.
Jira
Tests
Commits
Style
mvn spotless:apply -Pvector-plugins
Documentation