Open phcerdan opened 3 years ago
@phcerdan is your ITK version up-to-date before running the script?
@thewtex I am not updating VNL or running the update script, I am just browsing the current git history of ITK. https://github.com/InsightSoftwareConsortium/ITK/commit/df92e582d82
@hjmjohnson are you aware of style updates, etc., that may have caused this?
Yes, and a lot of clang-tidy changes as well to VNL core libraries. Yes, there are a lot of minuscule changes to VNL over the past year to fortify the codebase and start the process of modernizing.
To be honest, I don't quite understand how to resolve this issue.
I don't think it's a VNL issue. It must be about options/usage of UpdateFromUpstream.sh
. Same happened with MetaIO here: https://github.com/InsightSoftwareConsortium/ITK/commit/17e744
Is the UpdateFromUpstream.sh
in sync with the upstream script? I don't know how ITK uses it.
I think @bradking updated it a week ago via 41f4951f6b84cf8e24fe7e4fc7e8ec05f65d9f9d.
Modules/ThirdParty/VNL/UpdateFromUpstream.sh
uses the older Utilities/Maintenance/UpdateThirdPartyFromUpstream.sh
script. My update was to Utilities/Maintenance/update-third-party.bash
.
I'll look at converting Modules/ThirdParty/VNL/UpdateFromUpstream.sh
to use the newer script.
I've updated #2324 to switch to the new script.
Thanks @bradking for the improvement of the update script in #2324.
Before your changes, the old UpdateThirdPartyFromUpstream.sh
run by @thewtex didn't either show the pollution reported in this issue. So maybe it was a local issue caused by running the script with outdated sources.
If at some point in the future ITK rewrites its git history, we could solve the pollution generated from older updates of VNL and MetaIO.
I recommend to keep an eye when updating third parties still using the old UpdateThirdPartyFromUpstream.sh
method to check that they don't touch 16.000 files.
A couple of years have passed, and history is still getting polluted when using UpdateFromStream scripts.
Updated history of the same example file shown in my original post: itkEigenAnalysis2DImageFilter.h
. The MetaIO commit touches 16K files (from this PR: https://github.com/InsightSoftwareConsortium/ITK/pull/4116, in particular this commit: https://github.com/InsightSoftwareConsortium/ITK/pull/4116/commits/c42090c681114ddd1e2f090adf14d022ca41787f,
note that the changes are not shown in the PR diff).
I am able to reproduce this when trying to update Eigen to another branch (from branch for/itk-3.4 to for/itk-master).
diff --git a/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh b/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh
index e6c37c161bc..8e72f68af75 100755
--- a/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh
+++ b/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh
@@ -10,7 +10,7 @@ readonly subtree="Modules/ThirdParty/Eigen3/src/itkeigen"
# readonly repo="https://gitlab.com/libeigen/eigen"
# readonly repo="https://gitlab.kitware.com/third-party/eigen.git"
readonly repo="https://gitlab.kitware.com/phcerdan/eigen.git"
-readonly tag="for/itk-3.4"
+readonly tag="for/itk-master"
readonly paths="
Eigen/Cholesky
Eigen/CholmodSupport
Here are the logs after running the UpdateFromStream.sh for Eigen3 after that change.
update_from_upstream_touch16k_files.txt
The update seems successful, but it introduces a commit touching 16K files.
commit bec339ee6f2e809752844a8489761ddfe18271b7 (HEAD -> update_eigen_to_master)
Merge: 217a54a40b7 522968224c9
Author: Pablo Hernandez-Cerdan <pablo.hernandez.cerdan@outlook.com>
Date: Mon Oct 23 00:35:46 2023 +0200
Merge branch 'upstream-Eigen3' into update_eigen_to_master
# By Eigen Upstream
* upstream-Eigen3:
Eigen3 2023-10-22 (9cd4aec6)
commit 522968224c93dfd0f2ead1eb24d882d7948df40c
Author: Eigen Upstream <kwrobot@kitware.com>
Date: Sun Oct 22 01:42:43 2023 +0200
Eigen3 2023-10-22 (9cd4aec6)
Code extracted from:
https://gitlab.kitware.com/phcerdan/eigen.git
at commit 9cd4aec6122b29a37e8de3fbfff47de3af0fc371 (for/itk-master).
See PR #4265 as an example when updating Eigen.
Was this branch rebased at some point? Third party import merge commits cannot just be rebased; I typically remove the commit from the rebase -i
and replace it with exec ThirdParty/import/update.sh
to rerun the import during the rebase.
Note that the robot does have a check to enforce that the merge happens properly and will detect and block any rebases like this. It does not seem the robot is enforcing this for ITK at the moment. I'll caution that it means that all changes to third parties must go through the import script.
The git-check-third-party
script from this repo will check the state of an import. For the Eigen import it reports:
% git check-third-party Modules/ThirdParty/Eigen3/src/itkeigen
fatal: not imported via a merge commit
This import likely needs redone. Once performed, we can start enforcing all such imports.
State of imports:
double-conversion
: local diffseigen
: not imported via a merge commitgdcm
: local diffsgoogletest
: local diffshdf5
: not imported via a merge commitkwsys
: okminc
: local diffsmetaio
: not imported via a separate branchnifti
: local diffsvxl
: not imported via a separate branchzlib-ng
: local diffsWas this branch rebased at some point? Third party import merge commits cannot just be rebased;
Yes, it was rebased.
Before:
- Commit2
- Commit1
- HEAD external eigen (release 3.4)
I updated the HEAD of external eigen to master:
After:
- NewCommit
- HEAD external eigen (master)
I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh to rerun the import during the rebase.
I would like to try this. However, how do you interact with both ITK (or VTK), and the third party module in that way? I see that the update script creates (and then removes) a work/extract
and work/upstream
. Is in that work/upstream
where you perform that rebase?
The git-check-third-party script from this repo will check the state of an import. For the Eigen import it reports: % git check-third-party Modules/ThirdParty/Eigen3/src/itkeigen fatal: not imported via a merge commit This import likely needs redone. Once performed, we can start enforcing all such imports.
It seems that we should enforce that check from now on in ITK.
Is in that work/upstream where you perform that rebase?
The import is not (and should not be) rebased. That is what git rebase
does not understand. Using exec
to rerun the import just redoes the merge using the proper -Xsubtree
option (also not transcribed in the history for things like verification or reuse when reapplying the commit via either cherry-pick
or rebase
).
It seems that we should enforce that check from now on in ITK.
I agree. The first thing to do would be to get the "local diff" instances' diffs upstreamed into the imported repository. These can then be trivially reimported and should be "fine" according the the script.
I would also recommend using (unannotated) tags in the import repos to preserve imported contents "forever". VTK started using for/vtk-YYYYMMDD[.N]-BASE
where BASE
is either the tag or something like master-gXXXXXXX
with XXXXXXX
the shorthash of the base commit and master
the branch name. This allows the for/vtk
branch to be rebased without worrying about "losing" the source for the import due to repository pruning over time.
I have been trying to remove current subtree (itkeigen), and try a fresh update-from-upstream, but getting conflicts in the git read-tree -u --prefix="$subtree-tmp/" "upstream-$name"
with the files I have modified upstream (.gitattributes, CMakeLists.txt).
I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh
I still don't know how to perform the operation you suggested:
What commit from the ITK history do you remove and replace with the exec update?
Pinging @thewtex
What commit from the ITK history do you remove and replace with the exec update?
What is in master
is there. I'm talking about when you use git rebase -i
and see a bunch of pick
commands. I remove the commit related to the import and replace it with exec path/to/update.sh
.
I have been trying to remove current subtree (itkeigen), and try a fresh update-from-upstream, but getting conflicts in the git read-tree -u --prefix="$subtree-tmp/" "upstream-$name" with the files I have modified upstream (.gitattributes, CMakeLists.txt).
Chuck added a mode where it detects an empty directory and redoes the import.
The update seems successful, but it introduces a commit touching 16K files.
We are working with branches that have different histories and subtrees -- the logs may be misleading on how many files are actually being touched.
See PR #4265 as an example when updating Eigen.
The update here looks good at a high level.
I have been trying to remove current subtree (itkeigen), and try a fresh update-from-upstream, but getting conflicts in the git read-tree -u --prefix="$subtree-tmp/" "upstream-$name" with the files I have modified upstream (.gitattributes, CMakeLists.txt).
If both upstream and ITK edited the files, then a conflict resolution may be necessary.
the logs may be misleading on how many files are actually being touched.
A rebase put the imported tree into the main ITK history without an -Xsubtree
option. The diff of the commit itself is bogus.
If both upstream and ITK edited the files, then a conflict resolution may be necessary.
This ended up being an endless source of extra work when rebasing the import repository. This is why it is preferred that all changes to the import come through update.sh
and never through manual patching.
Was this branch rebased at some point? Third party import merge commits cannot just be rebased; I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh to rerun the import during the rebase.
I understand now. I thought you were referring about the third-party project being rebased. So, no, I am not rebasing inside the ITK master branch. But the third-party project was rebased. Good trick to know though! I tried it for this current situation and get the same result (as it should be, because I am not rebasing ITK history).
The update here looks good at a high level.
It looks fine in the PR, but the first of the two commits is touching all the ITK files.
The diff of the commit itself is bogus.
It might be, but it pollutes the git history of all ITK files, which I think is extremely annoying.
So, at the end I got it solved (I think!).
As background, Eigen used to be stored in bitbucket, and we were using an non-official mirror for git. However, Eigen officially moved to gitlab,, the mirror became obsolete, and all the git hashes changed, like a history rewrite.
That's why I was getting these merge problems, the for/itk-3.4 and for/itk-master have unrelated histories.
So, I locally removed rm -r ./Modules/ThirdParty/Eigen3/src/itkeigen
the subtree. Modified https://github.com/phcerdan/ITK/blob/f1c48dc682fbab83563a6455ee548860fda3790f/Utilities/Maintenance/update-third-party.bash#L225
git read-tree -u --prefix="$subtree/" "upstream-$name"
for git read-tree -u --prefix="$subtree-delme/" "upstream-$name"
And commented out the last two commands:
git read-tree -u --prefix="$subtree-delme/" "upstream-$name"
fi
# git commit --no-edit
# git branch -d "upstream-$name"
Then move: mv ./Modules/ThirdParty/Eigen3/src/itkeigen-delme ./Modules/ThirdParty/Eigen3/src/itkeigen
git restore Utilities/Maintenance/update-third-party.bash
git add -A
git commit --no-edit
git branch -d upstream-Eigen3
These changes result in only the itkigen subtree files being touched, instead of all ITK files.
For example the last commit updating VNL: https://github.com/InsightSoftwareConsortium/ITK/commit/df92e582d82
12 commits in VNL provoke almost 3 million deletions everywhere in ITK It's the update from upstream script having some hiccups? Is it expected?
I have noticed it when grabbing the history of a file, it always has these VNL and sometimes MetaIO massive patches on it, that do not seem related to the file.
For example. history of
itkEigenAnalysis2DImageFilter.h
:Before 2016-03-02, it wasn't happening.
@blowekamp @hjmjohnson @thewtex