charleso / git-cc

Bridge for Git and Clearcase

Merging multiple CC Vobs into a single git repository #19

Open sonshogun opened 12 years ago

sonshogun commented 12 years ago

The script works great for the most part (I had to modify it to take out the language checking going out to Git Bash), but it would be much better if it allowed importing multiple VOBs into a single git repository. At my work, we have a project set up over multiple VOBs, including a common and a driver VOB, and as we are trying to model repositories per project, it is painful to wipe out our .gitcc and then re-initialize it for a different VOB. Any ideas for how to move forward better? We are looking for a one-time import into Git. (Ideally without submodules, my least favorite part of Git.)

charleso commented 12 years ago

Hi Sonshogun,

As far as I remember (and it's been a while) git-cc does work with multiple VOBs. It's really only interested in a view, which may be made up of multiple VOBs. What do you mean by 'wipe out our .gitcc'? If you're talking about the cache file you can disable that by setting:

    [core]
    cache = False

https://github.com/charleso/git-cc/issues/13#issuecomment-4308338

The bigger problem you're going to run into is that git-cc doesn't really import multiple branches. You will end up with a flat 'master' branch which may include changes made on other branches (depending on how you configure it) but not the actual branches themselves.

I made a stab at this in the Java version which, depending on your language preference, may be more interesting. Here is the relevant code:

https://github.com/charleso/gitcc4j/blob/cleartool/ClearGit/src/gitcc/cmd/Rebase.java#L43

I also tried to handle merging from those branches, but from my experience it did a terrible job and I wouldn't recommend enabling it.

Are you using UCM or normal Clearcase out of interest?

I hope this helps. I'm afraid there isn't really an easy way of importing Clearcase history, even at the best of times, but you're doing the right thing. ;-)

Charles

XieDongSheng commented 12 years ago

Hi charleso, I find it also does not work when I want to migrate multiple VOBs into one git repository. For example, in ClearCase we have three VOBs: /vob/A, /vob/B, /vob/C. First, I make a snapshot view and check out the code onto my local hard disk; the local path is /c/Users/autouser/autouser_cc. Then I create a git folder, /c/Users/autouser/autouser_git, and run the initialize command. I want A, B and C to be merged into the same git repository, so I execute "gitcc init /c/Users/autouser/autouser_cc" and "gitcc rebase", which hits the following exception:

    $ gitcc rebase
    git ls-files --modified
    git log -n 1 --pretty=format:%ai master_cc
    cleartool lsh -fmt %o%m|%Nd|%u|%En|%Vn|%Nc\n -recurse .
    Traceback (most recent call last):
      File "c:/git-cc/gitcc", line 48, in <module>
        main()
      File "c:/git-cc/gitcc", line 14, in main
        return invoke(cmd, args)
      File "c:/git-cc/gitcc", line 38, in invoke
        cmd.main(_args)
      File "c:\git-cc\rebase.py", line 40, in main
        history = getHistory(since)
      File "c:\git-cc\rebase.py", line 87, in getHistory
        return cc_exec(lsh)
      File "c:\git-cc\common.py", line 50, in cc_exec
        return popen('cleartool', cmd, CC_DIR, *_args)
      File "c:\git-cc\common.py", line 60, in popen
        raise Exception((stderr + stdout).decode(ENCODING))
    Exception: cleartool: Error: Not an object in a vob: ".".

But if I migrate just one VOB, A (gitcc init /c/Users/autouser/autouser_cc/A), it works. So can you help me with how to do it if I want to merge A, B and C into the same git repo? Thanks!

charleso commented 12 years ago

Hmm. I could have sworn this worked.

Well, git-cc supports whatever 'cleartool lshistory' supports. If you can get lshistory to work, then git-cc will work. Other than that, there isn't much I can do, sorry.

Charles

lucianm commented 10 years ago

CC folks very often seem to like organizing projects across multiple VOBs, and then typically one of them has some symlinks to the others. In rebase.py:110 the parsed lshistory output seems to be checked for 'checkinversion' or 'checkindirectory version', but entries like 'mkslinksymbolic' are ignored; those also do not seem to carry much information about where the symlink actually points. That could be retrieved with 'cleartool ls', though. Would it be a possible approach to resolve CC symbolic links when parsing the history, in order to get across even this evil VOB boundary? Without this feature I could use the script quite well as long as the project I worked on was in just one VOB. Now that I have to work with colleagues who established the project structure over multiple VOBs, I can only use "gitcc update", which does not transfer any history from CC to git and is also very slow on large projects. Charles, maybe you can give some advice from an architectural point of view (of the gitcc Python implementation) for someone to implement something like this?
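To make the check described above concrete, here is a minimal sketch of splitting lshistory lines and classifying them. It is not the code in rebase.py: the field order is taken from the `-fmt %o%m|%Nd|%u|%En|%Vn|%Nc\n` string in the traceback earlier in this thread, while the names `Entry`, `parse_line`, `classify` and the sample lines are made up for illustration. Matching symlink entries by the prefix `mkslink` is also an assumption.

```python
from collections import namedtuple

# Field order follows the -fmt string seen in the traceback:
# %o%m | %Nd | %u | %En | %Vn | %Nc
Entry = namedtuple("Entry", "kind date user path version comment")

# The two kinds the comment above says are handled today.
CHECKIN_KINDS = ("checkinversion", "checkindirectory version")


def parse_line(line):
    """Split one lshistory line into an Entry, or return None if malformed."""
    parts = line.rstrip("\n").split("|", 5)  # the comment may itself contain '|'
    if len(parts) != 6:
        return None
    return Entry(*parts)


def classify(entry):
    """Return 'checkin', 'symlink' or 'other' for a parsed entry."""
    if entry.kind in CHECKIN_KINDS:
        return "checkin"
    if entry.kind.startswith("mkslink"):  # assumption: symlink creation events
        return "symlink"
    return "other"


# Hypothetical sample data, not real cleartool output.
sample = [
    "checkinversion|20140301.101500|lucianm|./src/main.c|/main/3|fix build",
    "checkindirectory version|20140301.101400|lucianm|./src|/main/2|add file",
    "mkslinksymbolic link|20140228.090000|lucianm|./lowlevel||",
]

if __name__ == "__main__":
    for raw in sample:
        print(classify(parse_line(raw)), raw)
```

Entries classified as 'symlink' are the ones that would need an extra lookup (for example with 'cleartool ls') to find their target.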

charleso commented 10 years ago

@lucianm I suspect it's not quite that easy. From memory lshistory doesn't return updates to the symlinked entity, so even if you did add it to the repository, you wouldn't be "in sync" with the target. The best you could do is have two gitcc repositories that respect that symlink and you have to keep them in sync separately. :(

Someone correct me if I'm wrong.

lucianm commented 10 years ago

@charleso thanks for your input. I also thought of this (in fact it's exactly 2 VOBs I'm talking about, one "common" VOB with several symlinks pointing to files but also to directories from another VOB where the low level code is located), but for this I needed the 2 git repositories to share the same root directory, but then, how would the gitcc text file in which the clearcase versions are tracked, be treated by two gitcc configs updating the same file (not at the same time, of course)? Maybe I also have to very carefully filter out the contents belonging to one VOB via .gitignore out of the other git repositories, to avoid any clashes. Might get very tricky.

charleso commented 10 years ago

@lucianm I wouldn't worry too much about the .gitcc file, it's there to fix some problems with edge cases in syncing, it's not needed for the core functionality to work (and you can even disable it). git-cc is basically relying on git to track changes in the content of files (notified by cleartool lshistory).

I'm afraid I'm not quite sure what you mean by the rest of your comment. :(

In the short term might I suggest just checking in some fake/test symlinks and then manually moving/hacking the master_cc and master_ci refs to see how the workflow works. I'm assuming the symlinks themselves don't change, only the content they point to. And that's the thing that git-cc will never be able to track (given what lshistory reports).

Just out of curiosity - do you have a migration strategy for this problem if/when you leave Clearcase? What would you do with the symlinks in a git-only world? We used them at my previous-previous job for linking to binary jar artifacts, which we replaced with Ant + Ivy (but Maven would have worked too). Sorry to ask, but I don't think git-cc will be able to help much in this scenario.

charleso commented 10 years ago

> but then, how would the gitcc text file in which the clearcase versions are tracked, be treated by two gitcc configs updating the same file (not at the same time, of course)

Basically you can't do that. You would have two gitcc repositories that track two vobs/views, and the symlinks would "just happen" to link to whatever version is currently on that specific filesystem.

If gitcc actually worked with symlinks the .gitcc file would contain the "version" of that symlink, which would be different in each repository. The hard/impossible part is fetching the correct content of that file, and keeping it in sync.

(Apologies if I've misunderstood the question/comment).

charleso commented 10 years ago

Actually it's slowly coming back to me. Basically what I'm trying to say is - you will only ever be able to check in the symlink. Period. I've been talking about what happens if you symlink to something outside of the current repository/view, where that view is what's in the same git repository. If you sync your view, which contains multiple vobs, including the symlink and its target then you should be OK.

Which I guess comes down to the original bug/issue: does gitcc support multiple vobs? I could have sworn it did, but alas I don't have a Clearcase instance to test against.

lucianm commented 10 years ago

@charleso OK, got your point about the .gitcc file. So let me explain what I meant. Unfortunately there is no immediate plan to escape from ClearCase with that project, not just now at least (it's beyond my attributions; I am just so uncomfortable using CC that I have tried to avoid interacting with it directly as much as possible since I discovered your script 2 years ago).

So the code of that project is distributed over 2 VOBs because of access rights policies: not everyone even sees the second VOB containing the guts of the code, only the higher-level stuff from the first one. The privileged rest of us can see both, with the build system set up in such a way that, starting from the common (first) VOB, we also see the missing parts added via symlinks from the second. Everything sits underneath one root directory, unfortunately quite "interleaved" in a crazy way (in the root dir, some subdirs are actual directory elements, others are symlinks to the other VOB). That hopefully explains why, if I want to work on a git replica of that, I need to reproduce the exact same source tree, which means that some of the files and subdirs should be version-controlled by one git repository and others by the other git repository, yet all of them live together under one root directory. That makes it so tricky.

I guess in a git-only world I would persuade my colleagues to refactor a bit the way library headers are included and just place those low level libs in a separate git repository, but in such a way that it is not a subfolder of any other code. In that way everyone would only need proper include path settings to those in order to build, and those not supposed to see that code would only get to see interface headers and libs. But as it is now, quite messy when looked at with git glasses (and quite "elegant" in the view of my ClearCase apostle colleagues), it's really difficult. In the longer term we will eventually migrate to another more modern corporate VCS and the ClearCase nightmare will end...

I understand that from the information gathered by a single lshistory -recurse call, gitcc won't be able to track changes to the contents of the elements that symlinks point at. That's why I thought this problem might be addressed in a different way: not collecting the history with a single run of lshistory, but with subsequent runs on the targets of the symlinks encountered anywhere under CC_DIR (these could be obtained at the beginning from cleartool ls -long -recurse, which tells which element is a file, which is a directory and which is a symlink), and then merging the results and filtering out duplicate changesets from the subsequent lshistory calls before parsing. Maybe it does not fit the current design; maybe gitcc expects that every element in ClearCase must be under the CC_DIR tree, whereas symlinks could also point outside of it, and yet we want to see them at the place where we created the symlinks (and replicated under the same GIT_DIR).
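The "merge and filter out duplicates" step proposed above could look roughly like the following sketch. It is not taken from any gitcc fork. It assumes each lshistory run has already been parsed into (date, path, version, comment) tuples, that paths reached through a symlink have been normalized to the same form as paths reached directly, and that the %Nd date strings (written here as YYYYMMDD.HHMMSS) sort correctly as plain text. The function name and the sample data are hypothetical.

```python
def merge_histories(histories):
    """Merge several parsed lshistory results into one list, oldest first.

    Each history is a list of (date, path, version, comment) tuples.
    An entry seen in more than one run (same path and version) is kept once.
    """
    seen = set()
    merged = []
    for history in histories:
        for entry in history:
            date, path, version, _comment = entry
            key = (path, version)
            if key in seen:
                continue  # duplicate changeset from an overlapping run
            seen.add(key)
            merged.append(entry)
    # Assumption: date strings compare chronologically as text.
    merged.sort(key=lambda e: e[0])
    return merged


# Hypothetical data: one run on CC_DIR, one run on a symlink target.
main_run = [
    ("20140301.101500", "src/main.c", "/main/3", "fix build"),
    ("20140228.090000", "src/util.c", "/main/1", "add util"),
]
link_run = [
    ("20140227.120000", "lowlevel/io.c", "/main/2", "io fix"),
    ("20140301.101500", "src/main.c", "/main/3", "fix build"),  # overlap
]

if __name__ == "__main__":
    for entry in merge_histories([main_run, link_run]):
        print(entry)
```

Keying on (path, version) only removes duplicates when both runs report the element under the same path, which is why the normalization assumption matters.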

charleso commented 10 years ago

Hi @lucianm.

Thanks for the explanation, it makes sense now. Apologies for being a little slow.

> Unfortunately there is no immediate plan to escape from ClearCase with that project, not just now at least (ist beyond my attributions

My commiserations. I expected that might be the case but just wanted to check.

> So the code of that project is distributed over 2 VOBs

I just want to confirm that if you run gitcc (or lshistory) from a single view and point to the folder "above" both vobs you don't see interleaved history? If that works then my suggestion would be to just add the relative symlinks (first manually and/or fix gitcc to add them for you) and it might work. Or is this why you mentioned the security policy because you can't actually see the vob directly on your machine?

> like not collecting the history just by a single run of lshistory, but subsequent ones on the targets of the symlinks

Right I see what you're getting at. Just thinking about it now I can't think of a reason why that wouldn't necessarily work. I never implemented that myself because our symlinks were only binary files and I never wanted to check them in. So, yeah, I guess if you ran multiple lshistories and then interleaved them into the main result the output might be approaching something useful.

If it were me I would manually run through the steps for an example file/folder yourself and confirm that the information you see would be enough to recreate what you want.

> Maybe it does not fit to the current design

Design? I think I left that somewhere with the tests. I wouldn't worry about that - if it works and is useful then it sounds good to me. I might be tempted to make this behaviour optional via a configuration just in case it causes any problems.

Sorry I can't be any more help. Let me know how you go with everything.

lucianm commented 10 years ago

Hi @charleso, so after some busy weeks, then some vacation and returning to work, ClearCase annoyed me again. I thought, let's review our discussion on issue #19, since solving it would be such a great help. So I spent some time hacking something together over the last few days. It seems to work well; it's already committed in my fork and, as you can see, I referenced this issue number. I'd still like to give it a few more tests tomorrow at work, against that large project we have in ClearCase, which now already extends over 3 VOBs (one main VOB containing symlinks to another one where the actual code is located, and where I recently discovered they put yet more symlinks to a third one). I'll then send you a pull request, if all goes well. In short, I've done the following:

However, I think you should evaluate one thing I had to modify. When preparing the lshistory command (right at the beginning of getHistory), I needed to feed the command explicitly with '.' or a symlink if one is encountered, in order to be able to subsequently call lshistory. But this means I have commented out building the lshistory command from the configured "include" directories, which I think typically contain '.' (current directory) anyway. Any impact you can think of?

lucianm commented 10 years ago

Just found out that the subsequent calls to lshistory could use some proper recursion (assuming there are no cyclic symlinks). So I will further refactor most of getHistory to be self-recursive.
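As a rough illustration of the recursion mentioned above, the sketch below walks from a starting directory through symlink targets and collects every root that would need its own lshistory call. It is not lucianm's implementation. `collect_roots` and the `symlinks_under` callback are hypothetical names; in practice the callback would have to be built on something like 'cleartool ls'. Unlike the assumption in the comment, this version guards against cyclic symlinks with a visited list.

```python
def collect_roots(start, symlinks_under, visited=None):
    """Return every path whose history should be fetched, starting at `start`.

    `symlinks_under(path)` is a hypothetical callback returning the resolved
    targets of all symlinks found under `path`.
    """
    if visited is None:
        visited = []
    if start in visited:
        return visited  # already handled: this breaks symlink cycles
    visited.append(start)
    for target in symlinks_under(start):
        collect_roots(target, symlinks_under, visited)
    return visited


# Hypothetical layout: three VOBs, with a cycle from vobB back to vobA.
links = {
    "vobA": ["vobB"],
    "vobB": ["vobC", "vobA"],
    "vobC": [],
}

if __name__ == "__main__":
    print(collect_roots("vobA", lambda path: links.get(path, [])))
```

Each returned root would then get its own lshistory run, with the results merged afterwards.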

charleso commented 10 years ago

> When preparing the lshistory command (right at the beginning of getHistory), I needed to feed the command explicitly with '.' or a symlink if one is encountered, in order to be able to subsequently call lshistory. But this means I have commented out preparing the lshistory command out of the configured "include" directories, which I think typically contain '.' (current directory) anyway. Any impact you can think of?

Sorry for the delay. No impact - sounds fine. :)

lucianm commented 10 years ago

Alright, so far my implementation has worked fine with a small dummy project. On our huge one I have so far managed to run the lshistory step of gitcc rebase successfully (the tree contains 26 distinct symlinks, spanning 3 VOBs; I know this from the debug output), and gathering that initial lshistory alone takes about 30 minutes. This may also be due to the sorting of the complete history, which I added. I guess it's just a matter of seeing the commits ordered chronologically in the git history afterwards. So I have yet to rebase the entire project before I can say that my contribution is worth merging into the project. I'll send you the pull request by Monday, I think.

lucianm commented 10 years ago

One other unrelated minor thing, of a more cosmetic nature, comes to mind. Have you thought of making releases with a version maintained in the gitcc script, for example, where it can be printed on the first line of any invocation, and perhaps, related to the versions, maintaining a tiny ChangeLog or HISTORY text file with a few words describing the main changes between them?

Not that it would be so important, but maybe nice to be able to see what version of the script is used, especially when there are changes between such versions and one has succeeded to "infect" few coworkers with the "git out of the way, clearcase!"-treatment and they are asking for help...

charleso commented 10 years ago

> One other unrelated minor thing, of more cosmetic nature comes to my mind. Have you thought of making releases with a version maintained in the gitcc script for example, where it can be printed on the first line of any invocation, and perhaps related to the versions, maintain a tiny ChangeLog or HISTORY text file with few word describing the main changes between them?

Sounds like a sensible idea. On my team there were only a few people and it was easy to slide over to their desks and debug the problem. Eventually we switched to a gitcc server model and I could fix all the problems in a single place.

> and one has succeeded to "infect" few coworkers

Nice work. :)

charleso commented 10 years ago

I'm not sure if I mentioned it or not, but I haven't touched git-cc in 4 years or so. Feel free to add what you like. I wish I could hand the maintenance of the project over to someone else. Or better yet - hopefully everyone migrates away from Clearcase soon. :)

lucianm commented 10 years ago

> I'm not sure if I mentioned it or not, but I haven't touched git-cc in 4 years or so. Feel free to add what you like. I wish I could hand the maintenance of the project over to someone else. Or better yet - hopefully everyone migrates away from Clearcase soon. :)

Oh well, that came through in various conversations on issues :)

If you're OK with that, I'll add, let's say, version 0.1.1 to the console output and a HISTORY file where 0.1.0 marks the capabilities gitcc has right now and 0.1.1 adds rebasing that follows symlinks across VOB boundaries. All in the same pull request?
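A version banner of the kind proposed here might be as small as the sketch below. The `__version__` value is the one suggested in the comment above; the `banner` helper and the output wording are made up for illustration and are not part of gitcc.

```python
# Version number proposed in the comment above; where it lives is an assumption.
__version__ = "0.1.1"


def banner(prog="gitcc"):
    """Return the line to print at the start of every invocation."""
    return "%s version %s" % (prog, __version__)


if __name__ == "__main__":
    print(banner())
```

Keeping the number in one module-level constant means a HISTORY file and the console output only need to be updated in one place per release.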

charleso commented 10 years ago

> If you're ok with that, I'll add let's say version 0.1.1 to the console output and a HISTORY file where 0.1.0 will mark the capabilities gitcc has right now, and 0.1.1 adds rebasing with following symlinks over VOB boundaries. All in the same pull request?

Sounds good.