Open dummyunit opened 8 years ago
Hello :)
It would be great to get this feature supported for Git. I think both your options could work (thanks for your detailed proposals btw!).
I would probably lean towards option 1 as follows:
The UnityNode has been updated to use the new reflection system for properties, so it should be pretty easy to add the new options. A minor bit of logic will be required to handle the deprecation of the old properties. The bulk of the work will be interfacing with git.
Thinking about some of the implementation details, here are my thoughts:
Git exe vs lib I would start by invoking the git executable and parsing the output (using the `--porcelain` option). I think it'll be much faster to get working (since you won't have to integrate an external library into FASTBuild) and there is no chance of bloating the exe (libgit2 looks pretty sensible, but it supports a lot of stuff we don't need, so it's quite a lot of code). If we keep the git functionality in a helper class, it could always be swapped out in the future if we saw that as beneficial.
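As a sketch of the parsing side (the helper name is mine, not FASTBuild's), porcelain v1 output is one file per line in the form "XY path", with renames reported as "R  old -> new". A minimal parser under those assumptions might look like:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extract file paths from `git status --porcelain`
// output. Each line is "XY path" (two status chars, a space, the path);
// renames look like "R  old -> new", where we want the new path.
std::vector<std::string> ParsePorcelainStatus( const std::string & output )
{
    std::vector<std::string> files;
    std::istringstream stream( output );
    std::string line;
    while ( std::getline( stream, line ) )
    {
        if ( line.size() < 4 )
        {
            continue; // too short to be "XY path"
        }
        std::string path = line.substr( 3 );
        const size_t arrow = path.find( " -> " );
        if ( arrow != std::string::npos )
        {
            path = path.substr( arrow + 4 ); // rename: keep the new name
        }
        files.push_back( path );
    }
    return files;
}
```

A real implementation would also need to handle paths that git quotes (e.g. names containing spaces or non-ASCII characters), which this sketch ignores.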
Speed With perforce, we get the writable state of the file for free along with normal traversal, so this option has zero cost. With git we'll need to have it return to us the list of files. From what I understand, git does some datestamp based caching, because my local tests of "git status --porcelain" are pretty fast. I think we'll probably want the first UnityNode to do the git query and cache it for the rest of the build.
Deprecated Properties We could create a new MetaDeprecated() object to use in the reflection system, so you could change the old property like this:
```cpp
REFLECT( m_IsolateWritableFiles, "UnityInputIsolateWritableFiles", MetaOptional() + MetaDeprecated( "UnityInputIsolationMethod" ) )
```
Then we could change Function::PopulateProperty to emit the deprecation message in a generic way.
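A minimal sketch of what such a MetaDeprecated object could look like, following the pattern of the existing Meta* modifiers (the class name and method are assumptions, not FASTBuild's actual API):

```cpp
#include <string>

// Hypothetical MetaDeprecated property modifier. It records the replacement
// property name so PopulateProperty can emit a generic warning such as:
// "'OldProp' is deprecated, use 'NewProp' instead."
class MetaDeprecated
{
public:
    explicit MetaDeprecated( const std::string & replacement )
        : m_Replacement( replacement )
    {}

    // Build a generic deprecation message for a given property name.
    std::string GetMessage( const std::string & deprecatedName ) const
    {
        return "'" + deprecatedName + "' is deprecated, use '"
             + m_Replacement + "' instead.";
    }

private:
    std::string m_Replacement;
};
```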
Let me know what you think :)
Hello, and sorry for the slow response; I was thinking about how best to implement git submodule support.
So, my thoughts about implementation.
This is easy: either run `git status --porcelain` or `git ls-files --modified --others --exclude-standard` and parse the output. I think `git ls-files` would be better because it doesn't print deleted files or directories, which we don't care about, and it also runs slightly faster (on my Linux machine it is 0.004s vs 0.014s on an almost empty test repo). Also, `git ls-files` must be run from the git repo root directory because it only shows modified/new files from the current directory, so we will need to find the root directory. But this is not a problem, because finding it is required for git submodule support anyway.
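One simple way to find the root is to ask git itself: `git rev-parse --show-toplevel` prints the working tree root. A minimal sketch, using a popen-based helper (a real implementation would use FASTBuild's own process spawning instead):

```cpp
#include <array>
#include <cstdio>
#include <string>

// Minimal helper: run a command and capture its stdout, with the trailing
// newline stripped. Illustrative only; not FASTBuild's actual process code.
std::string RunCommand( const std::string & cmd )
{
    std::string result;
    FILE * pipe = popen( cmd.c_str(), "r" );
    if ( pipe == nullptr )
    {
        return result;
    }
    std::array<char, 256> buffer;
    while ( fgets( buffer.data(), (int)buffer.size(), pipe ) != nullptr )
    {
        result += buffer.data();
    }
    pclose( pipe );
    while ( !result.empty() && ( ( result.back() == '\n' ) || ( result.back() == '\r' ) ) )
    {
        result.pop_back(); // strip the trailing newline git prints
    }
    return result;
}

// `git rev-parse --show-toplevel` prints the root of the working tree,
// which is where `git ls-files` must be run from.
std::string FindGitRepoRoot()
{
    return RunCommand( "git rev-parse --show-toplevel" );
}
```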
Neither `git status` nor `git ls-files` can show changes in all submodules at once. Fortunately, submodules can be traversed using `git submodule foreach --recursive`, but `git submodule` is implemented as a shell script, so there will be a significant speed cost. However, we can check whether a `.gitmodules` file is present in the git repo root directory before running this command at all.
I found 2 ways to collect the list of changed files for submodules, each with a slight disadvantage relative to the other:
A. `git submodule foreach --recursive git ls-files --modified --others --exclude-standard`

This generates output like this (here submodule `s_b` is nested inside submodule `s_m`):

```
Entering 's_m'
g3
g1
g2
Entering 's_m/s_b'
h3
h1
```

The problem with this is that the line starting with "Entering" is localized, so we would need to add `LC_MESSAGES=C` to the environment for this command and hope that the unlocalized version of the text never changes.
B. `git submodule foreach --recursive --quiet 'echo "$toplevel/$path"; git ls-files --modified --others --exclude-standard'`

Instead of "Entering", this prints the full path to each submodule directory, like this:

```
/tmp/top/s_m
g3
g1
g2
/tmp/top/s_m/s_b
h3
h1
```

The problem with this is that (as I suspect) on Windows it will print paths like `/c/dir` instead of `c:\dir`, so an additional conversion will be needed.
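A sketch of parsing approach B's output, plus the suspected MSYS-style path conversion (all names are mine; this assumes `$toplevel/$path` is always absolute while `git ls-files` output is relative and never starts with '/'):

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Group the output of approach B into per-submodule file lists. A line
// starting with '/' is treated as a submodule path printed by the echo;
// any other line is a modified/untracked file in the current submodule.
std::map<std::string, std::vector<std::string>>
ParseSubmoduleFileLists( const std::string & output )
{
    std::map<std::string, std::vector<std::string>> result;
    std::istringstream stream( output );
    std::string line;
    std::string currentSubmodule;
    while ( std::getline( stream, line ) )
    {
        if ( line.empty() )
        {
            continue;
        }
        if ( line[ 0 ] == '/' )
        {
            currentSubmodule = line; // new submodule section begins
        }
        else if ( !currentSubmodule.empty() )
        {
            result[ currentSubmodule ].push_back( line );
        }
    }
    return result;
}

// MSYS-style path conversion for the Windows concern: "/c/dir" -> "c:\dir".
// Non-drive paths are returned unchanged.
std::string ConvertMSYSPathToWindows( const std::string & path )
{
    if ( ( path.size() >= 2 ) && ( path[ 0 ] == '/' ) &&
         isalpha( (unsigned char)path[ 1 ] ) &&
         ( ( path.size() == 2 ) || ( path[ 2 ] == '/' ) ) )
    {
        std::string result( 1, path[ 1 ] );
        result += ':';
        for ( size_t i = 2; i < path.size(); ++i )
        {
            result += ( path[ i ] == '/' ) ? '\\' : path[ i ];
        }
        return result;
    }
    return path;
}
```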
As for speed, both A and B run in about 0.120s on the same test repo, which has 2 nested submodules and about 10 files total. I think approach B is better; what do you think?
I am thinking about using a common interface for VCS helpers, like this:

```cpp
class IVCSHelper
{
public:
    virtual ~IVCSHelper() {}
    virtual bool IsFileBeingEdited( const FileIO::FileInfo & file ) = 0;
};
```
The Perforce subclass would simply return `!file.IsReadOnly()`. The Git subclass would query git on the first call and cache the results.
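A sketch of that caching behavior (FileInfo here is a minimal stand-in for FASTBuild's FileIO::FileInfo, and QueryGitForModifiedFiles is a hypothetical hook that would run the git commands discussed above):

```cpp
#include <set>
#include <string>

// Minimal stand-in for FASTBuild's FileIO::FileInfo (an assumption).
struct FileInfo
{
    std::string m_Name;
};

// Git helper sketch: query git once on the first call, then answer
// subsequent calls from an in-memory cache for the rest of the build.
class GitVCSHelper
{
public:
    virtual ~GitVCSHelper() {}

    bool IsFileBeingEdited( const FileInfo & file )
    {
        if ( !m_CachePopulated )
        {
            m_ModifiedFiles = QueryGitForModifiedFiles();
            m_CachePopulated = true;
        }
        return ( m_ModifiedFiles.count( file.m_Name ) > 0 );
    }

protected:
    // Would invoke `git ls-files --modified --others --exclude-standard`
    // (plus the submodule traversal); stubbed out in this sketch.
    virtual std::set<std::string> QueryGitForModifiedFiles()
    {
        return std::set<std::string>();
    }

private:
    bool m_CachePopulated = false;
    std::set<std::string> m_ModifiedFiles;
};
```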
And regarding the property changes, I agree with you.
I can see a forthcoming problem with this feature for our particular use case:
We have a modular architecture for our whole engine, with each module stored in its own git repo, so git effectively becomes our package manager. To get a module/package, you just pull down its git repo and it gets integrated into the build.
The problem here is, let's say you have 100 modules. Now you have 100 git repos to query the status of before each build. If they have submodules, it gets worse.
If you have 1,000 modules, or 10,000 modules, we can almost guarantee that the cost of these queries will become extremely prohibitive.
I'm certainly not arguing against this feature in-general. I think it's needed, a positive addition, and where most users probably have a single git repo, the naive implementation will probably serve the majority quite well.
I wonder if there's a solution that scales any better for the many-git-repo use-case? Can we somehow make a git plugin which actually does push notifications to a FastBuild database which tracks/caches the states of these files?
At least this way, you'd only be doing the slow updates as the files are actually touched/updated/modified. And then when actually building with FastBuild, there would be no queries hitting git/shell/kernel/etc - it could all be queried from an in-memory DB - probably nearly as quickly as the P4 data we already store.
Another option might just be for FastBuild to have FastBuildWorker put some selected paths under "OS File Update Watch" - e.g. fbuild.bff could store some configuration for which paths to watch, and FastBuildWorker could set up file change notification callbacks with the OS for those paths. It could then update the aforementioned "DB" (which I know doesn't exist yet, but... :) )
This option would be similar to the idea of a git plugin doing push notifications, but would rely on the OS filesystem notification system directly instead.
The benefit of both of these approaches would be to aim for negligible build-time impact for this feature by using a "push notification -> DB" model, rather than a "linearly query EVERYTHING before every build" model.
There would of course be some atomicity constraints on querying the "DB" right before the build starts; but, again, I would assume this should be orders of magnitude faster than querying every git repo before every build.
OTOH, this version of the feature is a bit more involved. Mainly the "DB" part. Writing the actual OS notification code is pretty easy (I've written that before for a shader compiler which would recompile any touched shaders immediately). I'd be glad to help with that part -- but I leave the "how would we want to implement the "DB" and "configuration"" questions to @ffulin... :-)
Would these two tools help your use case?
- https://github.com/fabioz/mu-repo - can run git status in parallel across multiple repos
- https://github.com/cmarcusreid/git-status-cache - uses file watchers to cache the git status; only has Windows support currently, but could perhaps be extended?
For complex cases like the one mentioned by @missmah, I think trying to add all that functionality into FASTBuild itself probably isn't the right call, at least not in the short to medium term.
There is a pending PR which could go some way to solving this problem: https://github.com/fastbuild/fastbuild/pull/220. This change allows an explicit list of exclusions to be specified in an external file. This file could be maintained by an external process, such as a tray application that watches files and/or periodically talks to some source control system.
@ffulin assuming that modifying that file doesn't cause FASTBuild to do a full rebuild, I think that's a great start at an architectural gateway.
Then it's easy to write an external process which does file watches and updates the external file. A missing piece, though, would be how to configure the external processes to know which files/folders to watch. That knowledge is contained in the .fdb. So we'd need an API whereby fastbuild could either emit some paths to an established file when they change, or the app could query a FASTBuild API to get answers to those questions.
I wouldn't want to have to maintain the paths by hand in the separate app, ideally.
Hello Franta! Automatic isolation (from unity) of files that are being worked on looks like a great feature. Unfortunately, as it is now, it is tied to how Perforce marks those files.
I would like to add support for other VCSes (git in particular) for determining whether a file should be isolated, and I see 2 main ways to do this:
1. Integrate support into FASTBuild:
Pros:
Cons:
2. Add plugin support, similarly to .CachePlugin (maybe call it .VCSPlugin?)
Pros:
Cons:
Please let me know which approach (if any) you like more.