fastbuild / fastbuild

High performance build system for Windows, OSX and Linux. Supporting caching, network distribution and more.
https://fastbuild.org
1.23k stars 378 forks source link

Support for other methods of detecting files that needs to be isolated from unity #93

Open dummyunit opened 8 years ago

dummyunit commented 8 years ago

Hello Franta! Automatic isolation (from unity) of files that are worked on looks like a great feature. Unfortunately, as it is now, it is tied to how Perforce marks those files.

I would like to add support for other VCS (git in particular) for determining if file should be isolated, and I see 2 main ways to do this:

  1. Integrate support into FASTbuild:

    • Add String .UnityInputIsolationMethod to UnityNode, which would determine how isolated files are found out. It would accept values like 'none', 'auto' (automatic method detection), 'perforce' (current behaviour), 'git', etc.
    • Rename .UnityInputIsolateWritableFilesLimit to .UnityInputIsolatedFilesLimit, to clarify the name (but accept old name to maintain backwards compatibility).
    • Deprecate .UnityInputIsolateWritableFiles and treat setting it to true as setting .UnityInputIsolationMethod to 'perforce'.

    Pros:

    • Support for other VCS will work out of the box.

    Cons:

    • Too much changes to FASTbuild code base considering that this isn't a core feature. (Git support for example would require launching git itself or linking to libgit2.)
    • Support for some VCS may require additional configuration which will require additional variables for UnityNode or Settings.
    • There are many VCS and nobody is familiar with all of them, so maintaining support for them in one place would be difficult.
  2. Add plugin support, similarily to .CachePlugin (maybe call it .VCSPlugin ?) Pros:

    • Minimal changes to FASTbuild.
    • Plugin implementation wouldn't be bounded by FASTbuild restrictions (mentioned in #22). For example Mercurial plugin would probably be python module with C interface.

    Cons:

    • Users will need to find and get those plugins manually.

Please let me know which approach (if any) do you like more.

ffulin commented 8 years ago

Hello :)

It would be great to get this feature supported for Git. I think both your options could work (thanks for your detailed proposals btw!).

I would probably lean towards option 1 as follows:

The UnityNode has been updated to use the new reflection system for properties, so it should be pretty easy to add the new options. A minor bit of logic will be required to handle the deprecation of the old properties. The bulk of the work will be interfacing with git.

Thinking about some of the implementation details, here are my thoughts:

Git exe vs lib I would start by invoking the git executable and parsing the output (using the --porcelain option). I think it'll be much faster to get working (since you won't have to integrate an external library into FASTBuild) and there is no chance of bloating the exe (gitlib2 looks pretty sensible, but it really supports a lot of stuff we don't need so it's quite a lot of code). If we keep the git functionality in a helper class, it could always be swapped out in the future if we saw that as beneficial.

Speed With perforce, we get the writable state of the file for free along with normal traversal, so this option has zero cost. With git we'll need to have it return to us the list of files. From what I understand, git does some datestamp based caching, because my local tests of "git status --porcelain" are pretty fast. I think we'll probably want the first UnityNode to do the git query and cache it for the rest of the build.

Deprecated Properties We could create a new MetaDeprecated() object to use in the reflection system, so you could change the old property like this:

REFLECT( m_IsolateWritableFiles,    "UnityInputIsolateWritableFiles", MetaOptional() + MetaDeprecated("UnityInputIsolationMethod"))

so we could change Function::PopulateProperty to emit the deprecation message in a generic way.

Let me know what you think :)

dummyunit commented 8 years ago

Hello and sorry for slow response, I was thinking about how to better implement git submodule support.

So, my thoughts about implementation.

Single git repo support

This is easy, either run git status --porcelain or git ls-files --modified --others --exclude-standard and parse the output. I think git ls-files would be better because it doesn't print deleted files and doesn't print directories, which we don't care about, it also runs slightly faster (on my linux machine it is 0.004s vs 0.014s on almost empty test repo). Also git ls-files must be run from root directory git repo because it shows only modified/new file from current directory, so we will need to find root directory, but this is not a problem because finding it is required for git submodule support anyway.

Submodule support

Neither git status nor git ls-files can show changes in all submodules at once. Fortunately submodules can be traversed using git submodule foreach --recursive, but git submodule is implemented in shell script, so there will be a significant speed cost. But we can check if .gitmodules file is present at all in git repo root directory before running this command. I found 2 ways to collect list of changed files for submodules, each with slight disadvantage against the other:

A. git submodule foreach --recursive git ls-files --modified --others --exclude-standard This generates output like this (here submodule s_b is nested into submodule s_m):

Entering 's_m'
g3
g1
g2
Entering 's_m/s_b'
h3
h1

Problem with this is that line starting with "Entering" is localized, so we need to add LC_MESSAGES=C to environment for this command and hope that unlocalized version of text will never change. B. git submodule foreach --recursive --quiet 'echo "$toplevel/$path"; git ls-files --modified --others --exclude-standard' This instead of "Entering" prints full path to submodule directory like this:

/tmp/top/s_m
g3
g1
g2
/tmp/top/s_m/s_b
h3
h1

Problem with this is that (as I suspect) on windows it will print paths like this: /c/dir instead of c:\dir, so additional conversion will be needed.

As for the speed, both A an B are running for about 0.120s on the same test repo which has 2 nested submodules and about 10 files total. I think approach B is better, what do you think?

Changes to FASTBuild code

I am thinking about using common interface for VCS helpers like this:

class IVCSHelper
{
public:
    virtual bool IsFileBeingEdited(const FileIO::FileInfo& file) = 0;
};

Perforce subclass would simply return !file.IsReadOnly(). Git subclass would query git on first call and cache the results.

And about changes to properties I agree with you.

missmah commented 5 years ago

I can see a forthcoming problem with this feature for our particular use case:

We have a modular architecture to our whole engine, with each module stored in its own git repo. So that git becomes like our package manager. To get a module/package, you just pull down its git repo and it gets integrated into the build.

The problem here is lets say you have 100 modules. Now you have 100 git repos to ask the status of before each build. If they have submodules, it gets worse.

If you have 1,000 modules, or 10,000 modules, we can almost guarantee that the cost of these queries will become extremely prohibitive.

I'm certainly not arguing against this feature in-general. I think it's needed, a positive addition, and where most users probably have a single git repo, the naive implementation will probably serve the majority quite well.

I wonder if there's a solution that scales any better for the many-git-repo use-case? Can we somehow make a git plugin which actually does push notifications to a FastBuild database which tracks/caches the states of these files?

At least this way, you'd only be doing the slow updates as the files are actually touched/updated/modified. And then when actually building with FastBuild, there would be no queries hitting git/shell/kernel/etc - it could all be queried from an in-memory DB - probably nearly as quickly as the P4 data we already store.

Another option might just be for FastBuild to have FastBuildWorker put some selected paths under "OS File Update Watch" - e.g. fbuild.bff could store some configuration for which paths to watch, and FastBuildWorker could setup file change notification callbacks with the OS for those paths. And then it could update the aforementioned "DB" (which I know doesn't exist yet, but... :) )

This option would be similar to the idea of a git plugin doing push notifications, but would rely on the OS filesystem notification system directly instead.

The benefit of both of these approaches would be to aim for negligible build time impact for this feature by using a "push notification -> DB" model, rather than a "linearly query EVERYTHING every build before the build starts model"

There would of course be some atomicity constraints on querying the "DB" right before the build starts; but, again, I would assume this should be orders of magnitude faster than querying every git repo before every build.

OTOH, this version of the feature is a bit more involved. Mainly the "DB" part. Writing the actual OS notification code is pretty easy (I've written that before for a shader compiler which would recompile any touched shaders immediately). I'd be glad to help with that part -- but I leave the "how would we want to implement the "DB" and "configuration"" questions to @ffulin... :-)

bburgin commented 5 years ago

Would these two tools help your use case?: https://github.com/fabioz/mu-repo Can run git status in parallel across multiple repos https://github.com/cmarcusreid/git-status-cache Uses file watchers to cache the git status Only has Windows support currently, but could perhaps be extended?

ffulin commented 5 years ago

For complex cases like the one mentioned by @missmah, I think trying to add all that functionality into FASTBuild itself probably isn't the right call, at least not in the short to medium term.

There is a pending PR which could go some way to solving this problem: https://github.com/fastbuild/fastbuild/pull/220. This change allows an explicit list of exclusions to be specified in an external file. This file could be maintained by an external process, such as a tray application that watches files and/or periodically talks to some source control system.

missmah commented 5 years ago

@ffulin assuming that modifying that file doesn't cause FASTBuild to do a full rebuild, I think that's a great start at an architectural gateway.

Then it's easy to write an external process which does file watches and updates the external file. A missing piece, though, would be how to configure the external processes to know which files/folders to watch. That knowledge is contained in the .fdb. So we'd need an API whereby fastbuild could either emit some paths to an established file when they change, or the app could query a FASTBuild API to get answers to those questions.

I wouldn't want to have to maintain the paths by hand in the separate app, ideally.