dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License

Is it possible to allow remote projectreference from git repo? #6132

Open Thaina opened 3 years ago

Thaina commented 3 years ago

Besides building a project and publishing it as a package, it is sometimes more convenient to let a project reference another project directly and compile it along with the main project.

But the current ProjectReference mechanism requires the referenced project to be present on the machine and can only reference it locally, which is inconvenient and requires setup on each machine.

So I think MSBuild should allow referencing a git repo in a csproj. Internally it could pull the git project into a temp folder and then follow the same process as a ProjectReference. It should be standardized, so that we can build projects in CI/CD through a ProjectReference alone.

<Project Sdk="Microsoft.NET.Sdk.BlazorWebAssembly" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <!-- Repo: the source repo; Commit: commit ID for versioning -->
    <ProjectReference
      Include="src/Blazor.Extensions.Canvas/Blazor.Extensions.Canvas.csproj"
      Repo="https://github.com/BlazorExtensions/Canvas.git"
      Commit="3d9b5e6eccb0a66d34172f07ceeb8b7f4d82aaec"
    />
  </ItemGroup>
</Project>

Is it possible?

Forgind commented 3 years ago

It's a nice idea, but I don't think this would work well for any but the most basic projects.

Specifically, if the only part of the git repo you want is src/Blazor.Extensions.Canvas/Blazor.Extensions.Canvas.csproj, then it would be quick to pull down just that and continue your build—but you could also do that beforehand or in a separate build step—there's probably a way to do it with the DownloadFile task.

In the more normal case, you'd want a csproj plus everything it depends on, which is unknown before attempting to download it, so we would have to eagerly download the whole repo, which would take a lot of time. If you do want to go this route, it might be reasonable to clone the repo locally and just update it (git pull) as part of each build. Still slower than if you had it locally, but better.

Does that sound reasonable to you?
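
A minimal sketch of the "clone once, then git pull on each build" idea, run as a separate step before the real build. It assumes git is on the PATH; the property names, the target name, and the external/Canvas folder are hypothetical choices for illustration, not anything MSBuild defines.

```xml
<Project>
  <PropertyGroup>
    <!-- Hypothetical location for the local clone; any stable folder would do. -->
    <CanvasRepoDir>$(MSBuildThisFileDirectory)external/Canvas</CanvasRepoDir>
    <CanvasRepoUrl>https://github.com/BlazorExtensions/Canvas.git</CanvasRepoUrl>
  </PropertyGroup>

  <Target Name="SyncCanvasRepo">
    <!-- First run: no clone exists yet, so fetch the whole repository. -->
    <Exec Command="git clone &quot;$(CanvasRepoUrl)&quot; &quot;$(CanvasRepoDir)&quot;"
          Condition="!Exists('$(CanvasRepoDir)/.git')" />
    <!-- Whenever a clone exists (including right after the clone above,
         where it is a harmless no-op): fast-forward to the remote branch. -->
    <Exec Command="git pull --ff-only"
          WorkingDirectory="$(CanvasRepoDir)"
          Condition="Exists('$(CanvasRepoDir)/.git')" />
  </Target>
</Project>
```

Saved as a standalone project file (say SyncCanvas.proj, a made-up name), it could be run with dotnet msbuild SyncCanvas.proj -t:SyncCanvasRepo before building the main project, which would then use an ordinary local ProjectReference into the cloned folder.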

Thaina commented 3 years ago

@Forgind I wish the process you describe could be a standard, automatic part of MSBuild, alongside package references from NuGet.

As I said, I wish MSBuild could clone the project into a temp folder on the local machine. By temp folder I mean a cache that is shared between projects and only deleted when space is needed, so it keeps a shallow copy of a specific version for rebuilds, the same way downloaded NuGet packages are kept on the local machine.

And the main point is that it would let me share my project with other machines and run CI/CD jobs on a CI/CD server such as GitHub or Bitbucket with just a ProjectReference element, in the same manner as a NuGet dependency.

Forgind commented 3 years ago

So to see if I understand correctly, you're proposing an MSBuild command that downloads just one file from a GitHub repo and puts it in the %temp% folder (in a folder like %TEMP%\Canvas\Blazor.Extensions.Canvas.csproj or %TEMP%\Canvas\src\Blazor.Extensions.Canvas\Blazor.Extensions.Canvas.csproj or on its own?) and save it there between builds, only deleting it if the computer is running out of memory?

One extra problem to consider is when we should assume it's out-of-date. If we aren't maintaining the full repo between builds, we'd essentially have to download it every time if we wanted to ensure that it's still up-to-date (or at least download a timestamp for it) and at that point, we may as well be using the DownloadFile task.

Thaina commented 3 years ago

@Forgind The repo should only be considered outdated manually, like a specific version of a NuGet package; the commit ID acts as the version itself. I consider this a constraint of this method of referencing, unless we get the ability to specify a semver query against git commits.

But I also have something else in mind: maybe we could specify the Commit attribute as {anybranch}/HEAD or {anybranch}/{anytag}, which would check the remote server on every build and pull every change, or maybe do so on dotnet restore, which would update the repo to that version.

downloads just one file

Well, maybe not. I think we must load a shallow copy of the repo, which is all of the files in that commit. But the repo could contain any folder structure, so we also need to specify where and which csproj we want to include. Then again, with this kind of feature in MSBuild we would be able to keep repos and projects separate from each other, one repo per project, and reference every dependency directly by its repo.

or on its own?

I think we should download into %TEMP%/msbuildreference/github.com/BlazorExtensions/Canvas/{Commit}. It would act as a local centralized mirror, so that every project referencing the same commit reuses the same copy.
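
A rough sketch of how that cache layout and a shallow fetch of one pinned commit might look with existing MSBuild tasks. All property and target names and the .fetched marker file are hypothetical. It also assumes git is on the PATH and that the server allows fetching a commit by its hash.

```xml
<Project>
  <PropertyGroup>
    <!-- Values taken from the example at the top of the thread. -->
    <GitRefRepoUrl>https://github.com/BlazorExtensions/Canvas.git</GitRefRepoUrl>
    <GitRefCommit>3d9b5e6eccb0a66d34172f07ceeb8b7f4d82aaec</GitRefCommit>
    <!-- Proposed layout: {temp}/msbuildreference/{host}/{owner}/{repo}/{commit} -->
    <GitRefCacheRoot>$([System.IO.Path]::Combine($([System.IO.Path]::GetTempPath()), 'msbuildreference'))</GitRefCacheRoot>
    <GitRefCacheDir>$(GitRefCacheRoot)/github.com/BlazorExtensions/Canvas/$(GitRefCommit)</GitRefCacheDir>
    <!-- Hypothetical marker file, written only after a successful checkout. -->
    <GitRefMarker>$(GitRefCacheDir)/.fetched</GitRefMarker>
  </PropertyGroup>

  <!-- Skipped entirely when this commit is already in the shared cache. -->
  <Target Name="FetchPinnedCommit" Condition="!Exists('$(GitRefMarker)')">
    <MakeDir Directories="$(GitRefCacheDir)" />
    <Exec Command="git init" WorkingDirectory="$(GitRefCacheDir)" />
    <!-- Depth 1: every file of that commit, but none of the earlier history. -->
    <Exec Command="git fetch --depth 1 &quot;$(GitRefRepoUrl)&quot; $(GitRefCommit)"
          WorkingDirectory="$(GitRefCacheDir)" />
    <Exec Command="git checkout --detach FETCH_HEAD" WorkingDirectory="$(GitRefCacheDir)" />
    <Touch Files="$(GitRefMarker)" AlwaysCreate="true" />
  </Target>
</Project>
```

Because the commit hash is part of the folder name, two projects pinning the same commit would share one folder, while different commits would sit side by side.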

Forgind commented 3 years ago

Using commit ID as a version could work as far as telling the build system when it's out-of-date, but it's clunky. Also, unless I'm following the repo closely, I presumably wouldn't know I should change the commit sha, nor would I know what to change it to. At that point, is it easier to have a separate task or just to run git checkout <sha> from the relevant repo whenever you need to?

I'm starting to worry about the security aspects of this. If I specify that I want whatever code happens to be in, say, dotnet/msbuild:master, and the owner of msbuild:master is malicious, that owner could put whatever code they wanted there, and you would automatically download it and run it even if you don't change your code at all. Looking at specific commits sounds safer to me.

Having all the files in a commit doesn't necessarily mean you have all the files that the files you need rely on. Like I can update Microsoft.Common.CurrentVersion.targets without touching any of the tasks it relies on, which would mean I would just be relying on the previous versions. Pulling in just the commit would miss that.

I do like the \Canvas\{Commit} plan as far as preventing wrong version-type errors, but it would also make invalidating (and deleting) pseudo-repos hard. Git has an incredible branching structure so it only has to remember diffs when switching between commits. If we were to have a separate folder for each commit we asked for, that could be several almost-identical versions of the same repo side-by-side, which would waste a lot of disk space. Deleting them is made difficult because they're shared: you'd have to verify that no project references a particular commit before you could delete its folder. Otherwise, you'd risk downloading the same commit of the same repo every time you switched what you were working on.

Thaina commented 3 years ago

Using commit ID as a version could work as far as telling the build system when it's out-of-date, but it's clunky. Also, unless I'm following the repo closely, I presumably wouldn't know I should change the commit sha, nor would I know what to change it to. At that point, is it easier to have a separate task or just to run git checkout <sha> from the relevant repo whenever you need to?

I'm starting to worry about the security aspects of this. If I specify that I want whatever code happens to be in, say, dotnet/msbuild:master, and the owner of msbuild:master is malicious, that owner could put whatever code they wanted there, and you would automatically download it and run it even if you don't change your code at all. Looking at specific commits sounds safer to me.

These two problems are the same as with NuGet today, aren't they? When you specify an exact version number, only that version is pulled, even if there is an update. When you specify a semver range, the project can conveniently update to the newest version, but the repo owner can also inject malicious code into the package and publish a malicious new version that would be included in your project on build. Using a commit ID or a tag name is exactly the same tradeoff, at your own risk and trust.

Having all the files in a commit doesn't necessarily mean you have all the files that the files you need rely on. Like I can update Microsoft.Common.CurrentVersion.targets without touching any of the tasks it relies on, which would mean I would just be relying on the previous versions. Pulling in just the commit would miss that.

Then you couldn't use that repo as a ProjectReference. You could fork it into a repo of your own and cut the reliance on previous versions. I think a shallow copy has a bigger size advantage for building in CI/CD and should take priority over a very specific project structure like that.

Actually I am confused: did you think that a shallow clone only downloads the files that changed in one commit? No, it downloads the whole repo, every file and folder as you see on GitHub. It just does not download the whole history of the repo like a normal clone.

it would also make invalidating (and deleting) pseudo-repos hard

We should just have a cache clean command that clears all cached repos. We might also have a cache clean unused command that lists every project in all subfolders of the current folder and determines which repos are no longer required.

Deleting them is made difficult

Nope, I just think caching and sharing are a convenience for daily or hourly development, but not that important. We can clean the cache and redownload in the same manner as with NuGet. Cleaning the whole cache and redownloading 2-3 times a week shouldn't hurt. Git hosts should be robust enough to handle occasional recloning.

AraHaan commented 3 years ago

I would prefer to specify branch instead of commit hash as commit hashes always change.

Thaina commented 3 years ago

@AraHaan A commit hash is for a specific version that you want to target permanently. It is a history record that will not change unless the owner decides to force-delete it.

AraHaan commented 3 years ago

Not everyone wants to pin to commit hashes, however, myself included (I might pull in and have an AI update a zlib submodule for this very thing). In fact, why not just use submodules instead? The CI would then have to be set to clone recursively by default before it builds your repository (which also clones the submodule), avoiding the need to ask for a feature like this. I also pull the submodule locally before building.

Even in git having subprojects is a normal thing so that was why they invented submodules to begin with.

Now it could be possible to make an msbuild task that looks inside of the .gitmodules file to look for submodules then run git submodule update to clone them if they are not already cloned (or update them if the .gitmodules targets a specific branch).

Thaina commented 3 years ago

@AraHaan Branches are on my mind too. It's not that they wouldn't be possible; it's just that a hash would be the default and safest way, since there is no chance of a breaking change when pulling a specific commit.

Think about this: you might target master/HEAD and develop for hours. But when you decide to build, it restores another version in which the owner has changed much of the API, and your project requires changes again.

That's why I think a specific commit should be the default, like a NuGet reference.

submodules

It is a complicated setup to have both a submodule and a reference into that submodule. If every one of your projects uses the same submodule, it is also redundant.

AraHaan commented 3 years ago

I would make the commit optional, however (for when you use projects that you do not maintain), with ProjectReference. Tbh my rule for my code is that if I need to depend on things not in my code (like System.Text.Json, for example), I install the NuGet package instead, and reserve ProjectReferences for code I own and would want to ship with my metapackage.

Thaina commented 3 years ago

@AraHaan The main point of this feature is that you could start referencing anyone's repo on GitHub even today, not only your own. Sure, you can craft tags and branches specifically to make this reference convenient for your own projects. But I think most references would be to other people's repos, and they shouldn't have to structure their repo specifically for this kind of reference.

That might happen eventually, but for all the legacy repos we have now, it might not.

AraHaan commented 3 years ago

Even legacy repositories support submodules, I feel like this feature is more along the lines of "I got a submodule but I am too lazy to register it in git as a submodule" or "I do not know how to register it as a submodule and I do not want to read git's docs on it".

Besides, pulling submodule updates is optional in git anyway: you do NOT have to run git submodule update --remote to pull the latest changes if you do not want to. You could run git submodule update --init to have it clone and check out the submodule commit that the repository points to (unless that commit no longer exists), or clone recursively to that commit.

Besides, that is what the makers of git recommend for projects in any programming language: use submodules.

AraHaan commented 3 years ago

But ye, I think an MSBuild task for initing submodules (if .gitmodules is found and the submodules have not been initialized (cloned) yet), then a property that controls whether another task (one that runs git submodule update) gets run, only if they enable it.

This is because I still think submodules should be the way to go that would benefit projects of ALL sizes (yes even the .NET runtime repository as then they can break up the runtime projects into submodules inside of it however it would mean a lot more repositories would need to be maintained which is probably a no to more work for the .NET Team).
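
A small sketch of what such targets might look like using the existing Exec task. The target names and the UpdateGitSubmodules and RepoRoot properties are hypothetical, and git is assumed to be on the PATH.

```xml
<Project>
  <PropertyGroup>
    <!-- Hypothetical opt-in switch; stays off unless the user sets it to true. -->
    <UpdateGitSubmodules Condition="'$(UpdateGitSubmodules)' == ''">false</UpdateGitSubmodules>
    <!-- Assumes this file sits at the repository root, next to .gitmodules. -->
    <RepoRoot Condition="'$(RepoRoot)' == ''">$(MSBuildThisFileDirectory)</RepoRoot>
  </PropertyGroup>

  <!-- Clone any missing submodules and check out the commits recorded in the
       superproject. Does nothing when there is no .gitmodules file. -->
  <Target Name="InitGitSubmodules" Condition="Exists('$(RepoRoot).gitmodules')">
    <Exec Command="git submodule update --init --recursive" WorkingDirectory="$(RepoRoot)" />
  </Target>

  <!-- Opt-in: move each submodule to the tip of its tracked remote branch. -->
  <Target Name="UpdateGitSubmodulesToRemote"
          DependsOnTargets="InitGitSubmodules"
          Condition="'$(UpdateGitSubmodules)' == 'true' and Exists('$(RepoRoot).gitmodules')">
    <Exec Command="git submodule update --remote --recursive" WorkingDirectory="$(RepoRoot)" />
  </Target>
</Project>
```

Under these assumptions, running the second target with -p:UpdateGitSubmodules=true would move the submodules forward, while the first target alone only restores the pinned commits.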

Thaina commented 3 years ago

Even legacy repositories support submodules

My arguments were separate.

Submodules are great and support legacy repos, but they are not easy to set up, and they become redundant when many projects reference the same submodule. So we should have this feature in MSBuild. This is one story.

The argument about legacy repos was a response to your argument about using commit IDs, and about why this feature should support legacy repos. This is another story.

AraHaan commented 3 years ago

Submodules are actually easy once their documentation (shown when you run git submodule -h) is read thoroughly.

Although if it is hard, maybe someone like me who knows how to make submodules could suggest adding a dotnet new submodule <repository url here> command, which adds a repository as a submodule and clones it.

Thaina commented 3 years ago

@AraHaan

thoroughly

That's the point

The documentation of ProjectReference is easier

AraHaan commented 3 years ago

git submodule add {main-repo-url} {path to place the submodule at in the repository} (from here)

I guess aliasing this using dotnet new could be done (but force the command to be run in the dir they want the submodule to be added in from their repository).

Thaina commented 3 years ago

@AraHaan First, you need to learn about submodules

Next, you need to learn where the submodule is

Then you need to add a ProjectReference pointing at the submodule path

You then need to maintain the submodule in your repo and maintain the reference in your project separately. If you don't need it anymore, you need to remove the reference and then remove the submodule

And so on and so on and so on

Meanwhile PackageReference (and the feature I am requesting) keeps all the complication out of our sight; we don't even have an unnecessary submodule folder in our repo. PackageReference does everything behind our back with the NuGet server. Adding or removing a reference is as easy as one line of XML that everybody using C# can write, even without knowing anything about git

Whereas everything in your explanation has to be done tediously, in the same pattern, again and again for every project
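
For comparison, a sketch of the project side of the submodule route described in the steps above. It assumes the submodule was added beforehand with git submodule add https://github.com/BlazorExtensions/Canvas.git external/Canvas at the repository root, one level above the project folder. The external/Canvas path and the target framework are illustrative assumptions.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed target framework, for illustration only. -->
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Points into the submodule checkout. The submodule is kept outside this
         project's folder so the SDK's default file globs do not also pick up
         its source files. -->
    <ProjectReference Include="../external/Canvas/src/Blazor.Extensions.Canvas/Blazor.Extensions.Canvas.csproj" />
  </ItemGroup>
</Project>
```

The reference itself is one element, but the path only resolves after the submodule has been initialized on every machine that builds the project.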

AraHaan commented 3 years ago

It's easy to manually remove a git submodule: simply delete the dir and its entry inside of .gitmodules.

In fact, maybe dotnet could have a command for that too, in this case dotnet submodule remove <submodule name>, which looks for its entry inside of .gitmodules, deletes it, then deletes the directory of the submodule, and then stages those 2 changes.

And that command could easily be a tool that anyone could make. All it would do is process the text inside of .gitmodules, obtain the directory of the submodule, recursively delete every file and folder inside of it, and finally delete the submodule directory itself.

Thaina commented 3 years ago

@AraHaan "All it would do" is seriously a lot more complicated than just a PackageReference, and people would need to learn all those new things

It is just easy for you. It is not easy for everyone at all. Please understand this

Just answer this for yourself: is anything in your explanation so far easier than one line of PackageReference?

If not, done, stop

TheButlah commented 2 years ago

Hello, I'd like to second this feature request. Adding git dependencies pointing to a particular tag, branch, or commit, is a pretty common feature of package managers. This is especially helpful to avoid needing to self-host a nuget repository.

The arguments for preferring git dependencies over managing submodules in a codebase are very similar to the arguments of nuget dependencies vs vendoring submodules. Ultimately, supporting explicit git dependencies in msbuild lets us rely on the build system rather than vendoring dependencies in git itself. In some prior teams I've worked in, many developers were very inexperienced with git, and it was a constant pain for them to deal with submodules. Ideally the preference of msbuild git dependency vs vendoring submodules would be up to the user/team.

Some good prior art for this

Nirmal4G commented 2 years ago

@jeffkl This could be a feature for MSBuild SDKs. Arcade supports this by having its own dependency management. But not everyone can use Arcade. This could provide an MSBuild-based replacement.

fakhrulhilal commented 2 years ago

It'd be helpful if we could do this. It's something other stacks can do nowadays. See another stack

AraHaan commented 2 years ago

I think a better option would be to consider this:

* When the referenced project is built, it will copy the auto built external projectreference. Also with this it's possible for build to take forever because of it needing to clone large projects, also consider referenced projects that reference other projects similarly where there could be multiple nesting levels. How would this solve issues where one could run into MAX_PATH problems however?

While this could be good and could possibly be used to replace what Arcade similarly does, I feel that more planning will be needed to look into ways to avoid issues (esp. on systems where there is no way to bypass MAX_PATH to make it unlimited, or when the user does not have the bypass opt-in enabled on their system).

Alternatively:

Benefits on this:

Con:

Thaina commented 2 years ago

@AraHaan

GitHub Actions is limited to GitHub, while pulling a git repo with common git commands works for any git repo stored anywhere.

I want to add one thing: in Unity you can specify a path to a target folder so that only that specific source folder is copied.

Also, it might be better to make branch and tag optional parameters. Maybe instead of CommitHash there should be Target and TargetType, with TargetType being Hash / Tag / Branch.
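
As an illustration only, the revised proposal might read like this in a csproj. Repo, Target, and TargetType are hypothetical metadata names from this thread. MSBuild stores unknown attributes on an item as custom metadata, so the fragment parses, but to my knowledge nothing in MSBuild acts on them today, and the Include path would not resolve without a local copy.

```xml
<Project Sdk="Microsoft.NET.Sdk.BlazorWebAssembly">
  <ItemGroup>
    <!-- Hypothetical syntax: TargetType would be one of Hash, Tag or Branch,
         and Target the matching commit hash, tag name or branch name. -->
    <ProjectReference
      Include="src/Blazor.Extensions.Canvas/Blazor.Extensions.Canvas.csproj"
      Repo="https://github.com/BlazorExtensions/Canvas.git"
      Target="3d9b5e6eccb0a66d34172f07ceeb8b7f4d82aaec"
      TargetType="Hash"
    />
  </ItemGroup>
</Project>
```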