dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Merge core types in the core assembly/packages #4517

Closed: galvesribeiro closed this issue 4 years ago.

galvesribeiro commented 9 years ago

Hello everyone,

After a lot of discussion on the dotnet/coreclr Gitter channel, we decided to open an issue to track the idea here.

We noticed that System.Collections.Generic lives in its own assembly... At first this looks really cool (and to many people it does): modularity, etc. But from another point of view, for some core assemblies it is totally unnecessary, and in some cases, like collections, it makes no sense at all.

We are pretty sure there are other such cases we would find if we dug further into coreclr, but for now System.Collections.Generic is the one we're aware of.

System.Collections.Generic holds very basic, core types, and previously it was just a namespace inside mscorlib, so why create a separate assembly for it on coreclr? I understand that there is no longer an "mscorlib" on coreclr, and that https://www.nuget.org/packages/Microsoft.NETCore is just a "metapackage" that downloads all the other NuGet packages (including the one for System.Collections.Generic) and acts as the "new mscorlib" on coreclr.

Modularity is really nice, and we are all for it, but we have to be careful when designing things; otherwise we will end up with something like single-package-per-class semantics.

So, what can we do to merge those core assemblies into a core BCL? Is there any real reason behind the decision to split those core assemblies instead of keeping them all together in a small BCL?

I don't believe merging them would be hard work, and it shouldn't affect the current implementation/design, so I'm confident the community can do it without impacting other deliverables such as ASP.NET 5.

stephentoub commented 9 years ago

cc: @terrajobst, @weshaggard, @ericstj

masonwheeler commented 9 years ago

Agreed. Modularity is important, but in any non-trivial program, stuff like List<int> and Dictionary<string, T> are just as fundamental as int and string. CoreCLR's design should reflect that reality.

weshaggard commented 9 years ago

Why does it matter where the implementation of a given contract like System.Collections lives? Whether it lives in mscorlib or in System.Collections.dll shouldn't really matter to developers, as long as they can use the APIs.

We are trying very hard to correctly layer our libraries and separate them into assemblies that each represent a common set of functionality. That allows those assemblies to evolve independently without being tied to other things (e.g., mscorlib), and it also allows us to use the same code, and in the ideal case the same binary, on our different .NET platforms/runtimes. For example, .NET Native does not have an mscorlib and its runtime is not coreclr, but we can and do use this exact System.Collections.dll assembly there.

Modularity is really nice, and we are all for it, but we have to be careful when designing things; otherwise we will end up with something like single-package-per-class semantics.

I completely agree we have to be careful not to end up with single-package-per-class, and we do try to be careful, but there are some cases that warrant it if they are unique enough.

In the collections case, I think we have actually put a good set of collections into the contract, and I feel it is correctly factored from the API surface. People could argue that it shouldn't be in its own implementation assembly, but I would argue that is an implementation detail most developers shouldn't need to care about, and we always have opportunities to optimize by merging these assemblies at compile time/runtime, like .NET Native does.

So in the collections case I do not think we should merge them into mscorlib; we should actually try to remove the copies of some of them that live in mscorlib instead.

masonwheeler commented 9 years ago

So in the collections case I do not think we should merge them into mscorlib; we should actually try to remove the copies of some of them that live in mscorlib instead.

I disagree. Mscorlib is, as its name implies, for core functionality. Show me a non-trivial program that does not use generic collections for anything, and I'll show you a program written by a developer who's never learned about generics. Especially with so much new functionality being implemented on top of the LINQ paradigm, it should be recognized that IEnumerable<T>, List<T>, Dictionary<TKey, TValue> and HashSet<T> are every bit as "core" to modern CLR programming as int, string and object.

Heck, I'm even aware of at least two CLR languages that support IEnumerable<T> at the syntax level! If the implicit assumption that "IEnumerable<T> is always available" were violated, it would make a bit of a mess for them...

weshaggard commented 9 years ago

I don't disagree with your statements that these collections are "core" types, but not all "core" types should be in mscorlib. Our entire repo is about "core" types, and not all of them should be in the "corelib".

As for language features, there are a ton of them, LINQ included, that don't have their backing library support in mscorlib. They are opt-in and require additional library references to be supported.

galvesribeiro commented 9 years ago

@weshaggard I agree that you guys are doing wonderful work designing the class library today. Our point is that no real application today can avoid dealing with collections, especially generic ones, and, as @masonwheeler mentioned, with the advent of LINQ/lambdas and the C# language evolving more and more, we deeply need collections. To start with, LINQ depends on IEnumerable even for simple queries/projections, so how can we have LINQ as a core language feature and not have the collections in the core?

I agree that the repo IS the core of everything and, as such, we could treat all the code here as the "core library". My point is that even in the core there are parts a user doesn't need in their projects. For example, if I don't need to deal with Tasks, I don't need the System.Threading.Tasks assembly. That is one of the many cases where I would create a separate assembly, but collections? I don't see any real application, from a notepad, to a web site, to a complex distributed system, where we wouldn't use collections.

So, why are we discussing all this? Because adding modularity gives people the power to choose only the packages they need, but on the other hand it asks those users to take care of versioning each of those packages and to remember which one depends on which. In the case of collections, I don't see big changes coming soon, and if they are needed, well, the core assembly can be updated... We are using NuGet, so dealing with those updates is easy. We just shouldn't have to add and manage several NuGet packages if it isn't necessary.
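To make the LINQ/IEnumerable dependency concrete: LINQ-to-objects operators only need a source that can be enumerated. The sketch below is a TypeScript analogy, not System.Linq itself. `where` and `select` are hypothetical stand-ins for Enumerable.Where and Enumerable.Select, and JavaScript's Iterable<T> plays the role of IEnumerable<T>. Like LINQ's operators, they are lazy and do nothing until the result is enumerated.

```typescript
// Hypothetical analogues of LINQ's Where/Select, written against Iterable<T>
// (the JavaScript counterpart of IEnumerable<T>). Both are lazy generators.
function* where<T>(source: Iterable<T>, predicate: (item: T) => boolean): Generator<T> {
  for (const item of source) {
    if (predicate(item)) yield item;
  }
}

function* select<T, R>(source: Iterable<T>, selector: (item: T) => R): Generator<R> {
  for (const item of source) {
    yield selector(item);
  }
}

// Works on any enumerable source: an array, a Set, or another generator.
const evensSquared = select(where([1, 2, 3, 4], (n) => n % 2 === 0), (n) => n * n);
```

Each operator call creates a new generator object, so every query stage carries a small hidden allocation.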

masonwheeler commented 9 years ago

I suppose the main principle of our argument is that modularity is indeed desirable, but it should be done pragmatically rather than dogmatically. If a certain feature (such as generic collections) is going to be used in every non-trivial program anyway, what is the benefit in modularizing it, when the entire benefit of modularity is the ability to exclude things you don't use?

mikedn commented 9 years ago

I don't see any real application, from a notepad, to a web site, to a complex distributed system, where we wouldn't use collections.

It's certainly possible to create an application that uses its own collection library instead of using the "standard" ones. The standard collections excel neither at performance nor at functionality.

galvesribeiro commented 9 years ago

@mikedn well, I agree with you that people can build their own collection library for specific purposes (I do it myself), but I doubt that they don't use simple basic List<T> or a Dictionary<K,V> somewhere else, or that those special collections aren't based on one of the built-in base collections.

masonwheeler commented 9 years ago

@mikedn Then please, by all means, submit a PR to improve them. The fact that they are standard collections means that improvements to them will improve tons of software for free.

In the past, this wasn't so feasible because the standard libraries were all proprietary, but now that they've been open-sourced, there's no good reason not to truly standardize on them and work to improve conditions for everyone.

mikedn commented 9 years ago

but I doubt that they don't use simple basic List<T> or a Dictionary<K,V> somewhere else

I've done just that on some occasions. Of course, the standard collections being in mscorlib wouldn't really change that, as I could (and did) use different namespaces and/or class names. But the point is that they are not core types, not in the same sense as String, Int32, etc.

Then please, by all means, submit a PR to improve them.

I very much doubt that a PR that, say, removes enumerator versioning from the standard collections will be accepted.

masonwheeler commented 9 years ago

@mikedn OK, for the benefit of those of us not particularly well-versed in the intricacies of collection optimization, why would that be an improvement?

weshaggard commented 9 years ago

From a factoring standpoint, even if we did put the implementation of System.Collections into mscorlib, people would still have to reference the System.Collections package for compiling, because people are not supposed to compile directly against mscorlib; it is an implementation detail (a detail that doesn't even exist on .NET Native). We want to be in a position where most people do not need to care where or how things are implemented: all they do is reference the package, and it provides an assembly for compiling and for deploying. So whether the implementation lives in mscorlib or in System.Collections.dll, it shouldn't matter to developers, as that is handled by our packages.

As for dealing with individual packages, we expect most developers to reference a "meta-package" that contains the full set of core libraries, so they will not need to care about individual package versions either. You can think of a "meta-package" as a traditional targeting pack: a set of packages that represents a version of .NET Core.

If a certain feature (such as generic collections) is going to be used in every non-trivial program anyway, what is the benefit in modularizing it, when the entire benefit of modularity is the ability to exclude things you don't use?

Exclusion is not the only benefit of modularity. It is also about sharing the code/library between different frameworks, as I hinted before. The open-source mscorlib is not the only corlib that can use this standalone collection library; it is only one of the places. So one of the benefits we get here is having one location for the source, which we can build and use on other platforms that don't use mscorlib (or at least not the open-source mscorlib in the coreclr repo). If we did put it in mscorlib, for example, we would have to duplicate that source code in every other corlib and keep them all in sync with updates to maintain a common .NET ecosystem across those different runtimes. In my ideal world, mscorlib would only contain the things that require hooks into the coreclr\VM.

mikedn commented 9 years ago

@masonwheeler Smaller collection and enumerator objects. Less code generated for methods that are supposed to be trivial such as indexers and enumerator's MoveNext.
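For readers who, like masonwheeler, want to see what "enumerator versioning" costs, here is how the pattern works. The collection keeps a modification counter, and each enumerator takes a snapshot of it. Every MoveNext compares the two, so modifying the collection mid-enumeration fails loudly. The TypeScript sketch below is a hypothetical, simplified model of that pattern. Its class and member names are invented for illustration and are not taken from the List<T> source. It shows the extra field on both objects and the extra comparison on every step that mikedn refers to.

```typescript
// Hypothetical, simplified model of a list with "versioned" enumerators.
// Names are illustrative; this is not the actual List<T> source.
class VersionedList<T> {
  private readonly items: T[] = [];
  // Extra per-collection field: bumped on every mutation.
  private _version = 0;

  get version(): number {
    return this._version;
  }

  get count(): number {
    return this.items.length;
  }

  add(item: T): void {
    this.items.push(item);
    this._version++;
  }

  itemAt(index: number): T {
    return this.items[index];
  }

  getEnumerator(): VersionedEnumerator<T> {
    return new VersionedEnumerator(this);
  }
}

class VersionedEnumerator<T> {
  private readonly list: VersionedList<T>;
  // Extra per-enumerator field: the version seen when enumeration started.
  private readonly expectedVersion: number;
  private index = -1;

  constructor(list: VersionedList<T>) {
    this.list = list;
    this.expectedVersion = list.version;
  }

  moveNext(): boolean {
    // Extra work on every step: detect modification during enumeration.
    if (this.expectedVersion !== this.list.version) {
      throw new Error("Collection was modified during enumeration.");
    }
    this.index++;
    return this.index < this.list.count;
  }

  get current(): T {
    return this.list.itemAt(this.index);
  }
}
```

Dropping versioning would remove `_version`, `expectedVersion`, and the check in `moveNext`. The cost is that a list modified during enumeration would no longer be reported.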

Another example is the _syncRoot field that some collections like List<T> have. It's very much useless today but it must be kept due to the fact that the non-generic ICollection interface needs it.

And the list can go on...

stephentoub commented 9 years ago

Another example is the _syncRoot field that some collections like List<T> have. It's very much useless today but it must be kept due to the fact that the non-generic ICollection interface needs it.

There are potential solutions to such things. For example, we could consider maintaining a static ConditionalWeakTable<object, object> used to link a sync root object to its associated collection. It would make getting that object more expensive, as a tradeoff for making the collection object smaller.
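Here is a minimal sketch of stephentoub's suggestion in a TypeScript setting. A WeakMap stands in for the static ConditionalWeakTable<object, object>: both hold their keys weakly and keep a value alive only as long as its key. `getSyncRoot` and `SlimList` are hypothetical names. The collection carries no sync-root field. The object is created lazily on first request and found by lookup afterwards. In .NET, the ConditionalWeakTable GetValue overload that takes a creation callback would play the role of the get-or-create logic in `getSyncRoot`.

```typescript
// Side table standing in for a static ConditionalWeakTable<object, object>:
// keys are held weakly, so a collection can still be garbage-collected.
const syncRoots = new WeakMap<object, object>();

// Hypothetical helper: returns the sync root linked to `collection`,
// creating it lazily the first time it is requested.
function getSyncRoot(collection: object): object {
  let root = syncRoots.get(collection);
  if (root === undefined) {
    root = {};
    syncRoots.set(collection, root);
  }
  return root;
}

// Hypothetical collection with no _syncRoot field of its own.
class SlimList<T> {
  private readonly items: T[] = [];

  add(item: T): void {
    this.items.push(item);
  }

  get count(): number {
    return this.items.length;
  }

  // Mirrors ICollection.SyncRoot, paid for with a lookup instead of a field.
  get syncRoot(): object {
    return getSyncRoot(this);
  }
}
```

This is the trade-off stephentoub names: every collection instance is one field smaller, while the rare SyncRoot caller pays for a table lookup.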

masonwheeler commented 9 years ago

@weshaggard So where does the basic runtime functionality live in .NET Native?

weshaggard commented 9 years ago

For the primitive types like int, string, etc., they live in another core library (System.Private.CoreLib), which is not mscorlib and is bound to another runtime that isn't CoreCLR. The collections live in the same standalone System.Collections library that we have in the corefx repo.
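weshaggard's point is that code compiled against a contract doesn't care which assembly implements it. That idea can be modeled as a per-runtime lookup. The sketch below is a toy model in TypeScript, not how the CLR binds types (real binding goes through reference assemblies and type forwarding). The runtime keys, the `resolve` helper, and the HashSet`1 row are illustrative assumptions. The Int32 placements follow what this thread says: mscorlib on CoreCLR, System.Private.CoreLib on .NET Native.

```typescript
// Toy model: which assembly implements a contract type on each runtime.
type Runtime = "coreclr" | "dotnet-native";

const implementations: Record<Runtime, Map<string, string>> = {
  "coreclr": new Map([
    ["System.Int32", "mscorlib"],
    ["System.Collections.Generic.HashSet`1", "System.Collections"],
  ]),
  "dotnet-native": new Map([
    ["System.Int32", "System.Private.CoreLib"],
    ["System.Collections.Generic.HashSet`1", "System.Collections"],
  ]),
};

// Code compiled against the contract only knows the type name;
// the runtime decides which assembly actually provides it.
function resolve(typeName: string, runtime: Runtime): string {
  const assembly = implementations[runtime].get(typeName);
  if (assembly === undefined) {
    throw new Error(`${typeName} is not implemented on ${runtime}`);
  }
  return assembly;
}
```

For Int32, the same caller-side name resolves to different assemblies on the two runtimes. For the collection type, it resolves to one shared assembly, which is the code-sharing benefit described above.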

jakesays-old commented 9 years ago

One of my concerns is that you guys are making this thing so complicated that it will only be usable via high-level tooling. You say most devs will reference a meta assembly: what configures that meta assembly? How are its constituent assemblies determined?

And more importantly, how as a developer will I be able to control all of that?

weshaggard commented 9 years ago

@jakesays that is a fair concern, and it is one we have as well. We are definitely trying to figure out how to make this easier. As an example, have you seen a UWP application in VS 2015? To get an idea of what we are talking about, let's look at the project.json for a new blank project.

{
  "dependencies": {
    "Microsoft.ApplicationInsights": "1.0.0",
    "Microsoft.ApplicationInsights.PersistenceChannel": "1.0.0",
    "Microsoft.ApplicationInsights.WindowsApps": "1.0.0",
    "Microsoft.NETCore.UniversalWindowsPlatform": "5.0.0"
  },
  "frameworks": {
    "uap10.0": {}
  },
  "runtimes": {
    "win10-arm": {},
    "win10-arm-aot": {},
    "win10-x86": {},
    "win10-x86-aot": {},
    "win10-x64": {},
    "win10-x64-aot": {}
  }
}

If you look at that you will see a few things:

  1. Some individual packages, like ApplicationInsights or any other individual package you might want to reference.
  2. The meta-package reference "Microsoft.NETCore.UniversalWindowsPlatform" 5.0.0, which includes all the traditional BCL library packages. This is analogous to setting TargetFrameworkVersion=v4.5 in your project file for full .NET Framework projects.
  3. The frameworks/runtimes sections identify which target runtimes, architectures, etc. you want your application to be able to run on. @jakesays this is how you configure the packages, including the meta-package.

We will have a few different meta-packages for different app models; this example is for UWP apps, but we are working on "Microsoft.NetCore.Console" as well.

So I completely agree that if you had to list every single core library package for all your projects, it would be completely unmanageable, but most devs will leave that complexity up to the core framework team when we define the meta-packages.

galvesribeiro commented 9 years ago

@weshaggard what if I don't want/need something that is inside the meta-packages? I think that is the point of @jakesays's question.

weshaggard commented 9 years ago

This isn't how it works yet, but what we want to do is similar to what happens if you pass a lot of extra references to csc.exe: it will essentially have them in scope for use, but it will not burn a reference into your assembly for the ones you don't use. What we expect to happen is that your deployment will only contain the things you use.

jakesays-old commented 9 years ago

@weshaggard @galvesribeiro is correct: I want to know how I can configure/build/define these meta-packages. For me, the biggest win of coreclr/corefx is the ability to roll my own package from the ground up. However, for it to be successful, my personal CLR must be usable from within VS. Since you guys have opened this can of worms, I want to be able to exploit it to fit my needs. Will this be possible?

Basically it comes down to this: If you guys are going to completely change the way I've been working with .net for the past 15 years, then I damn well better be able to get something out of it.

weshaggard commented 9 years ago

lol @jakesays. There isn't anything special about a meta-package; anyone could create one if they chose to. The most difficult part is testing it to gain a degree of confidence that the closure of packages all works well together. Technically, a meta-package is just a package with a set of dependencies; crack open https://www.nuget.org/packages/Microsoft.NETCore.UniversalWindowsPlatform and see what it looks like.
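As weshaggard says, a meta-package is just a package whose dependencies are the packages it bundles, so what it brings in is the transitive closure of those dependencies. The sketch below computes that closure with a simple worklist. The dependency edges and the `My.MetaPackage` name are hypothetical, not the real contents of any NuGet package. It also models the goal stated earlier in the thread: everything in the meta-package is in scope for compilation, while only the closure of what an app actually uses gets deployed.

```typescript
// Hypothetical package graph: package name -> direct dependencies.
// A meta-package is just an entry whose dependencies are the packages it bundles.
const graph = new Map<string, string[]>([
  ["My.MetaPackage", ["System.Collections", "System.Linq", "System.Threading.Tasks"]],
  ["System.Linq", ["System.Collections", "System.Runtime"]],
  ["System.Collections", ["System.Runtime"]],
  ["System.Threading.Tasks", ["System.Runtime"]],
  ["System.Runtime", []],
]);

// Transitive closure of the given root packages, via a worklist (stack).
function closure(roots: string[]): Set<string> {
  const seen = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const pkg = stack.pop()!;
    if (seen.has(pkg)) continue;
    seen.add(pkg);
    for (const dep of graph.get(pkg) ?? []) stack.push(dep);
  }
  return seen;
}

// Everything the meta-package brings into scope for compilation...
const inScope = closure(["My.MetaPackage"]);
// ...versus what an app that only uses collections would need to deploy.
const deployed = closure(["System.Collections"]);
```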

benaadams commented 9 years ago

I don't see any real application, from a notepad, to a web site, to a complex distributed system, where we wouldn't use collections.

There are a lot of applications (real-time systems, game servers, microservices, embedded systems, etc.) where pre-allocated arrays will be the general collection choice, System.Numerics.Vectors will be the preferred "always include" library, and even allocating memory after startup is strenuously avoided.
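As an illustration of the pre-allocated style benaadams describes, here is a fixed-capacity ring buffer. It allocates its backing array once, in the constructor, and refuses new items when full instead of growing. It is a hypothetical TypeScript sketch. Whether a given JavaScript engine allocates anywhere else along these paths is engine-dependent, so read it for the shape of the code rather than as a guarantee.

```typescript
// Hypothetical fixed-capacity FIFO queue over a buffer allocated once, up front.
class RingBuffer {
  private readonly buffer: Float64Array;
  private head = 0; // index of the oldest element
  private size = 0;

  constructor(capacity: number) {
    this.buffer = new Float64Array(capacity); // the only allocation
  }

  // Returns false instead of growing when the buffer is full.
  tryEnqueue(value: number): boolean {
    if (this.size === this.buffer.length) return false;
    this.buffer[(this.head + this.size) % this.buffer.length] = value;
    this.size++;
    return true;
  }

  // Returns undefined when the buffer is empty.
  tryDequeue(): number | undefined {
    if (this.size === 0) return undefined;
    const value = this.buffer[this.head];
    this.head = (this.head + 1) % this.buffer.length;
    this.size--;
    return value;
  }

  get count(): number {
    return this.size;
  }
}
```

The `try*` methods report full and empty states through return values rather than thrown errors. This is a common choice in code that avoids allocation on hot paths.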

With the introduction of SIMD, coreclr and x-plat, there is now a whole new class of programs for which dotnet is a very sensible choice.

The trade-off of needing to add System.Collections when creating a blank project isn't particularly onerous, given that most people will probably use a File->New template that already includes it.

Especially with so much new functionality being implemented on top of the LINQ paradigm

LINQ definitely isn't a core feature of the runtime; it's very nice and very powerful, but again, you would be even less likely to use it in the previous examples, and if you did, you'd have to be exceptionally careful about which features you used, as it's very easy to unwittingly allocate memory with it.

jakesays-old commented 9 years ago

@benaadams you couldn't be more wrong on many levels. While LINQ is not a 'coreclr' feature, it certainly is a core feature of the .NET platform. You need to start looking at things from a practical use perspective (in other words, from how the platform is actually being used) and not from a purely technical standpoint. The use cases you cite (real-time systems, game servers, embedded systems, etc.) are such a small part of the overall .NET ecosystem. If you focus on those edge cases (especially ones that do not allocate memory), you are doing a HUGE disservice to the rest of us who earn a living writing, you know, regular stuff.

weshaggard commented 9 years ago

Thanks for all the discussion on this issue but there isn't anything actionable here at this point so I'm closing the issue.

jakesays-old commented 9 years ago

I'm not exactly sure what you mean by nothing actionable. Are you saying "hey thanks for the discussion, but it ain't going to happen?"

Or are you saying "It's not actionable because I didn't understand the issue?"

benaadams commented 9 years ago

@jakesays my point was that you just need to add a line to your project.json, if it's not already there, which isn't very hard. If it's baked into the runtime, you cannot take it out, and it is forced to move in lock-step with runtime releases, as pointed out previously.

jakesays-old commented 9 years ago

@benaadams we're not talking about thousands of lines of oft-changing code here. The collection API is very stable, and again, you're optimizing for very rare use cases. I would rather see the very small fraction of users use a build tailored for their environment than saddle the majority with an additional maintenance/configuration burden. Maintaining configuration files is actually a lot of work in a large environment, especially one that will have to create their files manually instead of being forced to use this new json mess.

weshaggard commented 9 years ago

@jakesays I'm saying it isn't actionable because we don't intend to merge the standalone System.Collections.Generic library into mscorlib, which is what appears to be the main ask here.