carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/

Suggestion: ditch packages, namespaces and libraries from syntax #2154

Closed Vanuan closed 1 year ago

Vanuan commented 2 years ago

Carbon introduces multiple concepts at the same time: libraries, packages and namespaces at the syntax level. I don't think all of these should be exposed as a part of the source code.

Let's review the current state.

The smallest unit, the unit of compilation, doesn't have a proper name; it should be given one. The documentation calls it a "file".

Files are grouped into a unit of distribution - a package.

A package can contain multiple libraries.

Libraries group "api" files and "impl" files so that a library has a private implementation and a public interface. A package thus exposes the APIs of multiple libraries.

In addition, there are namespaces. Namespaces cut across library boundaries, making it possible to share a common private implementation without exposing its API at the package level.
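
To make these concepts concrete, here is a rough sketch of how they show up in source files, as I read the current design docs (the file names and identifiers below are made up for illustration, so take the exact syntax with a grain of salt):

// geometry/shapes.carbon: "api" file of the "Shapes" library in package Geometry
package Geometry library "Shapes" api;

// A namespace nested inside the package.
namespace Internal;

class Circle {
  var radius: f64;
}

// A helper grouped under the child namespace.
fn Internal.IsValid(c: Circle) -> bool {
  return c.radius > 0.0;
}

// geometry/shapes_impl.carbon: "impl" file of the same library; its contents
// are private to the library and not visible to importers.
package Geometry library "Shapes" impl;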

Is it only me who thinks this is too complicated?

I propose the following:

Why are syntax-level packages/libraries/namespaces required? Is it so that it's easier to map source files to binary objects?

jonmeow commented 2 years ago

If you're working on a small project, you can definitely just put everything in an api file. Nothing really requires using an impl file. However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism). See Collapse API and implementation file concepts for more discussion on the alternative.
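
Roughly, the split looks like this; a minimal sketch with made-up names, where the declaration lives in the api file and the body in the impl file:

// matrix.carbon: api file
package Math library "Matrix" api;

// Callers can compile against this declaration alone.
fn Scale(x: i64, factor: i64) -> i64;

// matrix_impl.carbon: impl file
package Math library "Matrix" impl;

// Only this file needs to be recompiled when the body changes.
fn Scale(x: i64, factor: i64) -> i64 {
  return x * factor;
}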

Regarding namespaces, again they're not something you need to use. However, organizationally I think we can see from C++ that some developers find namespaces useful, and it's likely to help ease a migration.

I'm moving this to a question for leads in case there's something that I'm missing here, design-wise, although converting this to a discussion may also make sense.

Vanuan commented 2 years ago

However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism)

Such a mindset is bringing the worst of C++ into a new language. Set aside the headers/implementation separation idea. The performance hit of touching header files is caused by the preprocessor: in C++ you're compiling the same code again and again. Compilation of N source files (when you don't have to recompile every combination of them) is actually very quick.

Even with a thousand-file project you can just extract the public signature of each module, compare it with a cached version, and recompile dependents only if it has changed. That's how you prevent needless compilation, not by creating artificial stable/unstable API boundaries.

Vanuan commented 2 years ago

Let's analyse the claimed disadvantages listed in the collapse-concepts proposal:

  1. Can't compile in parallel
  2. Developers might not be aware they are changing API.
  3. Java (and other languages) has interfaces, but Carbon doesn't.
  4. Compilation performance
  5. Read all the files to learn API
  6. Carbon build system will need to build a dependency graph, so every file will need to be compiled
  7. We want the compilation to be in parallel
  8. Developers are too lazy to read export signatures
  9. Name collisions
  10. Longer package names
  11. Eventually, there would be a need to split larger packages into smaller ones
  12. But we still want to deliver large packages, so that it's easier to version them
  13. We already defined a syntax for import/export
  14. We want to name distributed packages and actual symbols imported differently
  15. We want to import whole namespaces. We don't want to import every single symbol used
  16. If any symbol is possible to import from anywhere, that's a lot of symbols to keep in memory for IDE
  17. There will be more bytes required to name all the symbols
  18. We don't want special export/public/private keywords
  19. Every file must be parsed
  20. We want shorter symbol naming

Did I catch everything?

Vanuan commented 2 years ago
  1. You can compile in parallel once you know the dependency graph. Artificial separation into "impl" and "api" files only parallelizes the "impl" part, which may take less compilation time than compiling the "api" part.
  2. Developers can be warned by the IDE "you've changed the export signature of the module. This module has 100500 dependants and will take ages to compile. Are you sure?"
  3. Interfaces are not strictly for API / implementation separation. They are more like generic types. It's automatically generated documentation (JavaDoc) that extracts the API from Java implementation files.
  4. Compilation performance is not achieved by grouping the codebase into large chunks. Instead, it is achieved by making sure you only compile what is changed. Thus building the dependency tree, comparing the changes and compiling the smallest amount of code needed.
  5. Again, tooling. API documentation generators exist
  6. The Carbon import/export syntax should be simple enough for the AST parsing to be fast. You don't need to know the semantics to build a dependency graph, so it should be quick. It can be further optimized by keeping a dependency graph in memory and runtime module reloading
  7. You can do parallel optimizations when you know the dependency graph.
  8. API documentation exists
  9. Here, Carbon designers need to decide whether they want packages to be distributed in the source code or the binary form. If it's the binary form, long symbol names are inevitable. Alternatively, some dynamic symbol mapping should be invented.
  10. See 9
  11. Libraries/packages grow. Refactoring is a natural evolution process
  12. Versioning is a complex problem. Again, there's a difference between API and ABI, so maybe you'd prefer smaller packages/libraries depending on how you want to distribute them
  13. Maybe reconsider. C++ module export/import syntax is nice
  14. This is easily solvable for source code distribution but is hard for binary packages. What's the advantage?
  15. Importing whole namespaces creates a large dependency surface. I thought you were all for reducing the compilation time. Anyway, mass import syntax can be thought of
  16. I think a package is a good boundary to determine what could be imported/exported from it. Every package/library/folder can have an index/init/entrypoint file that determines the exported symbols (see the sketch after this list). So if you work on package A and have packages B and C as dependencies, you only need to keep the public symbols from B and C and all symbols from package A. If that's still too much, consider splitting A into smaller packages.
  17. If you want to distribute ABI, name collisions are something to deal with. Maybe a runtime which dynamically generates unique symbol names?
  18. Why not?
  19. Every project has a few entry points, so only entry points should be parsed. And then imported modules from those entry points and so on.
  20. Maybe there's a way to encode namespaces inside the ABI? Some sort of symbol compression? But that would probably require a runtime.
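
To illustrate point 16: this is not current Carbon syntax, just a hypothetical sketch of what a per-package entry point file could look like, with made-up names:

// a/index.carbon: hypothetical entry point of package A
package A;

// Only the symbols listed here would be importable by other packages;
// everything else inside A would stay private to A.
export Circle;
export ComputeArea;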

chandlerc commented 2 years ago

However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism)

Such a mindset is bringing the worst of C++ into a new language.

I think this is not a productive way to engage in discussions about Carbon. Among other things, it reads as an absolute statement that doesn't acknowledge that other people may disagree, and it reads as dismissive of the point that Jon tried to make.

Set aside the headers/implementation separation idea. The performance hit of touching header files is caused by the preprocessor: in C++ you're compiling the same code again and again. Compilation of N source files (when you don't have to recompile every combination of them) is actually very quick.

Even with a thousand-file project you can just extract the public signature of each module, compare it with a cached version, and recompile dependents only if it has changed. That's how you prevent needless compilation, not by creating artificial stable/unstable API boundaries.

One challenge of this approach in my experience is dealing with the dependencies of the file that need to be present in order to extract the public signature. Often, the build system cannot definitively tell whether a dependency is necessary for that extraction, and so even extracting the interface will require all dependencies, even dependencies of just the implementation details, to finish building first.

While having separate API and implementation files directly exposes some extra parallelism, it also enables a more parallel build graph for large build systems.

Now, as Jon mentioned initially, not everyone needs this. Projects with 100s or 1000s of files may not. And it is fine to not use the features if they don't help. But we'd like the features to be available to help scale even further.

The current set of features was specifically designed based on the experience of several folks on the project scaling C++ builds, where we found exactly this separation important. As you say, we could do without it and work around any build scaling limitations. But so far the judgement call has been that the cost is reasonable and reasonably easy to avoid for users who don't need it.


There is also a completely separate reason that I at least appreciate separating API from implementation -- I find it helps me both organize my code and read the code of others. While I could in theory use tooling to extract this view, I prefer having the split directly reflected in the source code, and being able to read the source code itself.

chandlerc commented 2 years ago

Let's analyse the claimed disadvantages listed in the collapse concepts proposals:

The list you give doesn't for me map to the arguments in the alternative that Jon linked to, so I'm afraid I don't follow this part of the discussion.

Vanuan commented 2 years ago

will require all dependencies, even dependencies of just the implementation details, to finish building first.

Let's analyze an example.

// main.carbon
import { PublicInterface } from './PublicInterface.carbon';
new PublicInterface();

// PublicInterface.carbon
import { ImplementationDetail } from './ImplementationDetail.carbon';

export class PublicInterface {
  PublicInterface() {
    new ImplementationDetail().doSomething();
  }
}

// ImplementationDetail.carbon
export class ImplementationDetail {
  void doSomething() {
    // ...
  }
}

So, your concern is that if we change the ImplementationDetail class, even though the signature of the PublicInterface class is not changed, there's no way to figure that out without analyzing ImplementationDetail? I disagree. AST parsing allows you to do lazy evaluation of imports. You don't need to know what the imported symbol refers to in order to extract the API of the PublicInterface.carbon file, since it's not part of the source code interface. Maybe you are referring to a binary interface?

OlaFosheimGrostad commented 2 years ago

I like namespaces. They allow me to collect free functions with short names that I want to use unqualified into a sub-namespace that can be used where I need them, without running out of names or having to resort to long names that affect the legibility of the code.

They are also useful for extending concepts in a cross-cutting fashion; moreover, they are needed for extending overloaded functions/generics that are cross-cutting. They are also useful for collecting things that should be edited together but are associated with different parts of a program.

Are there other ways? Yes, you can introduce the concept of extension-slots in all aggregating concepts or other injection-mechanisms, but that is a radical departure from C++.

Vanuan commented 2 years ago

I like namespaces. They allow me to collect free functions with short names that I want to use unqualified into a sub-namespace that can be used where I need them, without running out of names or having to resort to long names that affect the legibility of the code.

As I understand it, in its current design, Carbon lacks the ability to import a specific function from a package/library, or to alias it to a file-scoped name.

So you just have to import everything to a file and use namespaces to pick which functions are visible to the current file?

Essentially, you mass-import a subset of the functions from a particular package into a single file.

Wouldn't it be nicer to handpick which functions to import? Yes, it makes a long import list, similar to Java, but it saves you the headache of inventing new names due to mass imports.

jonmeow commented 2 years ago

Imports only inject the package name into the namespace. i.e., import Math only adds one name, Math. There is no "mass-import" similar to Java's import package.*, so I don't think the name conflict issues you raise exist. This is also discussed on the proposal I mentioned previously, #107, here.
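
For example (a sketch, assuming a hypothetical Math package that provides a Sqrt function):

import Math;

fn Hypotenuse(a: f64, b: f64) -> f64 {
  // The only name the import added is "Math"; its members are used qualified.
  return Math.Sqrt(a * a + b * b);
}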

Vanuan commented 2 years ago

Imports only inject the package name into the namespace. i.e., import Math only adds one name,

But the combination of "import Math" and "use namespace Trigonometry" is effectively the same as "import Math.Trigonometry.*" in Java. Right?

Isn't that what you mean by "collect free functions with short names that I want to use unqualified into a sub-namespace that can be used"?

jonmeow commented 2 years ago

Imports only inject the package name into the namespace. i.e., import Math only adds one name,

But the combination of "import Math" and "use namespace Trigonometry" is effectively the same as "import Math.Trigonometry.*" in Java. Right?

No, there is no use namespace.

Isn't that what you mean by "collect free functions with short names that I want to use unqualified into a sub-namespace that can be used"?

I expect @OlaFosheimGrostad is making a comment about code organization preferences; i.e., it's sometimes nice to put functions into a child namespace instead of having them hanging out in the same scope as a library's classes.
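
As a sketch of that kind of organization (made-up names, and assuming the current namespace syntax):

// math.carbon: api file of a hypothetical Math package
package Math api;

// Types sit directly in the package scope...
class Angle {
  var radians: f64;
}

// ...while related free functions are grouped under a child namespace.
namespace Trigonometry;

fn Trigonometry.Sin(a: Angle) -> f64;
fn Trigonometry.Cos(a: Angle) -> f64;

Callers would then write Math.Trigonometry.Sin(x) rather than finding Sin at the top level of Math.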

chandlerc commented 2 years ago

AST parsing allows you to do lazy evaluation of imports. You don't need to know what the imported symbol refers to in order to extract the API of the PublicInterface.carbon file, since it's not part of the source code interface. Maybe you are referring to a binary interface?

This seems to assume both the build system using a language-specific tool to separate the interface from the implementation details (including analyzing and separating dependencies), and the build system restructuring the build graph based on that refined dependency information.

It is possible to design such a build system, and it is one reasonable design. However, it is a significant constraint on the build system design that comes with its own tradeoffs. For example, the dependency graph must be computed somewhat dynamically, based on intermediate steps. Many build systems work to avoid this because an immutable dependency graph provides significant simplifications for the rest of their architecture. Others support it, but with less efficiency.

This direction would also require some mechanism to enable extracting the interface dependencies separate from implementation ones. That would either push significant complexity into the tooling or require language extensions to simplify the problem. Either way, Carbon would pick up some complexity.

At the end of the day, I would prefer that Carbon's design allow build systems to achieve the physical separation of dependencies without taking on the complexity of language-specific tools to extract separate dependencies. Expressing the separation in the language itself is relatively simple, and it is simple enough that it is trivial for the build system to reflect it.

Vanuan commented 2 years ago

This seems to assume both the build system using a language-specific tool to separate the interface from the implementation details (including analyzing and separating dependencies), and the build system restructuring the build graph based on that refined dependency information.

Apparently, that's the path C++ is taking with its modules, isn't it?

That would either push significant complexity into the tooling or require language extensions to simplify the problem

The tooling is everything. For example, TypeScript is essentially a JavaScript tool, not a language.

I would prefer that Carbon's design allow build systems to achieve the physical separation of dependencies

So you'd prefer to specify dependencies between compilation units manually rather than to automatically infer them from import statements? It seems strange to me. What's the purpose of import statements if you still need to manually separate interfaces from implementation and think about the build performance? You can just use pre-compiled headers then.

Vanuan commented 2 years ago

it's sometimes nice to put functions into a child namespace instead of having them hanging out in the same scope as a library's classes.

You don't have such a problem if you don't import the whole package in the first place. If Carbon allowed single-name imports with aliasing, you wouldn't need namespaces. You would only need a hierarchical directory structure of package contents.
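
For example, hypothetical syntax along these lines (this is not current Carbon, just an illustration of what I mean by single-name imports with aliasing):

// Import one function and bind it to a file-scoped short name.
import Math.Trigonometry.Sin as ShortSin;

fn Wave(x: f64) -> f64 {
  return ShortSin(x) + ShortSin(2.0 * x);
}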

Vanuan commented 2 years ago

Let's put it that way: it seems like Carbon at its current design treats a compilation unit (file) as an independent entity. That is, a source file is still treated as a walled-off "object" file rather than as part of a larger codebase with a repository of other compilation units. I think all the modern languages nowadays treat code as an in-memory graph database rather than static machine code. C++ delegates the dependency problem to a linker without giving it access to the source code. That's why C++ modules are kind of dead on arrival.

Maybe Carbon could benefit from having a somewhat more dynamic compilation stage, so to speak.

I hope my thoughts make sense. I'm not a language designer, but I share the frustration with C++ build process.

geoffromer commented 2 years ago

That's why C++ modules are kind of dead on arrival.

Non-specific and hyperbolic criticisms like this don't make for constructive discussion, because they tend to alienate rather than inform the people who don't already agree. Can you remove or rephrase it?

chandlerc commented 2 years ago

Trying to pull out what seems like the high level point here, as I don't think we're making much progress debating the details...

it seems like Carbon at its current design treats a compilation unit (file) as an independent entity.

We are explicitly trying to provide tools that allow a very high degree of separate compilation and dependency management, especially in distributed build systems.

But we are also trying to allow folks to ignore them and use a simple & easy model when that's all they need.

I think all the modern languages nowadays treat code as an in-memory graph database rather than static machine code.

We are looking carefully at what other languages do here, but again, we want to provide some powerful tools that at least some users of C++ have and benefit from so that when moving from C++ to Carbon those tools still exist. One set of those is around a reasonably high degree of separation for compilation & dependency management.

There are definitely other ways to design a language, but given our goals and priorities, so far Carbon is pursuing the direction that optionally exposes some of these "power tools" for physically separating things.

OlaFosheimGrostad commented 2 years ago

You don't have such a problem if you don't import the whole package in the first place. If Carbon allowed single-name imports with aliasing, you wouldn't need namespaces. You would only need a hierarchical directory structure of package contents.

What I meant was: In C++ it is sometimes convenient to have a namespace "mylib::operators" or something like that so that you can keep seldom used functionality in "mylib" and use them as "mylib::func", but still get to use frequently used stuff in "mylib::operators" unqualified so that your code becomes more readable. At least, that is my experience with C++ (C++ source code can become overly verbose and hard to read if everything is namespace qualified).

(C++ has something called an "inline namespace" that allows you to group a subset under "mylib::operators::func", but still access them as "mylib::func".)

chandlerc commented 1 year ago

Just wanted to say the leads did go back over this, and we're happy with the current Carbon design direction here.

I wrote up what I think is a reasonable summary for the reasons we're currently planning to stick with this direction above: https://github.com/carbon-language/carbon-lang/issues/2154#issuecomment-1242246708

We can revisit this in the future, but should do so with new information or data or specific problems that we need to solve.