Closed Vanuan closed 1 year ago
If you're working on a small project, you can definitely just put everything in an api file. Nothing really requires using an impl file. However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism). See Collapse API and implementation file concepts for more discussion on the alternative.
Regarding namespaces, again they're not something you need to use. However, organizationally I think we can see from C++ that some developers find namespaces useful, and it's likely to help ease a migration.
I'm moving this to a question for leads in case there's something that I'm missing here, design-wise, although converting this to a discussion may make sense.
However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism)
Such a mindset is bringing the worst of C++ into a new language. Leave alone the headers/implementation separation idea. The performance hit of touching header files is caused by a preprocessor. In C++ you're compiling the same code again and again. Compilation of N source files (when you don't require N! combination) is actually very quick.
Even with a thousand files project you can just extract public signatures of each module and compare them with a cached version and recompile dependants only if it's changed. That's how you prevent needless compilation, not by creating artificial stable/unstable API boundaries.
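A minimal sketch of the signature-comparison idea described above, not how any existing Carbon or C++ tool works. The module names, the signature strings, and both helper functions are hypothetical, and only direct dependants are considered; a real build would repeat the check for every dependant whose own public signatures change.

```python
import hashlib


def signature_fingerprint(public_signatures):
    """Hash a module's public signatures, independent of declaration order."""
    digest = hashlib.sha256()
    for signature in sorted(public_signatures):
        digest.update(signature.encode("utf-8") + b"\n")
    return digest.hexdigest()


def modules_to_rebuild(changed, new_signatures, cache, dependants):
    """Return the set of modules to recompile after `changed` was edited.

    cache:      module name -> fingerprint from the previous build
                (updated in place when the fingerprint changes)
    dependants: module name -> set of modules that import it directly
    """
    rebuild = {changed}  # the edited module itself is always recompiled
    new_fingerprint = signature_fingerprint(new_signatures)
    if cache.get(changed) != new_fingerprint:
        # The public surface changed, so direct importers are stale too.
        cache[changed] = new_fingerprint
        rebuild |= dependants.get(changed, set())
    return rebuild


# Hypothetical module names and signature strings, for illustration only.
cache = {"ImplementationDetail": signature_fingerprint(["fn DoSomething()"])}
dependants = {"ImplementationDetail": {"PublicInterface"}}

body_only_edit = modules_to_rebuild(
    "ImplementationDetail", ["fn DoSomething()"], cache, dependants)
signature_edit = modules_to_rebuild(
    "ImplementationDetail", ["fn DoSomething(x: i32)"], cache, dependants)
```

In this sketch a body-only edit rebuilds just the edited module, while a signature change also pulls in its importers. Note that it assumes the public signatures are already available, which is the step the replies below discuss.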
Let's analyse the claimed disadvantages listed in the collapse concepts proposals:
Did I catch everything?
However, impl files are useful because they allow for separate compilation (i.e., for a really big project, you can compile much faster and with more parallelism)
Such a mindset is bringing the worst of C++ into a new language.
I think this is not a productive way to engage in discussions about Carbon. Among other things, it reads as an absolute statement that doesn't acknowledge that other people may disagree, and it reads as dismissive of the point that Jon tried to make.
Leave alone the headers/implementation separation idea. The performance hit of touching header files is caused by a preprocessor. In C++ you're compiling the same code again and again. Compilation of N source files (when you don't require N! combination) is actually very quick.
Even with a thousand files project you can just extract public signatures of each module and compare them with a cached version and recompile dependants only if it's changed. That's how you prevent needless compilation, not by creating artificial stable/unstable API boundaries.
One challenge of this approach in my experience is dealing with the dependencies of the file that need to be present in order to extract the public signature. Often, the build system cannot definitively tell whether a dependency is necessary for that extraction, and so even extracting the interface will require all dependencies, even dependencies of just the implementation details, to finish building first.
While having separate API and implementation files directly exposes some extra parallelism, it also exposes the ability to have a more parallel build graph for large build systems.
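To make the build-graph point concrete, here is a toy model, assuming unlimited parallel workers and entirely made-up compile times. The task names and the `critical_path` helper are hypothetical; linking and real scheduler behavior are not modelled.

```python
from functools import lru_cache


def critical_path(durations, deps):
    """Length of the longest dependency chain, i.e. the wall-clock time
    of the build if every ready task can start immediately."""
    @lru_cache(maxsize=None)
    def finish(task):
        start = max((finish(d) for d in deps.get(task, ())), default=0)
        return start + durations[task]

    return max(finish(task) for task in durations)


# One file per library: each file must wait for everything it imports.
single_durations = {"ImplementationDetail": 10, "PublicInterface": 10, "main": 10}
single_deps = {
    "PublicInterface": ["ImplementationDetail"],
    "main": ["PublicInterface"],
}

# Split files: importers wait only for the (cheap) api file.
split_durations = {
    "ImplementationDetail.api": 1,
    "ImplementationDetail.impl": 9,
    "PublicInterface.api": 1,
    "PublicInterface.impl": 9,
    "main": 10,
}
split_deps = {
    "ImplementationDetail.impl": ["ImplementationDetail.api"],
    "PublicInterface.impl": ["PublicInterface.api", "ImplementationDetail.api"],
    "main": ["PublicInterface.api"],
}

single_time = critical_path(single_durations, single_deps)  # 30
split_time = critical_path(split_durations, split_deps)     # 11
```

The total work is the same 30 units in both layouts; only the longest chain shrinks, because the implementation files no longer sit on the path between `main` and its imports.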
Now, as Jon mentioned initially, not everyone needs this. Projects with 100s or 1000s of files may not. And it is fine to not use the features if they don't help. But we'd like the features to be available to help scale even further.
The current set of features was specifically designed based on the experience of several folks on the project scaling C++ builds, where we found exactly this separation important. As you say, we could do without it and work around any build scaling limitations. But so far the judgement call has been that the cost is reasonable and reasonably easy to avoid for users who don't need it.
There is also a completely separate reason that I at least appreciate separating API from implementation -- I find it to help me both organize my code and read the code of others. While I could in theory use tooling to extract this view, I prefer having the split directly reflected in the source code itself, and being able to read the source code itself.
Let's analyse the claimed disadvantages listed in the collapse concepts proposals:
The list you give doesn't for me map to the arguments in the alternative that Jon linked to, so I'm afraid I don't follow this part of the discussion.
will require all dependencies, even dependencies of just the implementation details, to finish building first.
Let's analyze an example.
// main.carbon
import { PublicInterface } from './PublicInterface.carbon';
new PublicInterface();

// PublicInterface.carbon
import { ImplementationDetail } from './ImplementationDetail.carbon';
export class PublicInterface {
  PublicInterface() {
    new ImplementationDetail().doSomething();
  }
}

// ImplementationDetail.carbon
export class ImplementationDetail {
  void doSomething() {
    // ...
  }
}
So, your concern is that if we change the ImplementationDetail class, the signature of the PublicInterface class is not changed, but there's no way to figure that out without analyzing ImplementationDetail? I disagree. AST parsing allows you to do a lazy evaluation of imports. You don't need to know what the imported symbol refers to for extracting the API of the PublicInterface.carbon file since it's not a part of a source code interface. Maybe you refer to a binary interface?
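As a rough illustration of extracting an API from the syntax tree alone, here is a sketch using Python's standard `ast` module as a stand-in, since no Carbon tooling is assumed. The `public_api` helper and the `implementation_detail` module name are hypothetical; that module does not need to exist because the import is parsed but never resolved.

```python
import ast


def public_api(source):
    """List public top-level functions and classes from the syntax tree.
    Import statements are parsed but never resolved or executed."""
    api = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            params = ", ".join(arg.arg for arg in node.args.args)
            api.append(f"def {node.name}({params})")
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            api.append(f"class {node.name}")
    return api


# Hypothetical Python rendering of the PublicInterface example above.
before = '''
from implementation_detail import ImplementationDetail

class PublicInterface:
    def __init__(self):
        ImplementationDetail().do_something()
'''
# Change only the implementation detail that the constructor calls.
after = before.replace("do_something()", "do_something_else()")

api_before = public_api(before)
api_after = public_api(after)
```

Both calls return the same list, so a body-only change is invisible at this level. The sketch only collects names, though; a signature that mentions an imported type would still name that dependency, which is where the rest of the thread picks up.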
I like namespaces. They allow me to collect free functions with short names that I want to use unqualified into a sub-namespace that can be used where I need them without running out of names or having to resort to long names that affect legibility of the code.
They are also useful for extending concepts in a cross cutting fashion, moreover they are needed for extending overloaded functions/generics that are cross cutting. They are also useful for collecting things that should be edited together, but are associated with different parts of a program.
Are there other ways? Yes, you can introduce the concept of extension-slots in all aggregating concepts or other injection-mechanisms, but that is a radical departure from C++.
I like namespaces. They allow me to collect free functions with short names that I want to use unqualified into a sub-namespace that can be used where I need them without running out of names or having to resort to long names that affect legibility of the code.
As I understand it, in its current design, Carbon lacks the ability to import a specific function from a package/library and alias it to a file-scoped name.
So you just have to import everything to a file and use namespaces to pick which functions are visible to the current file?
Essentially, you mass-import a subset of the functions from a particular package into a single file.
Wouldn't it be nicer to handpick which functions to import? Yes, it makes a long import list, similar to Java, but it prevents you from headaches of inventing new names due to mass-import.
Imports only inject the package name into the namespace. i.e., import Math only adds one name,
But the combination of import Math and use namespace Trigonometry is effectively the same as import Math.Trigonometry.* in Java. Right?
Isn't that what you mean by "collect free functions with short names that I want to use unqualified into a sub-namespace that can be used"?
Imports only inject the package name into the namespace. i.e., import Math only adds one name,
But the combination of import Math and use namespace Trigonometry is effectively the same as import Math.Trigonometry.* in Java. Right?
No, there is no use namespace.
Isn't that what you mean by "collect free functions with short names that I want to use unqualified into a sub-namespace that can be used"?
I expect @OlaFosheimGrostad is making a comment about code organization preferences; i.e., it's sometimes nice to put functions into a child namespace instead of having them hanging out in the same scope as a library's classes.
AST parsing allows you to do a lazy evaluation of imports. You don't need to know what the imported symbol refers to for extracting the API of the PublicInterface.carbon file since it's not a part of a source code interface. Maybe you refer to a binary interface?
This seems to assume both the build system using a language-specific tool to separate the interface from the implementation details (including analyzing and separating dependencies), and the build system restructuring the build graph based on that refined dependency information.
It is possible to design such a build system, and it is one reasonable design. However, it is a significant constraint on the build system design that comes with its own tradeoffs. For example, the dependency graph must be somewhat computed dynamically based on intermediate steps. Many build systems work to avoid this because an immutable dependency graph provides significant simplifications for the rest of their architecture. Others support it, but with less efficiency.
This direction would also require some mechanism to enable extracting the interface dependencies separate from implementation ones. That would either push significant complexity into the tooling or require language extensions to simplify the problem. Either way, Carbon would pick up some complexity.
At the end of the day, I would prefer that Carbon's design allow build systems to achieve the physical separation of dependencies without taking on the complexity of language-specific tools to extract separate dependencies. I think separating them in the language itself is relatively simple, and simple enough that the build system can reflect the separation trivially.
This seems to assume both the build system using a language-specific tool to separate the interface from the implementation details (including analyzing and separating dependencies), and the build system restructuring the build graph based on that refined dependency information.
Apparently, that's the path C++ is taking with its modules, isn't it?
That would either push significant complexity into the tooling or require language extensions to simplify the problem
The tooling is everything. For example, TypeScript is essentially a JavaScript tool, not a language.
I would prefer that Carbon's design allow build systems to achieve the physical separation of dependencies
So you'd prefer to specify dependencies between compilation units manually rather than to automatically infer them from import statements? It seems strange to me. What's the purpose of import statements if you still need to manually separate interfaces from implementation and think about the build performance? You can just use pre-compiled headers then.
it's sometimes nice to put functions into a child namespace instead of having them hanging out in the same scope as a library's classes.
You don't have such a problem if you don't import the whole package in the first place. If Carbon would allow single name imports with aliasing you wouldn't need namespaces. You would only need a hierarchical directory structure of package contents.
Let's put it this way: it seems like Carbon at its current design treats a compilation unit (file) as an independent entity. That is, a source file is still treated as a walled "object" file rather than a part of a larger codebase with a repository of other compilation units. I think all the modern languages nowadays treat code as an in-memory graph database rather than static machine code. C++ delegates the dependency problem to a linker while not giving it access to the source code. That's why C++ modules are kind of dead on arrival.
Maybe Carbon could benefit from having a bit more dynamic compilation stage, so to speak.
I hope my thoughts make sense. I'm not a language designer, but I share the frustration with C++ build process.
That's why C++ modules are kind of dead on arrival.
Non-specific and hyperbolic criticisms like this don't make for constructive discussion, because they tend to alienate rather than inform the people who don't already agree. Can you remove or rephrase it?
Trying to pull out what seems like the high level point here, as I don't think we're making much progress debating the details...
it seems like Carbon at its current design treats a compilation unit (file) as an independent entity.
We are explicitly trying to provide tools that allow a very high degree of separate compilation and dependency management, especially in distributed build systems.
But we are also trying to allow folks to ignore them and use a simple & easy model when that's all they need.
I think all the modern languages nowadays treat code as an in-memory graph database rather than static machine code.
We are looking carefully at what other languages do here, but again, we want to provide some powerful tools that at least some users of C++ have and benefit from so that when moving from C++ to Carbon those tools still exist. One set of those is around a reasonably high degree of separation for compilation & dependency management.
There are definitely other ways to design a language, but given our goals and priorities, so far Carbon is pursuing the direction that optionally exposes some of these "power tools" for physically separating things.
You don't have such a problem if you don't import the whole package in the first place. If Carbon would allow single name imports with aliasing you wouldn't need namespaces. You would only need a hierarchical directory structure of package contents.
What I meant was: In C++ it is sometimes convenient to have a namespace "mylib::operators" or something like that so that you can keep seldom used functionality in "mylib" and use them as "mylib::func", but still get to use frequently used stuff in "mylib::operators" unqualified so that your code becomes more readable. At least, that is my experience with C++ (C++ source code can become overly verbose and hard to read if everything is namespace qualified).
((C++ has something called an "inline namespace" that allows you to group a subset under "mylib::operators::func", but still access them as "mylib::func".))
Just wanted to say the leads did go back over this, and we're happy with the current Carbon design direction here.
I wrote up what I think is a reasonable summary for the reasons we're currently planning to stick with this direction above: https://github.com/carbon-language/carbon-lang/issues/2154#issuecomment-1242246708
We can revisit this in the future, but should do so with new information or data or specific problems that we need to solve.
Carbon introduces multiple concepts at the same time: libraries, packages and namespaces at the syntax level. I don't think all of these should be exposed as a part of the source code.
Let's review the current state.
The smallest unit, the unit of compilation, doesn't have a name of its own and should be given one. The documentation calls it a "file".
Files are grouped into a unit of distribution - a package.
A package can contain multiple libraries.
Libraries group "api" files and "impl" files so that there's a private namespace and a public interface of a library. As a result, a package exposes the APIs of multiple libraries.
In addition there are namespaces. Namespaces make it possible to cross library boundaries, so that a common private implementation can be shared without exposing its API at the package level.
Is it only me who thinks this is too complicated?
I propose the following:
Why are syntax-level packages/libraries/namespaces required? Is it so that it's easier to map source files to binary objects?