Closed: LPeter1997 closed this 1 year ago
> A limitation of C# using statements is that they can only appear at the top level. They cannot be used to import inside a function, for example.
Not quite true. They can appear inside a namespace declaration or outside, a fact that has started many a style war over the years.
> F# introduces a third concept, modules... They are roughly just static classes
I wouldn't even say roughly. They are exactly static classes. Same with VB modules.
Important to note in all of this that none of these correspond to what IL calls modules.
Another note is that while IL certainly lets you define methods in namespaces, I don't think any .NET language would be able to access them (maybe C++/CLI could?).
> Not quite true. They can appear inside a namespace declaration or outside, a fact that has started many a style war over the years.
Fair, I've even used that at one of my workplaces, where the convention was to put `using`s inside the namespace so you have to type the fewest prefixes. By "top-level" I meant file or namespace declaration level, but I'm not sure there's a name for that declaration level.
My point there was to suggest that importing doesn't have to happen at only those levels, there is no reason not to allow importing in a function-local context.
> Another note is that while IL certainly lets you define methods in namespaces, I don't think any .NET language would be able to access them (maybe C++/CLI could?).
Oh absolutely, we'd likely wrap free functions in static classes, or have the innermost module level necessarily be a static class.
My issue with C# `using` statements is that you sort of get the worst scenario:
- All namespaces from your dependencies are implicitly available in your code.
- Package names follow the namespace naming too, but their code shares that namespace by convention only.
- In case of an identifier conflict, the recommended fix is to use a more specific name, increasing verbosity; verbosity that is only needed because namespaces are implicitly available.
You must use `using` to avoid being too verbose, but `using` doesn't fully solve your problem, because you can still introduce conflicts just by adding a dependency to your project. Global `using` directives sacrifice explicitness to remove the clutter caused by `using` statement spam. In the end you are still very verbose, and you still need tooling to know where a function/class comes from.
That's why I kind of like the Python/JS import systems, because you can use them however you like:

```
import { foo, bar } from myPackage
```

when you decide to be fully explicit: you know exactly where each identifier comes from, and you can easily tell which libraries you are using in this file.

```
import * from myPackage
```

Less explicit, but you still know which packages you use in this file.

```
import * from ../myImportsFile
```

or even

```
import * from *
```
The main difference is that Python/JS import by package, not by namespace, so nothing from a dependency is brought in implicitly. It allows you to be more explicit, for a little more clutter at the top of the file but less verbosity in the body. Most importantly, it gives more freedom about it.
In practice, Python package management is so bad I can't even restore a project. And both JS and Python now regret not having namespaces in their package naming standards, especially npm, which duct-taped a fix with "package scopes".
For a new language without any interop concerns, I personally would go with the "you import packages, not namespaces" option. But here, I don't know; it would require a big sacrifice, like nuking the namespace concept from the code and masking the namespaces of the code you depend on.
Side note:
C# has `using static`, but there is no reason it couldn't just work with the same keyword:
I think it would be sane to test whether this creates too many conflicts caused by dependencies.
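For context, this is how today's C# distinguishes the two forms; the suggestion above is that a single `using` keyword could cover both (the unified form is hypothetical, the snippet below is valid current C#):

```csharp
using System.Text;        // imports a namespace
using static System.Math; // imports the static members of a type

class Demo
{
    // Sqrt is usable unqualified thanks to `using static System.Math`
    static double Diagonal(double a, double b) => Sqrt(a * a + b * b);
}
```

Dropping the `static` keyword would mean the compiler has to decide whether `System.Math` names a namespace or a type, which is where the dependency-induced conflicts mentioned above could appear.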
Edit: some interesting discussion from Discord. About "import package, not namespace":
> @LPeter1997: Or is a hierarchy allowed within a library/project?

On the "import by package" scenario, the question of the content hierarchy came up. I have no idea. @Kuinox good question, I don't know; that may bring the namespace concept back in. JS/Python work with folders, so they introduce a hierarchy like that. Should the import propagate to children?
About exports:
> @Kuinox: imo, the import-system decisions must be made after the export-system decisions, if there is one. With some export systems you can rapidly craft different levels of import explicitness by playing with the exports/imports, like I showed in my comment, which I find is the ideal scenario, as you can please everyone.
IMO the choice to somehow forget namespaces by binding them to the package name is not a good move.
This basically couples the implementation to the "exposition": to use a Type, I need to know where it has been implemented.
As I see it, this complicates the evolution of complex code bases. A Type is semantically defined by its `Namespace.Name`, and the fact that it is implemented/exposed today in package A and tomorrow by package B is a very good thing.
You can still expose a Type defined in package A from package B. If you define type `Foo` in package A, in module `bar`, then in B you can write `export Foo from A.bar;`, and from the outside it will look like `Foo` was defined in package B.
> IMO the choice to somehow forget namespaces by binding them to the package name is not a good move.
The philosophy of namespaces here is implicit by nature: your namespace hierarchy is defined by the file structure. This was already a soft convention in C#. Namespaces essentially become package name + module hierarchy. For example, if I have this:
```
// in file core.fr
module math
{
    export matrices;

    module matrices
    {
        export type mat4x4(...);
    }
}
```
Then effectively, the namespace of `mat4x4` is `core.math.matrices`. The fact that modules are able to propagate language elements is a convenience for aggregating many smaller modules into bigger ones (such as a math library, where we likely don't care about the tiny submodules; a simple `math` module name should suffice).
> This basically couples the implementation to the "exposition": to use a Type, I need to know where it has been implemented.
That's semi-true! The exporting mechanism allows you to maintain a stable API file, similar to JavaScript barrel files. The underlying implementation can change, files can be split or merged, but you can still re-export the module as the old API.
> As I see it, this complicates the evolution of complex code bases. A Type is semantically defined by its `Namespace.Name`, and the fact that it is implemented/exposed today in package A and tomorrow by package B is a very good thing.
I'm going to assume you made a typo here and meant "not a very good thing". (Edit: I realized what you actually meant; hopefully this doesn't change the validity of the following.) This is double-edged IMO. Quite a few projects rely on external packages; let's take the language-server implementation of OmniSharp, which relies on `Nerdbank.Streams`, for example. Usage could look something like this:
```csharp
// We want to consume OmniSharp, we import parts of it
using OmniSharp. ...;
// For configuration, we need to grab some of the Nerdbank streams
using Nerdbank.Streams. ...;
```
I see a problem with this: what if we want to change the implementation but keep the API stable? For example, we realize that `Nerdbank.Streams` is not efficient enough for some reason. Do we reimplement `Nerdbank.Streams` under the same namespace? Currently in C#, you'd likely do that if you don't want to update references everywhere.
But with an exporting mechanism, you could expose the dependency under your own package:
```
// in some omnisharp module file
module streams
{
    export * from Nerdbank.Streams;
}
```
This means that internally you can change what you export. Heck, you can roll your own implementation there, as long as the API stays stable. But in both cases, the access path and reference is `omnisharp.streams`; the fact that it relies on some specific external package is not necessarily exposed.
You're totally right on the "double-edged". I recently had to do this, not for NerdBank but for Dapper (see here: https://github.com/DapperLib/Dapper/pull/1802). To benefit from this feature I did "reimplement Dapper under the same namespace" (the fork is here: https://github.com/signature-opensource/CK-Dapper/). Our projects now depend on this package. No changes required.
Now, if a project depends on both CK.Dapper and Dapper (typically through a transitive dependency) it will not compile (CS0433: the type 'SqlMapper' exists in both 'CK.Dapper' and 'Dapper'), and this is a lifesaver: I don't want my code base to be able to run with two different implementations of the same beast at the same time.
Maybe this discussion is all about private vs. transitive dependencies. For me, private dependencies are hell. I definitely live on the "transitive" side of the world: a project must use homogeneous versions for all its components (including dependencies).
This is my approach, just me trying to handle the numerous complexities of today's developer's job...
That's totally understandable! I've shown in my previous post how you can actually hide where dependency elements come from and by that, you can replace them, essentially bringing them closer to private dependencies (while not being truly private, we are still in .NET world). You might say this is more cumbersome, I see this as a way to hide names (ability to alias) and only expose elements from the dependency you truly want to expose (which makes it even easier to replace I believe). It's also an opportunity to document and connect up the external dependencies your product uses, I see this as a good thing as well.
> A Type is semantically defined by its `Namespace.Name`, and the fact that it is implemented/exposed today in package A and tomorrow by package B is a very good thing.
An important note is that while it might be possible to do this easily at the source level, doing it at the binary level requires a `TypeForwardedToAttribute` from A to B, as CIL tightly couples names to the package that produced them.
This proposal went through a major overhaul; it has been completely redesigned since the initial variations.
Introduction
This issue will try to lay down the module-system of the language, which mainly consists of the following ideas:
Note: Module systems are surprisingly hard to design in my opinion. They are often overlooked in language/compiler design. If I've missed something crucial, let me know.
A summary on other languages
I've attempted to put together a more complete summary of module systems in #73. This proposal also uses terminology from that issue, skimming through it is advised.
Proposal for Fresh
The proposal aims for something similar to the JavaScript or Rust module system, while fixing some of the mistakes or missing features.
Packages
Packages would essentially be NuGet packages/projects/assemblies, as usual. The names can stay namespaced to reduce the chance of name collisions between packages. This means that `Foo.Bar` is a completely valid package name.

Modules
The module hierarchy would follow the directory structure, like in JavaScript or Rust. A file `foo.fr` would create a module named `foo`. A file at `foo/bar/baz.fr` would create the module `foo.bar.baz`. This was already a soft convention with C# namespaces and the directory structure.

Module folders can group their contribution under the folder's name. For example, if a folder `math` has files `matrix.fr`, `vector.fr` and `quat.fr`, a file called `module.fr` inside `math` can provide a single module interface for all of these. This is very similar to JavaScript barrel files or `mod.rs` in Rust. (Note that the file name is not permanent; we could use something shorter or completely different, `module.fr` is just an initial idea.)

The outermost module (the top level) would give the entire API interface of the package, and its name is identical to the package name.
Namespaces
The language has no direct concept of namespaces. This does not mean that there will be no namespaces in the generated code; namespaces still exist at the CLR level. This will be discussed in more detail in the interop section.
Exporting
Each module is responsible for exporting its own API towards other modules. Any symbol that should be accessible from the outside has to be marked for exporting, similarly to JavaScript.
We could provide exporting inline, marking the elements directly, or as a separate export list. The export list could allow aliasing certain symbols:
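A sketch of the two styles, using the JavaScript-flavored syntax of this proposal (the `type` declarations mirror the `export type mat4x4(...)` example elsewhere in this issue; nothing here is final):

```
// inline exporting: mark the element directly
export type mat4x4(...);

// not exported by itself
type vec4(...);

// separate export list, aliasing a symbol on the way out
export { vec4 as vector4 };
```

Consumers would then only ever see `vector4`, leaving the module free to rename its internals.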
Re-exporting could also be supported (identical to the JavaScript feature, similar to `pub use` in Rust), which propagates another module's symbols upwards from the current module (especially useful in `module.fr`-like API files).

Note that JavaScript syntax is used in the examples. We could get rid of the braces, as their sole purpose in JavaScript is to differentiate default imports/exports.
Module visibility
Modules are not visible by default, meaning that the following would cause an error:
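For example, assuming a package `foo` with a non-exported submodule `bar` (hypothetical syntax):

```
// error: module 'bar' is not exported by 'foo'
import { something } from foo.bar;
```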
They also need to be exported from their parent:
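Using the `export matrices;` form shown earlier in this issue, that would look like:

```
// in foo's module file: make the submodule visible to consumers
export bar;
```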
This gives the ability to control the visibility of submodules. With this feature it's easy to hide that a given module is actually the accumulation of multiple other submodules, and the consumer won't accidentally depend on such a detail.
The exported modules of the main module will become part of the public API alongside the other symbols exported there.
Importing
Things can be accessed through their fully qualified name, but importing into local scope can be done with the `import` statement, similarly to JavaScript.

Note that `import some_module;` in JavaScript is only used to execute a module's side effects, and I haven't found a use for it in this design, so currently that form of importing is unspecified.

Note that I don't want to take away the possibility to access everything by its fully qualified name, because this is a thing that would be heavily utilized by metaprogramming.
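An import could look like this (hypothetical syntax, mirroring the JavaScript forms quoted earlier in the thread; the `core.math.matrices` module comes from the example above):

```
// bring a single symbol into local scope
import { mat4x4 } from core.math.matrices;

// or import everything the module exports
import * from core.math;
```

Fully qualified access such as `core.math.matrices.mat4x4` would keep working alongside imports.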
Member visibility
By default, type members would not be accessible from the outside. Members can be exported, just like other symbols, to make them public. Note that, just like in Rust, this visibility is completely transitive, meaning you don't need to keep re-exporting type members to keep them externally accessible.
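A sketch of what that could look like; the member syntax is entirely hypothetical, only the `export` marker and the transitivity rule come from the proposal:

```
export type vec2(x: int32, y: int32)
{
    // exported member: accessible wherever vec2 itself is accessible
    export func lengthSquared(): int32 = x * x + y * y;

    // not exported: hidden from the outside
    func debugLabel(): string = ...;
}
```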
Interop with C#
Namespaces, module hierarchy
By default, the namespace of a module would simply be the package name. The module hierarchy would map to nested static classes. For example, if there is a module `foo` with two submodules `bar` and `baz`, all within the package `Hello.World`, the C# equivalent would be nested static classes inside the `Hello.World` namespace.

If required for interop, the namespaces or even the module class names could be controlled by attributes.
Visibility
The CLR has no concept of import/export, only the standard visibilities of `public`/`internal`/`private`/... The symbols exported by the top-level module would be `public`; the rest of the symbols would become `internal`. The `protected` modifier would be recognized for interop, when extending an external .NET type.

Interesting features
File-nested modules
Just like in Rust, submodules could be introduced without a file hierarchy, within the same file:
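Something like this, mirroring Rust's inline `mod` with the `module` syntax used earlier in this proposal (the names are illustrative):

```
// in file geometry.fr; no geometry/ folder needed
module shapes
{
    export type circle(...);
}

// the type is then reachable as geometry.shapes.circle
```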
This is useful for metaprogramming, or for things like compressing a package into a single file.
Module extensions
Modules can be thought of as types with only static members. I believe there is no reason not to allow extending them, allowing for module extensions (syntax from #52):
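Since the syntax from #52 is not reproduced here, the following is only a guess at what a module extension might look like; the `extension` keyword and the body are hypothetical:

```
// hypothetically: add a new exported symbol to the existing math module
extension module math
{
    export func clamp(v: int32, lo: int32, hi: int32): int32 = ...;
}
```

This would behave like extending a static class: the extension contributes new static members without reopening the original file.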