Draco-lang / Language-suggestions

Collecting ideas for a new .NET language that could replace C#

[WIP] A summary on module systems #73

Closed LPeter1997 closed 1 year ago

LPeter1997 commented 2 years ago

Since the modules issue is one of the most important ones as of right now, I've tried to summarize what other major languages have done in this area, as well as trying to read up on some literature about it.

Goal of this document

The goal of this document is to investigate the world of module systems of different languages. This is a topic that is not talked about very much, but it's one of the most fundamental features of a language to aid in code reuse, decomposition, etc. I'd like to go through the small number of articles that I was able to dig up about this, then visit some languages and how they designed their module systems.

Existing discussions/literature

Before discussing each language, I'd like to go through the existing discussions and literature and how they contribute to the concept of module systems.

A blog post by Jonathan Goodwin

Link to the relevant blog post.

It mainly talks about the two main flavors of modules:

The post then talks about the similarities between the idea of a module and types:

"What next?", a post on next steps in language design

Link to the post.

It's a single significant paragraph that essentially agrees with the "Modules Matter Most" presentation (we'll discuss that one later) and gives a few "components" without any explanation:

Sadly, the author gave no real information or examples on these outside the single reference to Modules Matter Most, which we will discuss later.

It also claims that most languages mean simply "a way of managing namespaces or compilation order" when they talk about module systems.

Relevant discussion on Reddit

Link to the discussion.

This is a discussion of the previous "What next?" post, where Jonathan Goodwin provides explanations for the key "components" that weren't given in the original post. The explanations rely heavily on theory; I'd advise checking out the post and the discussion only if you are interested.

In summary, I believe that many of these features try to lift modules to the same level as types: they can implement interfaces, they can do polymorphism, they can be instantiated with types or values, you could have functions returning modules, you could have modules passed in as values, ...

I believe this "roadmap" is not surprising, given that even C# uses things like static classes, which are already types we usually think of as modules. Making them instantiable would essentially lift modules to the value level, giving us first-class modules (whose only limitation compared to regular classes + instances would likely be that they are completely immutable).

Modules Matter Most by Robert Harper

Link to the slides.

In the beginning, the slides claim that "the ML module system will be the standard other systems will be compared to, and is something that not easily can be improved on (and is often actually made worse)" [slide 4]. They then go on to detail the key ideas.

The first part of the slides (up until around slide 14) is not too bad; after that it gets a bit theory-heavy. Feel free to read it, but my feeling, again, is that modules are being brought closer to types and values.

Practical Foundations for Programming Languages by Robert Harper

Abbreviated version by the author on this university subject site.

It's heavier on the theory side and is mentioned for completeness' sake. Since the author is the same as before, they unsurprisingly push the same ideas as the ML module system, just with a lot more "functional jargon" that makes the concepts sound scary and hard for the average developer to understand.

An SO discussion about first-class modules

The SO question.

The top answer strongly suggests what I've hinted at previously: ML simply attempts to bring modules and regular values closer together, with some restrictions of course - namely, modules are usually immutable.

How Standard ML modules map to Scala objects (Draft)

The blog post.

This post was written by someone likely far more seasoned in ML modules than me, and it essentially confirmed my gut feeling about ML modules being just "classes and instances behind the scenes".

OCaml Programming: Correct + Efficient + Beautiful, chapter 5

Chapter of the online book/course.

It's great introductory material for an ML-like module system, in this case OCaml's. The course draws the same parallel between ML modules and OO constructs that I have been making. The section about the ML module system will be mostly based on this material.

That's it?

Unfortunately, that's all the literature I could find. Most of it is on the more academic, ML-side of things. If there's anything else, please let me know, so I can extend this document accordingly.

Aspects/features for inspection

There are quite a few aspects/features to consider when looking at how a language defines its module system. Here I'd like to give a breakdown of how we'll look at each language, to be relatively methodical and consistent. These aspects will be quite different from the ones coming from the literature. This is because we are looking for a more "practical" and lower-level overview. We can still get inspired by the ML module system in some aspects, if we want to.

Terminology

When talking about module systems, there are various terms floating around, the main 3 being:

While these can be defined as separate entities meaning different things, some languages blur the lines between them, merge them or remove them completely. For completeness' sake, let's look at a possible differentiation of these concepts:

These "definitions" also give us the three main goals of a module system, which are:

Discussed features

Here I'd like to shortly describe the different aspects we'll use for the different languages.

Template

If you want to contribute by investigating a language, feel free to use this template as a starter.

## Language name
### Concept of package/module/namespace
Do these concepts exist in the language? Do they blur them together?

#### Package/module/namespace open-ness
Are these concepts open for extension from outside the file it is defined in or even outside the codebase?

#### Subpackages/submodules/subnamespaces
Do these elements allow nesting, allowing a parent-child relation?

### Exporting symbols
How does a package/module/namespace export a symbol (a function, a type, ...) to be used outside of itself? Does it only export it to the parent or publishes it project-wide? Does it export it under the same name, or can you alias it?

### Referencing an external package/module/namespace
How do we reuse a package/module/namespace written by someone else? How do we reference it in our project?

### Importing symbols
In a file, how do we import symbols (a function, a type, ...) to be used inside a file? Does importing dump contents into the global namespace? Does it allow aliasing?

### Interesting features
This is mainly here to put anything that has no equivalent into this section. Things like parametric or first-class modules would come here.

Investigated languages

C#

Concept of package/module/namespace

C# has two of the three concepts directly: packages and namespaces.

Packages

Packages come in two forms: Assemblies and NuGet Packages. Assemblies are simply the compiled DLL produced by the compiler. You can reference such a DLL and use it no problem. NuGet packages append some metadata to this assembly and zip it up so you can publish it in package repositories online. While NuGet packages can contain anything (binaries, analyzers, raw resources), in terms of code reuse, it's just the compiled assembly zipped up with metadata.

Modules

While C# doesn't have modules directly as a feature, we often treat namespaces or static classes as modules. Example:

static class Math
{
    public static int Square(int x) => x * x;
}

Namespaces

By convention, we declare most of our source files within some namespace. Namespaces are hierarchical, and by convention we usually follow the directory structure with the namespace naming. Note that, unlike in Java, this is just a convention, not a requirement. For example, if your project is named Game.Utilities and you have a file inside it at the path Math/Interpolations/Slerp.cs, you'd put it in the namespace Game.Utilities.Math.Interpolations by convention, but nothing stops you from putting it in Foo.Bar.Baz.

C# also has two syntaxes for namespaces. Initially, there was only the braced syntax:

namespace Foo.Bar
{
    // ...
}

Since most people only declared one namespace per file, the indentation felt like wasted space, so file-scoped namespaces were introduced:

namespace Foo.Bar;

// Everything under here is part of Foo.Bar

Package/module/namespace open-ness

Packages, by nature, are not open; they are closed, compiled binaries.

Modules in the form of static classes can be extended within the same assembly, if the static class is marked with the partial modifier.

Namespaces are completely open: any project can declare anything in any namespace, as long as it doesn't cause a name collision.

Subpackages/submodules/subnamespaces

Packages allow no nesting.

Modules in the form of static classes allow nesting within the same assembly. Example:

static class Math
{
    static class Trig
    {
        // ...
    }
    // ...
}

Namespaces allow for arbitrary nesting, even across assemblies. Example:

// In Assembly1
namespace Foo
{
    // ...
}

// In Assembly2
namespace Foo
{
    namespace Bar
    {
        // ...
    }
    // ...
}

Exporting symbols

C# uses accessibility levels (also known as visibilities) to control access to symbols from the outside. From the modules' point of view, there are 3 important visibilities:

This means that once a module defines something as public, it is exposed to the outside world as-is. Modules can't re-hide parts of the details that were exposed to them in some submodule.
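As a quick sketch of how these accessibility levels look on a static class treated as a module (the names here are made up):

```csharp
public static class Geometry
{
    // public: exported from the assembly, usable by anyone referencing it
    public static double Area(double r) => System.Math.PI * r * r;

    // internal: usable only within the same assembly
    internal static double Half(double x) => x / 2;

    // private: usable only within this class
    private static double Shift(double x) => x + 1;
}
```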

Referencing an external package/module/namespace

As mentioned before, the unit of reuse is an assembly or a NuGet package. Most of the time, when using someone else's library, we use it as a NuGet package. This is done by using the package name, which uniquely identifies the published package. We write the package reference inside the .csproj file:

<ItemGroup>
    <PackageReference Include="Contoso.Utility.UsefulStuff" Version="3.6.0" />
</ItemGroup>

Importing symbols

Once a package reference is added to the project, all namespaces within the package are visible to us. We can use the fully-qualified name to refer to the elements. For example, if we have this declaration in the package that we reference:

namespace System.Utilities
{
    public static class Math
    {
        public static int Pow(int x) => x * x;
    }
}

Then we can reference the Math class by writing System.Utilities.Math. Note that the package name does not necessarily contribute to this reference, making dependencies completely transparent in this sense.

It would be quite inconvenient to always fully qualify the name, so C# has a using directive that adds all elements from the specified namespace to the local namespace. Demonstrating with the previous example:

using System.Utilities;

// Now Math is valid, and is resolved to System.Utilities.Math automatically

C# doesn't allow a plain using directive to target a class, so referencing the Pow method is still typed out as Math.Pow. To solve this, C# allows us to import the static members of a type with the using static feature:

using static System.Utilities.Math;

// Now Pow is valid, and resolves to System.Utilities.Math.Pow

There are cases where a using can cause a conflict, because two imported namespaces contain a symbol with the same name. In itself, this does not cause an error, only when trying to reference the conflicting symbol. For these cases, there's another construct to alias types, called a using alias. Example:

namespace Foo
{
    public class A {}
}

namespace Bar
{
    public class A {}
}

using Foo;
using Bar;

// new A() would cause an ambiguous reference error, we need to alias the one we want to use, or even alias both
using First = Foo.A;
using Second = Bar.A;
// Now First and Second are unambiguous references to the types. The other members inside Foo and Bar are still referencable as-is.

This also means that modules in the form of static classes can also be aliased.
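For example, the System.Utilities.Math module from earlier could be aliased and then used through the alias:

```csharp
using M = System.Utilities.Math;

// M.Pow(2) now resolves to System.Utilities.Math.Pow(2)
```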

Interesting features

C# allows for type-parameterization of modules through static classes:

public static class Math<T>
{
    public static T Pow(T value) => ...;
}

Personally I've never seen this being used anywhere, as C# generic constraints are a bit too weak to be able to write anything useful with generics this way. This might change with the introduction of static abstracts.

F#

Concept of package/module/namespace

F# is almost the same as C# with one exception: F# has an explicit module feature. This is because F# - unlike C# - allows for free functions, so to allow interop with most .NET languages, free functions have to be wrapped inside a static class. F# modules are nothing more than static classes introduced with a different keyword.

While we could consider static classes and modules to be exactly the same, I'd say that the intent coming from naming makes them different enough to say that F# has a "direct modules" feature.
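For illustration, a minimal sketch of an F# module (which compiles down to a static class):

```fsharp
module Math =
    let square x = x * x

// elsewhere: Math.square 5
```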

Package/module/namespace open-ness

Packages and namespaces are the same deal as in C#, but modules are not open, not even within the same assembly, like what you can emulate with static partial classes in C#. This means that F# modules are strictly single-file structures.

Subpackages/submodules/subnamespaces

It is equivalent to C#: F# modules being static classes, they allow nesting.

Exporting symbols

F# also uses accessibility levels, public, internal and private being the 3 most important. The only difference is that while C# uses internal by default, for F# the default is public, except for let bindings inside a type, which are always private.
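A small sketch of these defaults on a module (the names are made up):

```fsharp
module Helpers =
    let twice x = x * 2             // public by default
    let private shift x = x + 1     // explicitly private to the module
```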

Referencing an external package/module/namespace

The process is virtually identical to C#, except that F# has another package manager, called Paket, which allows using NuGet packages too.

Importing symbols

Again, this is almost identical to C#, except for some minor things. The keyword is open and it has two variants, open and open type. The former allows for importing namespaces and modules, the latter is for importing the static members of a type. Note how F# doesn't differentiate between using and using static if the target is a module (which is essentially a static class).
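A sketch of the two variants (Math here stands for a user-defined module like the one shown earlier; System.Math is the BCL type):

```fsharp
open System.Collections.Generic   // opens a namespace
open Math                         // opens a module: its bindings are now in scope
open type System.Math             // imports the static members of a type: Sqrt, PI, ...
```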

Interesting features

Interestingly, while F# modules are essentially static classes, they don't allow for generic parameters, making them more limited than C# static classes.

Python

Concept of package/module/namespace

First off, Python has a namespace concept, but it's wildly different from the usual definition of namespaces and it's not something a static language should utilize, so I won't consider them here. Other than that, Python has modules and packages.

Modules

Python files are modules themselves. If you have foo.py, that file is also the definition of the foo module. Modules can contain definitions as usual in other languages, but they can also contain executable statements that are executed on first import.

Packages

Packages are a way to group modules into hierarchies. For example, in Python the module name A.B means a module named B under package A. They are also a way to redistribute code, similarly to NuGet packages.
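For example, a minimal layout for a package A containing a module B (hypothetical names) could look like this, after which import A.B works:

```
A/
    __init__.py   # marks A as a package
    B.py          # importable as A.B
```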

Package/module/namespace open-ness

Packages and modules are not open for external extensibility in any way.

Subpackages/submodules/subnamespaces

Packages can be arbitrarily nested in each other, and within packages there are flat modules.

Exporting symbols

Python exports everything implicitly, nothing is restricted for access from the outside.

Referencing an external package/module/namespace

Python has no standard way of managing and documenting dependencies for projects, meaning that in the worst case, you have to manually use the package manager pip and install each package of the project you want to run locally. Finding out the package names from the codebase is a story in itself.

There are two nonstandard ways to manage and document dependencies, namely requirement files or the dependency manager poetry.

Importing symbols

Imports are more fine-grained in Python than in C# or F#. Python either brings in a whole module under a qualified name, or individual elements from a module. Take this module definition for example (mod.py):

def hello():
    print('Hello, World!')

def bye():
    print('Bye, World!')

Then from another file in the same folder - like main.py - you'd have the following options:

* Import the whole module:

import mod

mod.hello()
mod.bye()

* Import the whole module aliased:

import mod as greeter

greeter.hello()
greeter.bye()

Note that mod is not in scope here; mod.hello() would be illegal.

* Import only the element(s) that are relevant to us:

from mod import hello, bye

hello()
bye()

* Import only the element(s) that are relevant to us, aliased:

from mod import hello as hi, bye

hi()
bye()

If you want to import everything from a module, there is a shorthand:

from mod import *

Imports are also transitive, which means that whatever is imported into a module will be accessible through that module too. For example, let's say we write the following module (helpers.py):

import mod as greeter

Then, in main.py, this is valid:

import helpers

helpers.greeter.hello()

You can think of it as exports being implicit: the imported element becomes part of the module, and is exported implicitly as well.

Interesting features

I'd say that the way Python does imports and their namespace handling as a whole is interesting, but likely not relevant for a static language.

JavaScript and TypeScript

Since the two are likely almost identical, we'll only explicitly talk about JavaScript here. TypeScript is likely equivalent in most features. If I've missed some important difference, please let me know!

Concept of package/module/namespace

JavaScript has no namespaces, but it has packages as the means of code-reuse and distribution, and modules for splitting code across multiple files.

Packages

JavaScript packages are the source files zipped up with a package.json where all the metadata lies (author, name, description, dependencies, ...). There are two big package managers, npm and Yarn. Both work from the same public repository and with the exact same package format; Yarn can be thought of as a reimplementation of npm.

Modules

Files are implicitly modules, just like in Python. The file canvas.js creates the module canvas.js. Folders can also become modules themselves (details can be found here, but will be explained in short later too).

Package/module/namespace open-ness

Packages and modules are not open on the language-level. Technically there are packages that help you patch other packages, but these are external software acting directly on the source code, which is likely not applicable to a compiled language - at least without considering binary patches.

Subpackages/submodules/subnamespaces

There is no concept of a subpackage. Since folders can act as modules, they can be thought of as parent modules of the contained modules, which are the submodules of the folder.

For a folder to properly act as a module, you need to create a file inside it called index.js. This is also called a barrel file. The folder can still be part of the hierarchy without it and you can still import the modules contained by the folder, but you won't otherwise be able to directly import the folder itself as a module.

Exporting symbols

Exporting elements in JavaScript modules is completely explicit. When importing elements of a module, only the exported elements can be imported. This is done with the export statement or annotation. It has multiple forms (source):
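A non-exhaustive sketch of the common forms (the module and names below are made up):

```js
export function hello() {}                    // export a declaration directly
export const version = 1;
function bye() {}
export { bye, bye as farewell };              // export list, optionally aliased
export * from './other.js';                   // aggregate export (re-export everything)
export { greet as salute } from './other.js'; // re-export selected names, aliased
```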

The last group with aggregate exports is also known as re-exporting. These are essentially what barrel files contain in the index.js files to merge small submodules into a bigger module that you can import as one. For example, if we are developing a math submodule and we have the following structure:

math
  trig.js
  linalg.js
  fourier.js
other_module1
other_module2
...

This might make sense while developing, but when consuming, the user might want to import math as a whole, without differentiating between the submodules. Without an index.js re-exporting the entire module interface, the user would have to import each submodule file separately:

import * as trig from 'math/trig.js';
import * as linalg from 'math/linalg.js';
import * as fourier from 'math/fourier.js';

Even worse, you can't bind them all to the same name; this causes an error:

import * as math from 'math/trig.js';
import * as math from 'math/linalg.js';
import * as math from 'math/fourier.js';

But if we add this index.js file to the math folder:

export * from 'trig.js';
export * from 'linalg.js';
export * from 'fourier.js';

Then importing it becomes much easier (and importing individual elements is still possible):

import * as math from 'math';

Default exports

JavaScript has a special element, called default, that modules can export. Each module can export exactly one default element. The ways the user can export it are pretty similar to regular exports (shown in one snippet here for brevity, but still, only one default export per module):

export default class Greeter { ... }
export default class { ... } // Name is not required
export default function hello() { ... }
export default function() { ... } // Name is not required
export { Greeter as default };

Default exports simplify importing, when the module only wants to expose a single element. The importing part will be shown in the relevant section.

Referencing an external package/module/namespace

Dependency management is fairly easy, all project dependencies are inside package.json.

One interesting thing is that JavaScript differentiates regular dependencies and development dependencies (the devDependencies field). The latter are not part of a published package, but are used for dependencies needed during development, like transpilers or unit testing frameworks. This makes package.json something between a .NET project and a solution, as it picks up more responsibilities than a single .NET project, which has a fixed set of dependencies.
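A minimal package.json sketch with both kinds of dependencies (the package names and versions are just examples):

```json
{
  "name": "my-package",
  "version": "1.0.0",
  "dependencies": {
    "lodash": "^4.17.0"
  },
  "devDependencies": {
    "typescript": "^5.0.0"
  }
}
```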

More information can be found here.

Importing symbols

Importing individual elements is done through the import statement. For the examples, we will use this greetings.js module:

export function hello() {}
export class Greeter {}
export var x = 0;

The available forms (source):
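For illustration, a few of the standard forms, using the greetings.js module above (not an exhaustive list):

```js
import { hello, Greeter } from './greetings.js'; // named imports
import { x as initial } from './greetings.js';   // aliased named import
import * as greetings from './greetings.js';     // namespace import: greetings.hello()
import './greetings.js';                          // import only for side effects
```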

Default imports

Default imports can be either done with regular imports, using the default name, or using the specialized syntax (note the lack of curly braces):

import someDefault from 'module';

This imports the default element from module and names it someDefault.

Rust

Concept of package/module/namespace

Rust has packages, crates and modules, no direct concept of namespaces.

Crates

A crate is a single compilation unit in Rust that compiles into a binary. Unlike in C, crates can consist of multiple files and have dependencies on other crates or packages. In this sense, Rust crates are like .NET projects: they consist of multiple files, have dependencies and produce a binary as their output. Crates describe themselves (name, author, version, dependencies, compilation config, ...) in Cargo.toml, making that essentially a project file.
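A minimal Cargo.toml sketch (the values are made up):

```toml
[package]
name = "my_crate"
version = "0.1.0"
edition = "2021"
```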

Packages

Packages group up one or more crates, providing some set of functionality. They contain a Cargo.toml that describes how to build the package (it can be as simple as listing the contained crates). This makes them similar to .NET solutions.

The unit of distribution is crates, not packages, similarly to how each .NET project usually compiles into one assembly that we then publish as its own NuGet package.

Modules

Modules are similar to Python, in the sense that the file foo.rs implicitly makes that file's contents part of a module called foo. Modules can also form a file hierarchy; putting modules into subfolders is similar to JavaScript folder-modules.

Package/module/namespace open-ness

Packages, crates and modules are not open by nature.

Subpackages/submodules/subnamespaces

There is no concept of subpackages or sub-crates, but there are submodules. There are two ways a module can declare a submodule:

* Inline, with the mod keyword. For example, writing the following in math.rs:

mod trig {
    // ...
}

mod linalg {
    // ...
}

will create two submodules of `math`, accessed as `math::trig` and `math::linalg`. (Note that these modules are not public by default; to be able to access them externally, you need to make them public. That will be discussed later, when talking about the exporting mechanism.)
* Have the submodules defined in other files under a folder with the same name as the module itself. For example, the following file structure makes trig and linalg submodules of math:

math.rs
math/
    trig.rs
    linalg.rs

Exporting symbols

Rust uses visibility attributes to allow or disallow elements for external access outside the module.

By default, elements are private, which means that only elements inside the module and its submodules can access them. This philosophy likely reflects the fact that a submodule elaborates on details, so it has all rights to the parent module's private elements.

The other visibility attribute is public, which makes it accessible from outside the module. For example (in math.rs):

pub fn square(n: i32) -> i32 { n * n }
// Now in main.rs, math::square(5) is valid, as it's accessible

pub can also be customized in a few ways, so it doesn't just mean a single kind of accessibility level (source):
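For illustration, a few of the restricted variants (a sketch, not the full list):

```rust
mod math {
    pub(crate) fn crate_visible() {}   // visible anywhere within the current crate
    pub(super) fn parent_visible() {}  // visible to the parent module only
    pub(self) fn module_private() {}   // equivalent to having no pub at all
}
```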

You can re-export features of a module by making the import declaration (discussed later, called use) public. For example, re-exporting trig::sin from math.rs, so it can be called as math::sin:

pub use trig::sin;

Referencing an external package/module/namespace

Referencing a crate is done through specifying it in the Cargo.toml of the crate inside the dependencies section. Example:

[dependencies]
time = "0.1.12"

The format is name = version. More information can be found here.

Importing symbols

To have visibility of the module, it first has to be included, which is done with the mod declaration:

mod math;

After that, the user can use the module-qualified name of the elements, for example:

fn main() {
    math::sin(3);
}

Alternatively, elements can be brought into the local scope with the use declaration, which is similar to named imports in JavaScript or the from module import ... form in Python. It has the following forms:
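For illustration, a few common shapes of the use declaration, assuming the math module from above also exposes cos, tan and the trig submodule:

```rust
use math::sin;            // a single item: sin(3.0) now works unqualified
use math::cos as cosine;  // aliased import
use math::{tan, trig};    // several items at once (a function and a submodule)
use math::trig::*;        // glob import: everything public from math::trig
```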

Go

Concept of package/module/namespace

Go has the concept of packages and modules, but no direct concept of namespaces. Importantly, the roles of packages and modules are swapped in Go compared to the previous languages.

Packages

Go packages are one or more source files under the same folder, tagged with the same package name. Packages get compiled as one unit. One program consists of one or more packages, making them more similar to modules in other languages.

Executable programs must have a main package, that has to contain the main function.

Modules

Modules are collections of packages, with a metadata file, called go.mod in the root. The metadata file describes version information, dependencies, ... This makes them the unit of code distribution (source).

Package/module/namespace open-ness

Go modules are not open by nature. Packages are open within the same folder: two different source files can declare themselves to be in the same package, as long as they are in the same folder.

Subpackages/submodules/subnamespaces

There is no concept of submodules in Go. Packages can form a hierarchy, since you can put packages in folders and you can put other packages and folders within those folders.

Exporting symbols

Elements within the same package can access each other without explicit notation. Packages export their elements by starting their names with an uppercase letter. Any other starting character makes that element private to the package. Example:

package user

// These are private, only accessible within the user package
type passport struct { /* ... */ }
func retrievePassport() passport { /* ... */ }

// These are public, accessible for anyone importing the user package
type Person struct { /* ... */ }
func GetAge() int { /* ... */ }

Referencing an external package/module/namespace

Module dependencies can be added through the go.mod file.
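A minimal go.mod sketch (the module path and versions are made up):

```
module example.com/myapp

go 1.21

require github.com/google/uuid v1.3.0
```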

Importing symbols

Go allows importing packages; no finer-grained import is available. For demonstration, we'll use the following package definition:

package math

func Square(x int) int {
    return x * x;
}

Import has 4 forms: a plain import, an aliased import, a dot import (which brings the package's exported names into the local scope unqualified), and a blank import (which imports the package only for its side effects, such as running its initializers).

Imports can be grouped up in parentheses, demonstrating all four forms:

import (
    "fmt"
    m "math"
    . "utils"
    _ "unused"
)

If there is a folder named foo, then import foo will import the package foo from the folder with the same name. This might remind one of field punning.

Interesting features

Go allows writing package initializers called init, which will run before the main function does. Packages can even have multiple initializers, which will run in lexical order. Example:

package main

import "fmt"

func init() {
    fmt.Println("Init, World!")
}

func main() {
    fmt.Println("Hello, World!")
}

This will print:

Init, World!
Hello, World!

Carbon

While Carbon is a relatively new language, we can rely on the official design documentation to grasp the basics. It's likely I've made a mistake somewhere here. If you catch one, please do correct me!

Concept of package/module/namespace

Since Carbon aims to be the successor of C++, namespaces are a given. Other than that, quoting from the design docs:

This means that packages contain libraries.

Each library must have exactly one api file, but can have multiple impl files. This file includes declarations for all public names of the library. Definitions for those declarations must be in some file in the library, either the api file or an impl file.

Every package has its own namespace. This means libraries within a package need to coordinate to avoid name conflicts, but not across packages.

Namespaces are a complete side-story in Carbon. Their only purpose is prefixing elements in a package, not interacting with packages or libraries in any meaningful way. For this reason I won't include any sample code. For that, please refer to this section of the design documentation.

Package declaration

The syntax of package declaration is as follows:

package PackageName library "LibraryName" api/impl;

The package name is optional; omitting it means that the file contributes to the default package. The library specification can be omitted as well, which means that the file contributes to the default library. The keywords api and impl determine whether the file contains API or implementation. Examples:
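For instance, instantiating the template above (the names are made up):

```
package Geometry library "Shapes" api;
package Geometry library "Shapes" impl;
package Geometry api;   // the default library of the Geometry package
```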

Package/module/namespace open-ness

Packages and libraries are likely only open within their own codebase, making them impossible to extend once compiled to a binary. Namespaces - coming from C++ - are naturally open, even across projects/binaries.

Subpackages/submodules/subnamespaces

Namespaces are - by nature - hierarchical, as in C++. There is no concept of subpackages or sublibraries; packages contain a flat set of libraries.

Exporting symbols

How does a package/module/namespace export a symbol (a function, a type, ...) to be used outside of itself? Does it only export it to the parent or publishes it project-wide? Does it export it under the same name, or can you alias it?

Referencing an external package/module/namespace

TODO

Importing symbols

Every impl file implicitly imports the api of the library. Other library APIs have to be imported with the import declaration. It has the following forms:

OCaml, ML in general

Since OCaml (and ML in general) is very different from the other languages in terms of module systems, I won't consider it with the "regular" languages and I won't follow the template defined above. It is important to note that while F# is an ML derivative, its module system is very different from the rest of the ML family, probably because of the constraints .NET introduced.

Basics

The following table (mostly quoted from the mentioned course) summarizes how language elements of a classic, procedural OO language like Java and an ML-derivative like OCaml map to certain concepts. While they are not equivalent in all cases, it's a good initial mindset for easing into the concepts:

| | Java | OCaml |
| --- | --- | --- |
| Namespaces | Packages and classes | Structures |
| Interfaces | Interfaces | Signatures |
| Encapsulation | Visibility | Abstract types |
| Code reuse | Polymorphism, inheritance | Functors, includes |

The primary way to wrap up code that belongs together in OCaml is by defining a struct, which is a module value. For example, we can define a mathematical module like so:

module Math = struct
    let add1 x = x + 1
    let square x = x * x
end

The meaning of this is that Math is a module value, which is assigned the structure (this is simply what the construction of a module is called) specified between struct ... end. Note that there is no construction at runtime; all module operations are purely compile-time, despite the name. The following C# code would be roughly equivalent:

static class Math
{
    public static int add1(int x) => x + 1;
    public static int square(int x) => x * x;
}

Invoking the code from OCaml means prefixing the functions with the module name:

let four = Math.square 2

OCaml modules can contain function definitions, value definitions, type definitions and other module definitions. This structure looks like an assignment in the form of name = value, so it might not be surprising that ML modules have a type too. The type of the module above can be written in the following way:

sig
    val add1: int -> int
    val square: int -> int
end

Note that these module values cannot be mixed with regular OCaml values - like the ones you introduce with a let binding. The same applies to module types and regular types: they are both purely compile-time entities, incompatible with regular values.

You can specify the module type after the name, separated with the usual colon:

module Math : sig
    val add1: int -> int
    val square: int -> int
end = struct
    let add1 x = x + 1
    let square x = x * x
end

To make this a bit nicer to read, we can factor out the module type and bind it to a name:

module type MATH = sig
    val add1: int -> int
    val square: int -> int
end

module Math : MATH = struct
    let add1 x = x + 1
    let square x = x * x
end

When you don't specify the type of a module, it will automatically be inferred for the binding. When you explicitly specify it, OCaml checks whether the expected signatures are present in the module definition. You can have more module elements than the ones expected by the signature, but the ones defined in the signature are required, causing a compile error otherwise. For example:

(* This is OK *)
module Math : MATH = struct
    let add1 x = x + 1
    let square x = x * x
    let add2 x = x + 2
end

(* ERROR, missing val add1: int -> int *)
module Math : MATH = struct
    let square x = x * x
    let add2 x = x + 2
end

This explicit type specification will have significance later.

Visibility, encapsulation

Explicit signature types can help hide certain members of the module. For example (using the previously defined MATH module type):

module Math : MATH = struct
    let add1 x = x + 1
    let square x = x * x
    let add2 x = x + 2
end

The member add2 is not accessible through Math, since it's not listed in MATH. This is very similar to looking at an instance through an interface type, where only the interface members are accessible statically:

interface MATH
{
    public int Add1(int x);
    public int Square(int x);
}

class MathImpl : MATH
{
    public int Add1(int x) => x + 1;
    public int Square(int x) => x * x;
    public int Add2(int x) => x + 2;
}

public static MATH Math = new MathImpl();
// Math can only statically access Add1 and Square, Add2 is hidden

Note that there's no runtime polymorphism or dynamic dispatch in the case of OCaml; this is all compile-time.

Sometimes we want to check that a module conforms to a signature, but we don't want to hide the details. In that case, the check can be done with a separate module alias:

module Math = struct ... end

(* This name alias is only here for checking conformance to the MATH signature *)
module MathCheck : MATH = Math

Associated types and modules

As mentioned before, it's perfectly legal to define types and other modules inside a module:

module Math = struct
    type number = int

    module Numeric = struct
        let square x = x * x
    end
end

And these can also be expressed in the module type too:

sig
    type number = int

    module Numeric : sig
        val square: int -> int
    end
end

This is especially useful when a module needs to abstract over some type or types, but the exact type should be hidden. Let's say we wanted to build a set. Let's define our module like so, using a list for simplicity:

module Set = struct
    (* Empty set *)
    let empty = []
    (* Is member *)
    let mem x xs = List.mem x xs
    (* Adding an element *)
    let add xs x =
        if mem x xs then xs
        else x :: xs
end

Since we are working with immutable data structures and free functions, the underlying value - list in this case - is taken as an argument and the new underlying value - the modified list - is returned on a modifying operation. The equivalent C# code could be something like (using arrays instead of linked lists for simplicity):

static class Set
{
    public static T[] Empty<T>() => Array.Empty<T>();
    public static bool Mem<T>(T value, T[] set) => set.Contains(value);
    public static T[] Add<T>(T[] set, T value) => Mem(value, set)
        ? set
        : set.Append(value).ToArray();
}

You might wonder how the OCaml code became generic, since there is no sign of generics anywhere. This is just ML languages generalizing types as much as they can, and these functions we wrote - unsurprisingly - happen to work on any element type. Looking at the implementation, it's pretty simple, but inefficient. It would be nice to write an abstraction for sets, so anyone could provide their own more efficient or tweaked set implementations. Let's try to do that:

module type SET = sig
    val empty: 'a list
    val mem: 'a -> 'a list -> bool
    val add: 'a list -> 'a -> 'a list
end

There is a huge problem with this: we have constrained the implementation type to be a list of the elements. This is not great; we need a way to hide the fact that some sets will use a list as the underlying type, but some won't. We could try to make the underlying type generic:

module type SET = sig
    val empty: 'underlying
    val mem: 'a -> 'underlying -> bool
    val add: 'underlying -> 'a -> 'underlying
end

Let's see what the compiler has to say!

Error: Signature mismatch:
Modules do not match:
    sig
        val empty: 'a list
        val mem: 'a -> 'a list -> bool
        val add: 'a list -> 'a -> 'a list
    end
is not included in
    SET
Values do not match:
    val empty : 'a list
is not included in
    val empty : 'underlying

The compiler doesn't like that we tried to be way more constraining than what we have promised in the signature! This is similar to doing this in C# (which is of course also a compile-error):

interface ISet
{
    public Underlying Empty<Underlying>();
    public bool Mem<T, Underlying>(T value, Underlying set);
    public Underlying Add<Underlying, T>(Underlying set, T value);
}
class Set : ISet
{
    public T[] Empty<T>() => Array.Empty<T>();
    public bool Mem<T>(T value, T[] set) => set.Contains(value);
    public T[] Add<T>(T[] set, T value) => Mem(value, set)
        ? set
        : set.Append(value).ToArray();
}

TODO: Solution, associated types
TODO: with constraint
TODO: Parallel with Rust traits
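As a sketch of where the first TODO is heading, the usual solution is an abstract type in the signature, which hides the underlying representation (reusing the Set example from above):

```ocaml
module type SET = sig
  type 'a t                      (* abstract: consumers can't see the representation *)
  val empty: 'a t
  val mem: 'a -> 'a t -> bool
  val add: 'a t -> 'a -> 'a t
end

module ListSet : SET = struct
  type 'a t = 'a list
  let empty = []
  let mem x xs = List.mem x xs
  let add xs x = if mem x xs then xs else x :: xs
end
```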

Functors, higher-order modules

TODO

Personal thoughts

These are completely personal opinions on various topics. Let me know if you particularly agree or disagree with something. While this is just opinion, I'll try to provide some reasoning alongside.

.NET vs the rest of the languages

The way .NET languages work is very different from the rest of the languages. This is likely because .NET went with namespaces instead of modules. While this makes sense - it means that multiple assemblies can be developed independently and still expose a consistent API under the same root namespace - I believe there are also some significant drawbacks. Namely, there is zero structure or hierarchy enforcement in .NET: any random part of the code can contribute to any other namespace. This makes enforcing a good code structure and hierarchy much harder, as technically any code structure could map to any part of the namespace hierarchy. Since it's already a soft convention to follow the folder hierarchy with namespaces, I wonder if this is something that should be enforced to some degree.

Visibility vs explicit import/export

Visibility is a simple but relatively inflexible mechanism. It usually means that only a fixed, predefined subset of the codebase can access a given element. I believe that some very important and useful visibilities are missing, and some existing ones are useless when not doing strictly object-oriented programming. For example, looking at the visibilities of C# (not really blaming the language design, as it's an OOP language by nature):

I believe that some more fundamental visibilities are missing, like "accessible within this file", or "in this namespace", ... But more importantly, a finite set of access modifiers will always be a relatively rigid system with minor annoyances. Wrapping up a few packages under a common API and hiding the original lower level APIs is harder (maybe even impossible).

An export system is finer-grained in the sense that you can propagate the exported parts up the module hierarchy, leaving behind parts that should be private to that subtree. I'd say exports are not more cumbersome at the definition site; they are not worse than simply specifying visibility. Where exports become more cumbersome is when symbols are propagated up from submodules to the parent modules, exposing the API towards the "higher level". This is what happens in JS/TS barrel files too, and they are essentially describing the module API in a more explicit manner.

On more interesting module features, ML modules

TODO

jl0pd commented 2 years ago

Generic static classes are used sometimes, for example System.Collections.Generic.EqualityComparer<T>. Very interesting use case for them is high performance caching: https://stackoverflow.com/a/42437504/10339675. Thing that I hate from C# compared to F# is that static classes cannot be extended. For example I want to use structural comparer for collection and have to put it in different class StructuralEqualityComparer<T>.Instance, instead of extending default class with EqualityComparer<T>.Structural. Some libraries introduce EnumerableEx, instead of extending default Enumerable

LPeter1997 commented 2 years ago

Generic static classes are used sometimes, for example System.Collections.Generic.EqualityComparer<T>. Very interesting use case for them is high performance caching: https://stackoverflow.com/a/42437504/10339675.

True, I'll add this one, great catch!

Thing that I hate from C# compared to F# is that static classes cannot be extended. For example I want to use structural comparer for collection and have to put it in different class StructuralEqualityComparer<T>.Instance, instead of extending default class with EqualityComparer<T>.Structural. Some libraries introduce EnumerableEx, instead of extending default Enumerable

I'm not sure I 100% follow what you mean here. To my understanding, neither C# nor F# allows extending static classes, unless I've misread F# somewhere. After a quick search, module extensions are talked about, but the docs mention no such thing, at least not in the pages I've looked at. If F# does allow extending a module from another assembly, then my following statement is false (unless I've misunderstood what you meant):

Packages and namespaces are the same deal as in C#, but modules are not open, not even within the same assembly, like what you can emulate with static partial classes in C#. This means that F# modules are strictly single-file structures.

magicmouse commented 2 years ago

you investigated various languages that have lousy module systems, like Python, which is pathetic in this area, but skipped over the Modula-2 language, which had the most thoroughly thought out module system ever devised: one that delivered a 100:1 compilation speed improvement (by avoiding recompiling things that aren't affected, and by using compiled definition files), guaranteed fast module dependency scanning (by requiring syntactically that all module imports be defined in the very first tokens), allowed for opaque pointer types (which let a pointer to some block be created and passed around, but not peeked into), and facilitated separate compilation by splitting the definition from the implementation.

Further it had a special cross check during linking which would not permit an out of date compiled program to be linked with a newer version of some other dependency. It was a very clever trick to prevent what would be a disastrous mismatch during execution if the definition of the module had changed and subcomponents were compiled against different versions of some library/module.

Instead of requiring a whole special build tool like Ant or Gradle, you could build the dependency tree in milliseconds on each compilation, so there are no annoying makefiles, which can get out of date easily and, in languages like C, can produce incorrect builds.

Having used C and then Modula-2 on very large commercial products, I found that M2 delivered executables that were half the size, due to the higher degree of sharing achievable when modules have a rich declaration capability.

Modern machines are awfully fast, so the ability to separately compile modules and join them together later (with different team members controlling their own set of modules) is perhaps not of great interest any more, but it is tragic that giant sweathog projects like the browsers didn't get to benefit from the well thought out M2 system.

magicmouse commented 2 years ago

The major things to consider are:

1) does your module system permit team component programming? or is it expected that all the source be quiescent at a moment for the system to be built in one motion?

2) what range of symbols can be encoded into a module?

3) what kinds of controls are placed on symbols that are exported. Besides the usual functions, do you have constants, types, records, etc. ? What range of things can be stored in a module file?

4) Can you make something read-only, vs. read-write?

5) are the modules split into 2 parts, or is the compiled definition derived from the implementation but still stored in a separate file (Oberon did this)? Does the definition get compiled into a binary file for fast access?

6) how is the dependency graph generated? Do you need a special toolchain for this, like Cmake utility, or makefiles/Ant/Gradle, etc. to manage the dependency graph. Is it automatic or manual?

7) Are mistakes possible in a build? Or does the system have crosschecks in some way to prevent an invalid build from being generated?

magicmouse commented 2 years ago

Another thing often ignored in module design, is in what order are the modules initialized? Assuming you have code either at the global level which is executed at the start, or in the case of Modula2 which had an optional section for initialization or finalization in each module; in what order do those get executed?

It would seem to the casual observer, that recursive descent, with bottom-most module being called first, would solve this issue, but what happens when module A calls B, B calls C, and C calls A? Sooner or later, as a program grows to a large size, you will reach a point where cycles naturally form. Even the most perfect module splitting does have cross-links, and they grow geometrically with linear code expansion.

A way of specifying the initialization would be handy for an industrial strength solution.

svick commented 2 years ago

Some small additions, with varying degrees of relevance:

  1. NuGet also has something similar to devDependencies: a package can be marked as a developmentDependency and it will then be referenced something like this:

    <PackageReference Include="Foo" Version="1.0.0">
       <PrivateAssets>all</PrivateAssets>
       <IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>
    </PackageReference>

    I believe this is most commonly used for Roslyn analyzers and source generators. The *Assets system is quite versatile, but I don't know if it's actually useful. And it's unfortunate that it's so verbose for the most common case (even if the boilerplate is autogenerated by dotnet add package or by VS).

  2. F# has the [<AutoOpen>] attribute, which means that a module or namespace is automatically opened when its container is opened/referenced.

  3. While F# primarily uses modules, you can also declare an [<AbstractClass; Sealed>] type, which is the equivalent of C# static. Such types can then be generic.

  4. .Net also has something it calls "modules", which effectively allow separating assemblies into multiple files. Though modules were very rarely used on .Net Framework, and don't work on .Net Core.

LPeter1997 commented 2 years ago

I'll also add this to the OP soon, thanks!

jl0pd commented 2 years ago

Sorry for delay, finally have ability to type with keyboard, not phone.

Thing that I hate from C# compared to F# is that static classes cannot be extended. For example I want to use structural comparer for collection and have to put it in different class StructuralEqualityComparer.Instance, instead of extending default class with EqualityComparer.Structural. Some libraries introduce EnumerableEx, instead of extending default Enumerable

I'm not sure I 100% follow what you mean here

I'm talking about the case when you want to have extra members in some static class. Let's continue the example with EqualityComparer<T>. This class provides an IEqualityComparer implementation for any type with its property Default. Default equality is done using IEquatable<T>.Equals(T) when the type implements it, or Object.Equals(Object) when it doesn't. This means that a type must implement the interface in order to have correct equality. But many types don't implement IEquatable and are therefore compared by reference, which is often not what you want (for example arrays or lists). One way to solve this problem is to provide a structural comparer with a different property EqualityComparer<T>.Structural, but the CLR doesn't allow extending a class.

So if we don't have ability to change class directly, why not just declare class with same name?

namespace System.Collections.Generic
{
    public static class EqualityComparer<T>
    {
        public static IEqualityComparer<T> Structural => throw null;
    }
}
using System.Collections.Generic;
var comparer = EqualityComparer<int>.Default; // raises csharp(CS0436), The type 'EqualityComparer<T>' in 'c:\Repogit\Test\CS\so\a.cs' conflicts with the imported type 'EqualityComparer<T>' in 'System.Collections, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. Using the type defined in 'c:\Repogit\Test\CS\so\a.cs'
// csharp(CS0117), 'EqualityComparer<int>' does not contain a definition for 'Default' 

C# complains that a type with the same name already exists and uses the type defined in the current assembly. The member Default is inaccessible.

Alright, maybe change namespace? Let's put our comparer into MyNamespace:

using System.Collections.Generic;
using MyNamespace;
var comparer = EqualityComparer<int>.Default; // csharp(CS0104), 'EqualityComparer<>' is an ambiguous reference between 'MyNamespace.EqualityComparer<T>' and 'System.Collections.Generic.EqualityComparer<T>'

It doesn't work again, now compiler can't decide which one to use.

F# elegantly solves this issue by merging types declared in different assemblies and allowing access to all members:

module System.Collections.Generic.EqualityComparer
let Structural<'T> = raise null

And let's use it. It now compiles and works as expected:

open System.Collections.Generic
EqualityComparer.Default |> ignore
EqualityComparer.Structural |> ignore

Or using extensions:

module [<AutoOpen>] System.Collections.Generic.EqualityComparerExtensions
type EqualityComparer<'T> with
    static member Structural = raise null

Which also compiles. These two approaches are compiled in different ways and have their own limitations, but they still allow doing what you desired: putting a foreign member into a class and accessing it.