Closed LPeter1997 closed 1 year ago
Generic static classes are used sometimes, for example System.Collections.Generic.EqualityComparer<T>. A very interesting use case for them is high-performance caching: https://stackoverflow.com/a/42437504/10339675.
The thing that I hate about C# compared to F# is that static classes cannot be extended. For example, I want to use a structural comparer for collections and have to put it in a different class, StructuralEqualityComparer<T>.Instance, instead of extending the default class with EqualityComparer<T>.Structural. Some libraries introduce EnumerableEx instead of extending the default Enumerable.
True, I'll add this one, great catch!
I'm not sure I 100% follow what you mean here. To my understanding, neither C# nor F# allows extending static classes, unless I've misread F# somewhere. After a quick search, module extensions are talked about, but the docs mention no such thing, at least not in the pages I've looked at. If F# does allow extending a module from another assembly, then my following statement is false (unless I've misunderstood what you meant):
Packages and namespaces are the same deal as in C#, but modules are not open, not even within the same assembly, like what you can emulate with static partial classes in C#. This means that F# modules are strictly single-file structures.
You investigated various languages that have lousy module systems, like Python, which is pathetic in this area, but skipped over Modula-2, which had the most thoroughly thought-out module system ever devised: one that delivered a 100:1 compilation speed improvement (by avoiding recompilation of things that aren't affected, and by using compiled definition files), guaranteed fast module dependency scanning (by requiring syntactically that all module imports be declared in the very first tokens), allowed for opaque pointer types (which let a pointer to some block be created and passed around, but not peeked into), and facilitated separate compilation by splitting the definition from the implementation.
Further it had a special cross check during linking which would not permit an out of date compiled program to be linked with a newer version of some other dependency. It was a very clever trick to prevent what would be a disastrous mismatch during execution if the definition of the module had changed and subcomponents were compiled against different versions of some library/module.
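To make the link-time cross-check concrete, here is a minimal Python sketch. It assumes that each compiled module records a key derived from the definition text of every module it imports, and that linking refuses to proceed when a recorded key no longer matches. The names interface_key, CompiledModule, compile_module and link are hypothetical, and hashing the definition text is an assumption made for illustration, not a description of how any particular Modula-2 compiler implemented the check.

```python
import hashlib
from dataclasses import dataclass, field


def interface_key(definition_text):
    """Derive a key from a module's definition text (assumed scheme)."""
    return hashlib.sha256(definition_text.encode("utf-8")).hexdigest()


@dataclass
class CompiledModule:
    name: str
    key: str  # key of this module's own definition
    # Keys of imported definitions, as seen when this module was compiled.
    imported_keys: dict = field(default_factory=dict)


def compile_module(name, definition_text, dependencies=()):
    """'Compile' a module, remembering the key of every definition it imports."""
    seen = {dep.name: dep.key for dep in dependencies}
    return CompiledModule(name, interface_key(definition_text), seen)


def link(modules):
    """Refuse to link when a module was compiled against a stale definition."""
    by_name = {module.name: module for module in modules}
    for module in modules:
        for dep_name, seen_key in module.imported_keys.items():
            if by_name[dep_name].key != seen_key:
                raise RuntimeError(
                    f"{module.name} was compiled against a different "
                    f"definition of {dep_name}"
                )
    return [module.name for module in modules]


lists_v1 = compile_module("Lists", "PROCEDURE Append(x: INTEGER);")
app = compile_module("App", "", [lists_v1])
print(link([lists_v1, app]))  # ['Lists', 'App']

# The definition of Lists changes, but App is not recompiled.
lists_v2 = compile_module("Lists", "PROCEDURE Append(x, y: INTEGER);")
try:
    link([lists_v2, app])
except RuntimeError as error:
    print(error)
```

The point of the design is that the mismatch is caught when the pieces are joined, not when the program misbehaves at run time.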
Instead of requiring a whole special build tool like Ant or Gradle, you could build the dependency tree in milliseconds on each compilation, so there were no annoying makefiles, which can easily get out of date and, in languages like C, can produce incorrect builds.
Having used C and then Modula-2 on very large commercial products, I found that M2 delivered executables that were half the size, due to the higher degree of sharing achievable when modules have a rich declaration capability.
Modern machines are awfully fast, so the ability to separately compile modules and join them together later (with different team members controlling their own set of modules) is perhaps not of great interest any more, but it is tragic that giant sweathog projects like the browsers didn't get to benefit from the well thought out M2 system.
The major things to consider are:
1) Does your module system permit team component programming? Or is it expected that all the source be quiescent at a moment for the system to be built in one motion?
2) What range of symbols can be encoded into a module?
3) What kinds of controls are placed on symbols that are exported? Besides the usual functions, do you have constants, types, records, etc.? What range of things can be stored in a module file?
4) Can you make something read-only vs. read-write?
5) Are the modules split into 2 parts, or is the compiled definition derived from the implementation but still stored in a separate file (Oberon did this)? Does the definition get compiled into a binary file for fast access?
6) How is the dependency graph generated? Do you need a special toolchain for this, like CMake, makefiles, Ant, Gradle, etc. to manage the dependency graph? Is it automatic or manual?
7) Are mistakes possible in a build? Or does the system have cross-checks in some way to prevent an invalid build from being generated?
Another thing often ignored in module design is the order in which the modules are initialized. Assuming you have code at the global level which is executed at the start, or, in the case of Modula-2, an optional section for initialization or finalization in each module, in what order do those get executed?
It would seem to the casual observer, that recursive descent, with bottom-most module being called first, would solve this issue, but what happens when module A calls B, B calls C, and C calls A? Sooner or later, as a program grows to a large size, you will reach a point where cycles naturally form. Even the most perfect module splitting does have cross-links, and they grow geometrically with linear code expansion.
A way of specifying the initialization would be handy for an industrial strength solution.
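The initialization-order problem described above can be sketched in Python with the standard graphlib module (Python 3.9+). The import graphs and the function names initialization_order and find_cycle are made up for illustration. The sketch only computes a bottom-up order and reports a cycle when one exists; how a module system should break such a cycle is exactly the open question raised above.

```python
from graphlib import CycleError, TopologicalSorter


def initialization_order(imports):
    """Order modules so that each one is initialized after everything it imports."""
    return list(TopologicalSorter(imports).static_order())


def find_cycle(imports):
    """Return the modules forming an import cycle, or None if there is none."""
    try:
        TopologicalSorter(imports).prepare()
    except CycleError as error:
        # graphlib stores the detected cycle as the second element of args.
        return error.args[1]
    return None


# Each key maps a module to the set of modules it imports.
acyclic = {"A": {"B"}, "B": {"C"}, "C": set()}
print(initialization_order(acyclic))  # ['C', 'B', 'A']

# A imports B, B imports C, and C imports A: no bottom-most module exists.
cyclic = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
print(find_cycle(acyclic))  # None
print(find_cycle(cyclic))
```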
Some small additions, with varying degrees of relevance:
NuGet also has something similar to devDependencies: a package can be marked as a developmentDependency and it will then be referenced something like this:
<PackageReference Include="Foo" Version="1.0.0">
  <PrivateAssets>all</PrivateAssets>
  <IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>
</PackageReference>
I believe this is most commonly used for Roslyn analyzers and source generators. The *Assets system is quite versatile, but I don't know if it's actually useful. And it's unfortunate that it's so verbose for the most common case (even if the boilerplate is autogenerated by dotnet add package or by VS).
F# has the [<AutoOpen>] attribute, which means that a module or namespace is automatically opened when its container is opened/referenced.
While F# primarily uses modules, you can also declare an [<AbstractClass; Sealed>] type, which is the equivalent of C# static. Such types can then be generic.
.NET also has something it calls "modules", which effectively allow separating assemblies into multiple files. Though modules were very rarely used on .NET Framework, and don't work on .NET Core.
I'll also add this to the OP soon, thanks!
Sorry for the delay, I finally have the ability to type with a keyboard, not a phone.
Regarding "I'm not sure I 100% follow what you mean here":
I'm talking about the case when you want to have extra members in some static class. Let's continue the example with EqualityComparer<T>. This class provides an IEqualityComparer<T> implementation for any type through its Default property. Default equality uses IEquatable<T>.Equals(T) when the type implements it, or Object.Equals(Object) when it doesn't. This means that a type must implement the interface in order to have correct equality. But many types don't implement IEquatable<T> and are therefore compared by reference, which is not what many people want (for example for arrays or lists). One way to solve this problem is to provide a structural comparer through a different property, EqualityComparer<T>.Structural, but the CLR doesn't allow extending a class this way.
So if we don't have the ability to change the class directly, why not just declare a class with the same name?
namespace System.Collections.Generic
{
    public static class EqualityComparer<T>
    {
        public static IEqualityComparer<T> Structural => throw null;
    }
}

using System.Collections.Generic;
// The line below raises csharp(CS0436): The type 'EqualityComparer<T>' in 'c:\Repogit\Test\CS\so\a.cs'
// conflicts with the imported type 'EqualityComparer<T>' in
// 'System.Collections, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'.
// Using the type defined in 'c:\Repogit\Test\CS\so\a.cs'
// It also raises csharp(CS0117): 'EqualityComparer<int>' does not contain a definition for 'Default'
var comparer = EqualityComparer<int>.Default;
C# complains that a type with the same name already exists and uses the type defined in the current assembly. The member Default is inaccessible.
Alright, maybe change the namespace? Let's put our comparer into MyNamespace:
using System.Collections.Generic;
using MyNamespace;
// The line below raises csharp(CS0104): 'EqualityComparer<>' is an ambiguous reference between
// 'MyNamespace.EqualityComparer<T>' and 'System.Collections.Generic.EqualityComparer<T>'
var comparer = EqualityComparer<int>.Default;
It doesn't work again; now the compiler can't decide which one to use.
F# elegantly solves this issue by merging types declared in different assemblies and allowing access to all members:
module System.Collections.Generic.EqualityComparer
let Structural<'T> = raise null
And let's use it. It now compiles and works as expected:
open System.Collections.Generic
EqualityComparer.Default |> ignore
EqualityComparer.Structural |> ignore
Or using extensions:
module [<AutoOpen>] System.Collections.Generic.EqualityComparerExtensions
type EqualityComparer<'T> with
    static member Structural = raise null
Which also compiles. These two approaches are compiled differently and have their own limitations, but they still allow what you wanted: putting a foreign member into a class and accessing it.
Since the modules issue is one of the most important ones as of right now, I've tried to summarize what other major languages have done in this area, as well as trying to read up on some literature about it.
Goal of this document
The goal of this document is to investigate the module systems of different languages. This is a topic that is not talked about very much, but it's one of the most fundamental features of a language to aid in code reuse, decomposition, etc. I'd like to go through the small number of articles that I was able to dig up about this, then visit some languages and look at how they designed their module systems.
Existing discussions/literature
Before discussing each language, I'd like to go through the existing discussions and literature and how they contribute to the concept of module systems.
A blog post by Jonathan Goodwin
Link to the relevant blog post.
It mainly talks about the two main flavors of modules:
ML-inspired modules: extensions that ML made, like parametric modules; the post even mentions the possibility of first-class modules.
It also mentions how some languages make a one-to-one correspondence between files, modules and compilation units. The languages where a module is defined within a single file usually allow some mechanism to merge multiple modules together, achieving the same thing as allowing a module to be defined in multiple source files.
The post then talks about the similarities between the idea of a module and types:
"What next?", a post on next steps in language design
Link to the post.
It's a single significant paragraph that essentially agrees with the "Modules Matter Most" presentation (we'll discuss that one later) and gives a few "components" without any explanation:
Sadly, the author gave no real information or examples on these outside the single reference to Modules Matter Most, which we will discuss later.
It also claims that most languages mean simply "a way of managing namespaces or compilation order" when they talk about module systems.
Relevant discussion on Reddit
Link to the discussion.
This is a summary by Jonathan Goodwin of the previous "What's Next?" post, explaining the key "components" that the post originally left unexplained. The explanations rely on heavy theory, so I'd advise checking out the post and the discussion only if you are interested.
In summary, I believe that many features try to lift modules to be on the same level of features as types: they can implement interfaces, they can do polymorphism, they can be instantiated with types or values, you could have functions returning modules, you could have modules passed in as values, ...
I believe this "roadmap" is not surprising, given that even C# uses things like static classes, which is already a type we usually think of as a module. Making them an instance would essentially lift modules to value level, giving us first-class modules (that would likely have the only limitation compared to regular classes + instances that it would be completely immutable).
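As a rough illustration of "modules passed in as values", here is a small Python sketch. Python modules happen to be ordinary objects, so a function can take one as an argument and rely only on the names it exposes. The function name hypotenuse is hypothetical, and this is only an analogy: unlike an ML signature, nothing here is checked statically.

```python
import cmath
import math


def hypotenuse(numeric, a, b):
    """Compute sqrt(a^2 + b^2) using whichever module is passed in.

    The only "interface" required of `numeric` is a `sqrt` function.
    """
    return numeric.sqrt(a * a + b * b)


# The same function, parameterized by two different modules.
print(hypotenuse(math, 3, 4))   # 5.0
print(hypotenuse(cmath, 3, 4))  # (5+0j)
```

Here math and cmath play the role of two implementations of the same informal interface, selected at the call site like any other value.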
Modules Matter Most by Robert Harper
Link to the slides.
In the beginning they claim that "the ML module system will be the standard other systems will be compared to, and is something that not easily can be improved on (and is often actually made worse)" [slide 4]. Then they go on to detail its key ideas.
The first part of the slides (up until around slide 14) is not too bad; after that it gets a bit theory-heavy. Feel free to read it, but again, my feeling is that modules are being brought closer to types and values.
Practical Foundations for Programming Languages by Robert Harper
Abbreviated version by the author on this university subject site.
It's heavier on the theory side, mentioned for completeness' sake. Since the author is the same as before, unsurprisingly they push the same ideas as the ML module system. They just happen to use a lot more "functional jargon" that makes the concepts sound scary and hard for the average developer to understand.
An SO discussion about first-class modules
The SO question.
The top answer very strongly hints at what I've suggested previously: ML simply attempts to bring modules and regular values closer together, with some restrictions of course. Namely, modules are usually immutable.
How Standard ML modules map to Scala objects (Draft)
The blog post.
This post was written by someone likely far more seasoned in ML modules than me, and it essentially confirmed my gut feeling about ML modules being just "classes and instances behind the scenes".
OCaml Programming: Correct + Efficient + Beautiful, chapter 5
Chapter of the online book/course.
It's great introductory material for an ML-like module system, in this case OCaml's. The course draws the same parallel that I have been making between ML modules and OO constructs. The section about the ML module system will be mostly based on this material.
That's it?
Unfortunately, that's all the literature I could find. Most of it is on the more academic, ML-side of things. If there's anything else, please let me know, so I can extend this document accordingly.
Aspects/features for inspection
There are quite a few aspects/features to consider when looking at how a language defines its module system. Here I'd like to give a breakdown of how we'll look at each language to be relatively methodical and consistent. These aspects will be quite different from the aspects coming from the literature. This is because we are looking for a more "practical" and lower-level overview. We can still get inspired by the ML module system in some aspects, if we want to.
Terminology
When talking about module systems, there are various terms floating around, the main 3 being:
While these can be defined as separate entities meaning different things, some languages blur the lines between them, merge them or remove them completely. For completeness' sake, let's look at a possible differentiation of these concepts:
These "definitions" also give us the three main goals of a module system, which are:
Discussed features
Here I'd like to briefly describe the different aspects we'll use for the different languages.
Template
If you want to contribute by investigating a language, feel free to use this template as a starter.
Investigated languages
C#
Concept of package/module/namespace
C# has two of the three concepts directly: packages and namespaces.
Packages
Packages come in two forms: Assemblies and NuGet Packages. Assemblies are simply the compiled DLL produced by the compiler. You can reference such a DLL and use it no problem. NuGet packages append some metadata to this assembly and zip it up so you can publish it in package repositories online. While NuGet packages can contain anything (binaries, analyzers, raw resources), in terms of code reuse, it's just the compiled assembly zipped up with metadata.
Modules
While C# doesn't have modules directly as a feature, we often treat namespaces or static classes as modules. Example:
Namespaces
By convention, we declare most of our source files within some namespace. Namespaces are hierarchical, and by convention we usually follow the directory structure with the namespace naming. Note that, unlike in Java, this is just a convention, not a requirement. For example, if our project is named Game.Utilities and you have a file inside it at the path Math/Interpolations/Slerp.cs, you'd put it in the namespace Game.Utilities.Math.Interpolations by convention, but nothing stops you from putting it in Foo.Bar.Baz.
C# also has two syntaxes for namespaces. Initially, there was only the braced syntax:
Since most people only declared one namespace per file, the indentation felt like wasted space, so file-scoped namespaces were introduced:
Package/module/namespace open-ness
Packages are by nature not open; they are closed, compiled binaries.
Modules in the form of static classes can be extended within the same assembly, if the static class is marked with the partial modifier.
Namespaces are completely open, any project can declare anything in any namespace as long as they don't cause a name-collision.
Subpackages/submodules/subnamespaces
Packages allow no nesting.
Modules in the form of static classes allow nesting within the same assembly. Example:
Namespaces allow for arbitrary nesting, even across assemblies. Example:
Exporting symbols
C# uses accessibility levels (also known as visibilities) to control access of the symbols to the outside. From the module's point of view, there are 3 important visibilities:
public: Visible from everywhere.
internal: Only visible within the assembly.
private: Only visible within the class, so it's essentially only useful in static classes for modules.
This means that once a module defines something as public, it is exposed to the outside world as-is. Modules can't re-hide parts of the details that were exposed to them in some submodule.
Referencing an external package/module/namespace
As mentioned before, the unit of reuse is an assembly or a NuGet package. Most of the time when using someone else's library, we use it as a NuGet package. This is done by using the package name, which uniquely identifies the published package. We write the package reference inside the .csproj file:
Importing symbols
Once a package reference is added to the project, all namespaces within the package are visible to us. We can use the fully-qualified name to refer to the elements. For example, if we have this declaration in the package that we reference:
Then we can reference the Math class by writing System.Utilities.Math. Note that the package name does not necessarily contribute to this reference, making dependencies completely transparent in this sense.
It would be quite inconvenient to always fully qualify the name, so C# has a using directive that adds all elements from the specified namespace to the local namespace. Demonstrating with the previous example:
C# doesn't allow using a class, so referencing the Pow method is still typed out as Math.Pow. To solve this, C# allows us to import static members of types using the using static feature:
There are cases where a using can cause a conflict, because two used namespaces contain a symbol with the same name. In itself, this does not cause an error, only when trying to reference the symbol. For these cases, there's another construct to alias types, called using alias. Example:
This also means that modules in the form of static classes can also be aliased.
Interesting features
C# allows for type-parameterization of modules through static classes:
Personally I've never seen this being used anywhere, as C# generic constraints are a bit too weak to be able to write anything useful with generics this way. This might change with the introduction of static abstracts.
F#
Concept of package/module/namespace
F# is almost the same as C# with one exception: F# has an explicit module feature. This is because F# - unlike C# - allows for free-functions, so to allow interop with most .NET languages, free-functions have to be wrapped inside a static class. F# modules are nothing more than static classes, but introduced with a different keyword.
While we could consider static classes and modules to be exactly the same, I'd say that the intent coming from naming makes them different enough to say that F# has a "direct modules" feature.
Package/module/namespace open-ness
Packages and namespaces are the same deal as in C#, but modules are not open, not even within the same assembly, like what you can emulate with static partial classes in C#. This means that F# modules are strictly single-file structures.
Subpackages/submodules/subnamespaces
It is equivalent to C#. F# modules being static classes, they allow nesting.
Exporting symbols
F# also uses accessibility levels, public, internal and private being the 3 most important. The only difference is that while C# uses internal by default, for F# the default is public, except for let bindings inside a type, which are always private.
Referencing an external package/module/namespace
The process is virtually identical to C#, except that F# has another package manager, called Paket, which allows using NuGet packages too.
Importing symbols
Again, this is almost identical to C#, except for some minor things. The keyword is open and it has two variants, open and open type. The former allows for importing namespaces and modules, the latter is for static members of types. Note how F# doesn't differentiate using and using static if the target is a module (which is essentially a static class).
Interesting features
Interestingly, while F# modules are essentially static classes, they don't allow generic parameters, making them more limited than C# static classes.
Python
Concept of package/module/namespace
First off, Python has a namespace concept, but it's wildly different from the usual definition of namespaces and it's not something a static language should utilize, so I won't consider them here. Other than that, Python has modules and packages.
Modules
Python files are modules themselves. If you have foo.py, that file is also the definition of the foo module. Modules can contain definitions, as is usual in most languages, but they can also contain executable statements that are executed on first import.
Packages
Packages are a way to group modules into hierarchies. For example, in Python the module name A.B means a module named B under package A. They are also a way to redistribute code, similarly to NuGet packages.
Package/module/namespace open-ness
Packages and modules are not open for external extensibility in any way.
Subpackages/submodules/subnamespaces
Packages can be arbitrarily nested in each other, and within packages there are flat modules.
Exporting symbols
Python exports everything implicitly, nothing is restricted for access from the outside.
Referencing an external package/module/namespace
Python has no standard way of managing and documenting dependencies for projects, meaning that in the worst case, you have to manually use the package manager pip and install each package that the project you want to run locally needs. Finding out the package name from the codebase is a story in itself.
There are two nonstandard ways to manage and document dependencies, namely requirement files or the dependency manager poetry.
Importing symbols
Imports are more fine-grained in Python than in C# or F#. Python either brings in a module under a qualified name, or elements from a module. Take this module definition for example (mod.py):
Then from another file in the same folder, like main.py, you'd have the following options:
import mod
mod.hello()
mod.bye()
from mod import hello, bye
# Note, that mod is not in scope here!
# mod.hello() would be illegal
hello()
bye()
If you want to import everything from a module, there is a shorthand: from mod import *
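A self-contained sketch of these import forms follows. Because the original mod.py is not shown here, the sketch writes its own throwaway mod.py into a temporary directory; the function bodies, the extra _internal function and the names_bound_by helper are assumptions added for illustration. Each import statement is executed in a fresh namespace so we can list exactly which names it binds.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a throwaway mod.py (its contents are assumed, not taken from the post).
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "mod.py").write_text(
    "def hello():\n"
    "    return 'Hello'\n"
    "\n"
    "def bye():\n"
    "    return 'Bye'\n"
    "\n"
    "def _internal():\n"
    "    return 'private by convention'\n"
)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()


def names_bound_by(import_statement):
    """Run one import statement in a fresh namespace and list the names it binds."""
    namespace = {}
    exec(import_statement, namespace)
    # exec adds __builtins__ on its own; it is not something the import bound.
    return sorted(name for name in namespace if name != "__builtins__")


qualified = names_bound_by("import mod")
selected = names_bound_by("from mod import hello, bye")
starred = names_bound_by("from mod import *")

print(qualified)  # ['mod']
print(selected)   # ['bye', 'hello']
print(starred)    # ['bye', 'hello']
```

Note that the star import skips names starting with an underscore (unless the module defines __all__), although mod._internal stays reachable through the qualified name, so this is a convention rather than an access restriction.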
Imports are also transitive, which means that whatever is imported into a module will be accessible from that module too. For example, let's say we write the following module (helpers.py):
Then, in main.py, this is valid:
You can think of it as exports being implicit: the imported element becomes part of the module, and is exported implicitly as well.
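The transitive behavior can be checked with a small self-contained sketch. The contents of mod.py and helpers.py below are assumptions (the original snippets are not shown here), and the shout function is hypothetical; the files are written to a temporary directory so the example runs on its own.

```python
import importlib
import pathlib
import sys
import tempfile

# Throwaway modules; their contents are assumed for illustration.
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "mod.py").write_text(
    "def hello():\n"
    "    return 'Hello'\n"
)
(workdir / "helpers.py").write_text(
    "from mod import hello\n"
    "\n"
    "def shout():\n"
    "    return hello().upper() + '!'\n"
)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import helpers
import mod

# helpers never defined hello itself, yet the name is reachable through it.
print(helpers.shout())             # HELLO!
print(helpers.hello is mod.hello)  # True
```

The name helpers.hello is simply another binding to the same function object, which is why the imported element behaves as if it were part of helpers.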
Interesting features
I'd say that the way Python does imports and their namespace handling as a whole is interesting, but likely not relevant for a static language.
JavaScript and TypeScript
Since the two are likely almost identical, we'll only explicitly talk about JavaScript here. TypeScript is likely equivalent in most features. If I've missed some important difference, please let me know!
Concept of package/module/namespace
JavaScript has no namespaces, but it has packages as the means of code-reuse and distribution, and modules for splitting code across multiple files.
Packages
JavaScript packages are the source files zipped up with a package.json where all the metadata lies (author, name, description, dependencies, ...). There are two big package managers, npm and Yarn. Both work from the same public repository and with the exact same package format; Yarn can be thought of as a reimplementation of npm.
Modules
Files are implicitly modules, just like in Python. The file canvas.js creates the module canvas.js. Folders can also become modules themselves (details can be found here, but will be explained in short later too).
Package/module/namespace open-ness
Packages and modules are not open on the language-level. Technically there are packages that help you patch other packages, but these are external software acting directly on the source code, which is likely not applicable to a compiled language - at least without considering binary patches.
Subpackages/submodules/subnamespaces
There is no concept of a subpackage. Since folders can act as modules, they can be thought of as parent modules of the contained modules, which are the submodules of the folder.
For a folder to properly act as a module, you need to create a file inside it called index.js. This is also called a barrel file. The folder can still be part of the hierarchy without it and you can still import modules contained by the folder, but you won't be able to directly import the folder as a module otherwise.
Exporting symbols
Exporting elements in JavaScript modules is completely explicit. When importing elements of a module, only the exported elements can be imported. This is done with the export statement or annotation. It has multiple forms (source):
The last group with aggregate exports is also known as re-exporting. These are essentially what barrel files contain in the index.js files to merge small submodules into a bigger module that you can import as one. For example, if we are developing a math submodule and we have the following structure:
This might make sense while developing, but while consuming, the user might want to import math as a whole, without differentiating the different submodules. Without an index.js re-exporting the entire module interface, the user would have to do this:
Even worse, you can't bind them to the same name, this causes an error:
But if we add this index.js file to the math folder:
Then importing it becomes much easier (and importing individual elements is still possible):
Default exports
JavaScript has a special element, called default, that modules can export. Each module can export at most one default element. The ways the user can export it are pretty similar to regular exports (in one snippet here for brevity, but still, only one default export per module):
Default exports simplify importing when the module only wants to expose a single element. The importing part will be shown in the relevant section.
Referencing an external package/module/namespace
Dependency management is fairly easy, all project dependencies are inside package.json.
One interesting thing is that JavaScript differentiates regular dependencies and development dependencies (devDependencies field). The latter are not part of a published package, but are used for dependencies needed during development, like transpilers or unit testing frameworks. This makes package.json somewhere between a .NET project and a solution, as it attempts to pick up more responsibilities than a single .NET project, which has a fixed set of dependencies.
More information can be found here.
Importing symbols
Importing individual elements is done through the import statement. For the examples, we will use this greetings.js module:
The available forms (source):
Default imports
Default imports can be done either with regular imports, using the default name, or using the specialized syntax (note the lack of curly braces):
This imports the default element from module and names it someDefault.
Rust
Concept of package/module/namespace
Rust has packages, crates and modules, no direct concept of namespaces.
Crates
Crates are a single compilation unit in Rust that compiles into a binary. Unlike in C, crates can consist of multiple files and have dependencies on other crates or packages. In this sense, Rust crates are like .NET projects: they consist of multiple files, have dependencies and produce a binary as their output. Crates describe themselves (name, author, version, dependencies, compilation config, ...) in Cargo.toml, making them essentially project files.
Packages
Packages group up one or more crates, providing some set of functionality. They contain a Cargo.toml that describes how you build the package (it can be as simple as listing the contained crates). This makes them similar to .NET solutions.
The unit of distribution is crates, not packages, similarly to how each .NET project usually compiles into one assembly that we then publish as its own NuGet package.
Modules
Modules are similar to Python, in the sense that the file foo.rs will implicitly make that file's contents be part of a module called foo. Modules can also be part of a file hierarchy; putting modules into subfolders is a similar case to JavaScript folder-modules.
Package/module/namespace open-ness
Packages, crates and modules are not open by nature.
Subpackages/submodules/subnamespaces
There is no concept of subpackages or sub-crates, but there are submodules. There are two ways a module can declare a submodule:
math.rs doing: mod linalg { // ... }
math
(Same note applies, these would have to be declared public for external accessibility)
Exporting symbols
Rust uses visibility attributes to allow or disallow elements for external access outside the module.
By default, elements are private, which means that only elements inside the module and its submodules can access them. This philosophy likely reflects the fact that a submodule will elaborate on details, so it has all rights to the parent module's private elements.
The other visibility attribute is public (`pub`), which makes an element accessible from outside the module, for example an element defined in `math.rs`.
`pub` can also be customized in a few ways, so it does not denote just a single accessibility level (source):
- `pub`: accessible for everyone outside the module
- `pub(in path)`: accessible for everyone within the specified path, for example `pub(in crate::foo::bar)`
- `pub(crate)`: accessible for everyone within the crate
- `pub(self)`: equivalent to private and `pub(in self)`, only accessible within the module and its descendants
- `pub(super)`: equivalent to `pub(in super)`, only accessible in the parent module and its descendants
You can re-export features of a module by making the import declaration (discussed later, called `use`) public. For example, `math.rs` can re-export `trig::sin` so that it can be called as `math::sin`.
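A minimal sketch of some of these visibility levels together with a re-export, assuming a `math` module with a `trig` submodule. Apart from `sin`, the function names are hypothetical, and everything is written inline to stay in one file.

```rust
pub mod math {
    pub mod trig {
        /// Public: usable by anyone who can reach `math::trig`.
        pub fn sin(x: f64) -> f64 {
            x.sin()
        }

        /// Visible only in the parent module `math` and its descendants.
        pub(super) fn normalize_angle(x: f64) -> f64 {
            x.rem_euclid(2.0 * std::f64::consts::PI)
        }

        /// Visible anywhere inside this crate, but not to other crates.
        pub(crate) fn degrees_to_radians(deg: f64) -> f64 {
            deg.to_radians()
        }
    }

    // Re-export: a public `use` makes `trig::sin` callable as `math::sin`.
    pub use self::trig::sin;

    /// Uses the `pub(super)` helper, which is accessible here in `math`.
    pub fn sin_normalized(x: f64) -> f64 {
        sin(trig::normalize_angle(x))
    }
}

/// Hypothetical caller at the crate root.
pub fn demo() -> f64 {
    // `math::trig::normalize_angle` would not compile here (outside `math`).
    math::sin(math::trig::degrees_to_radians(90.0))
}
```

The re-export does not copy anything: `math::sin` and `math::trig::sin` name the same function.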
Referencing an external package/module/namespace
Referencing a crate is done by specifying it in the `dependencies` section of the referencing crate's `Cargo.toml`. Each entry has the format `name = version`.
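As a small illustration of that format, a `dependencies` section could look like the fragment below. The choice of the `rand` crate and its version requirement is only an assumption made for the example.

```toml
[dependencies]
# name = version requirement (the crate is picked purely for illustration)
rand = "0.8"
```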
More information can be found here.
Importing symbols
To make a module visible, it has to be included, which is done with the `mod` declaration. After that, the user can refer to its elements by their module-qualified names, for example `math::sin`.
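A minimal sketch of the access styles covered in this section: a module-qualified name, and names brought into scope with `use` (optionally under an alias). The module contents and the `demo` function are hypothetical, and the module is written inline instead of living in its own `math.rs`.

```rust
// In a multi-file crate this would be `mod math;`, loading `math.rs`.
// It is written inline here so that the sketch is self-contained.
mod math {
    pub fn sin(x: f64) -> f64 {
        x.sin()
    }

    pub fn cos(x: f64) -> f64 {
        x.cos()
    }
}

// Bring a single element into scope.
use math::cos;
// Bring an element into scope under a different name.
use math::sin as sine;

/// Hypothetical caller showing the three ways of naming an element.
pub fn demo() -> (f64, f64, f64) {
    (
        math::sin(0.0), // module-qualified name
        cos(0.0),       // imported name
        sine(0.0),      // aliased import
    )
}
```

Rust also accepts grouped imports such as `use math::{sin, cos};` and glob imports such as `use math::*;`, which are left out of the sketch to avoid name clashes.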
Alternatively, elements can be brought into local scope with the `use` declaration, which is similar to the `from module import ...` import in JavaScript or Python. It comes in several forms.
Go
Concept of package/module/namespace
Go has the concepts of packages and modules, but no direct concept of namespaces. Importantly, the roles of packages and modules are swapped in Go.
Packages
Go packages are one or more source files under the same folder, tagged with the same package name. Packages get compiled as one unit. One program consists of one or more packages, making them more similar to modules in other languages.
Executable programs must have a `main` package, which has to contain the `main` function.
Modules
Modules are collections of packages with a metadata file called `go.mod` in the root. The metadata file describes version information, dependencies, ... This makes them the unit of code distribution (source).
Package/module/namespace open-ness
Go modules are not open by nature. Packages are open within the same folder: two different source files can declare themselves to be in the same package, as long as they are in the same folder.
Subpackages/submodules/subnamespaces
There is no concept of submodules in Go. Packages can form a hierarchy, since you can put packages in folders and you can put other packages and folders within those folders.
Exporting symbols
Elements within the same package can access each other without explicit notation. A package exports an element by starting its name with an uppercase letter. Any other starting character makes the element private to the package. Example:
Referencing an external package/module/namespace
Module dependencies can be added through the `go.mod` file.
Importing symbols
Go only allows importing whole packages; no finer-grained import is available. For demonstration, we'll use the following package definition:
Import has 4 forms:
Imports can be grouped in parentheses:
If there is a folder named `foo`, then `import "foo"` will import the package `foo` from the folder with the same name. This might remind one of field punning.
Interesting features
Go allows writing package initializers called `init`, which run before the `main` function does. A package can even have multiple initializers, which run in lexical order. Example:
This will print:
Carbon
Although Carbon is a relatively new language, we can rely on the official design documentation to grasp the basics. It's likely I've made a mistake here. If you catch one, please do correct me!
Concept of package/module/namespace
Since Carbon aims to be the successor of C++, namespaces are a given. Other than that, quoting from the design docs:
This means that packages contain libraries.
Each library must have exactly one `api` file, but can have multiple `impl` files. The `api` file includes declarations for all public names of the library. Definitions for those declarations must be in some file in the library, either the `api` file or an `impl` file.
Every package has its own namespace. This means libraries within a package need to coordinate to avoid name conflicts, but not across packages.
Namespaces are a complete side-story in Carbon. Their only purpose is prefixing elements in a package, not interacting with packages or libraries in any meaningful way. For this reason I won't include any sample code. For that, please refer to this section of the design documentation.
Package declaration
The syntax of package declaration is as follows:
The package name is optional; omitting it means that the file contributes to the default package. The library specification can be omitted as well, which means that the file contributes to the default library. The keywords `api` and `impl` determine whether the file contains API or implementation. Examples:
- `package Math library "Matrix" api;`: The file defines the API of the library `"Matrix"` in the package `Math`.
- `package library "Matrix" impl;`: The file contributes to the implementation of the library `"Matrix"` in the default package.
- `package Math impl;`: The file contributes to the implementation of the default library in package `Math`.
- `package impl;`: The file contributes to the implementation of the default library in the default package.
Package/module/namespace open-ness
Packages and libraries are likely open within the codebase only, which makes them impossible to extend once compiled to a binary. Namespaces, coming from C++, are naturally open, even across projects/binaries.
Subpackages/submodules/subnamespaces
Namespaces are hierarchical by nature, as in C++. There is no concept of subpackages or sublibraries: packages contain a flat set of libraries.
Exporting symbols
How does a package/module/namespace export a symbol (a function, a type, ...) to be used outside of itself? Does it only export it to the parent or publishes it project-wide? Does it export it under the same name, or can you alias it?
Referencing an external package/module/namespace
TODO
Importing symbols
Every `impl` file implicitly imports the `api` of its own library. Other library APIs have to be imported with the `import` declaration. It has the following forms:
- `import PackageName library "LibraryName";`: Import the library `"LibraryName"` from the package `PackageName`.
- `import PackageName;`: Import the default library from the package `PackageName`.
- `import library "LibraryName";`: Import the library `"LibraryName"` from the same package.
- `import library default;`: Import the default library from the same package.
OCaml, ML in general
Since OCaml (and ML in general) is very different from the other languages in terms of module systems, I won't consider it with the "regular" languages and I won't follow the template defined above. It is important to note that, while F# is an ML derivative, its module system is very different from the rest of the ML family, probably because of the constraints .NET introduced.
Basics
The following table (mostly quoted from the mentioned course) summarizes how language elements of a classic, procedural OO language like Java and an ML-derivative like OCaml map to certain concepts. While they are not equivalent in all cases, it's a good initial mindset for easing into the concepts:
The primary way to wrap up code that belongs together in OCaml is by defining a `struct`, which is a module value. For example, we can define a mathematical module like so:
The meaning of this is that `Math` is a module value, which is assigned the structure (this is simply what the construction of a module is called) specified between `struct ... end`. Note that there is no construction at runtime; all module operations are purely compile-time, despite the name. The following C# code would be roughly equivalent:
Invoking the code from OCaml means prefixing the functions with the module name:
OCaml modules can contain function definitions, value definitions, type definitions and other module definitions. This structure looks like an assignment in the form of `name = value`, so it might not be surprising that ML modules have a type too. The type of the module above can be written in the following way:
Note that these module values cannot be mixed with regular OCaml values, like the ones you introduce with a `let` binding. The same applies to module types and regular types. They are both purely compile-time entities, incompatible with regular values.
You can specify the module type after the name, separated with the usual colon:
To make this a bit nicer to read, we can factor out the module type and bind it to a name:
When you don't specify the type of a module, it will automatically be inferred for the binding. When you explicitly specify it, OCaml will check whether the expected signatures are present in the module definition. The module can have more elements than the ones expected by the signature, but the ones defined in the signature are required; a missing one causes a compile error. For example:
This explicit type specification will have significance later.
Visibility, encapsulation
Explicit signature types can help hide certain members of the module. For example, in the following case (using the previously defined `MATH` module type):
The member `add2` is not accessible through `Math`, since it's not among the constraints listed in `MATH`. This is very similar to looking at an instance through an interface type, where only the members of the interface are statically accessible:
Note that there's no runtime polymorphism or dynamic dispatch in the case of OCaml; this is all compile-time.
Sometimes, we want to check the correctness of a module signature, but we don't want to hide the details. In that case, the check can be done in a separate module alias:
Associated types and modules
As mentioned before, it's perfectly legal to define types and other modules inside a module:
These can be expressed in the module type too:
This is especially useful when a module needs to abstract over some type or types, but the exact type should be hidden. Let's say we wanted to build a set. Let's define our module like so, using a list for simplicity:
Since we are working with immutable data structures and free functions, the underlying value (a list in this case) is taken as an argument, and a modifying operation returns the new underlying value (the modified list). The equivalent C# code could be something like this (using arrays instead of linked lists for simplicity):
You might wonder how the OCaml code became generic, because there is no sign of generics anywhere. ML languages usually generalize types as much as they can, and the functions we wrote, unsurprisingly, happen to work on any element type. The implementation is pretty simple, but inefficient. It would be nice to write an abstraction for sets, so that anyone could write their own, more efficient or tweaked, set implementation. Let's try to do that:
There is a huge problem with this: we have constrained the implementation type to be a list of the elements. This is not great; we need a way to hide the fact that some sets will use a list as the underlying type, but some won't. We could try to make the underlying type generic:
Let's see what the compiler has to say!
The compiler doesn't like that we tried to be way more constraining than what we have promised in the signature! This is similar to doing this in C# (which is of course also a compile-error):
TODO: Solution, associated types
TODO: with constraint
TODO: Parallel with Rust traits
Functors, higher-order modules
TODO
Personal thoughts
These are completely personal opinions on various topics. Let me know if you particularly agree or disagree with something. While this is just opinion, I'll try to provide some reasoning alongside.
.NET vs the rest of the languages
The way .NET languages work is very different from the rest of the languages. This is likely because .NET went with namespaces instead of modules. This makes sense, as it means that multiple assemblies can be developed independently but still expose a consistent API under the same root namespace, but I believe there are also some significant drawbacks. Namely, there is zero structure or hierarchy enforcement in .NET. Any random part of the code can contribute to any namespace. This makes enforcing a good code structure and hierarchy much harder, as technically any code structure could map to any part of the namespace hierarchy. Since it's already a soft convention to follow the folder hierarchy with namespaces, I wonder if this is something that should be enforced to some extent.
Visibility vs explicit import/export
Visibility is a simple, but relatively inflexible mechanism. It usually means that only a fixed, defined subset of the codebase can access a given element. I believe that some very important visibilities are missing that would be useful, and some are useless when not doing strictly object-oriented programming. For example, looking at the visibilities of C# (not really blaming the language design, as it's an OOP language by nature):
- `public`: Essentially public API, useful in general
- `private`: Visible within the defining class, only really useful for OOP or in static class modules
- `protected`: Visible for subclasses, only really useful for OOP
- `internal`: Visible for anything within the same assembly, useful in general
- `protected internal`: Visible within the same assembly or in subclasses, only really useful in OOP
- `private protected`: Visible in subclasses defined within the same assembly, only really useful in OOP
I believe that some more fundamental visibilities are missing, like "accessible within this file", or "in this namespace", ... But more importantly, a finite set of access modifiers will always be a relatively rigid system with minor annoyances. Wrapping up a few packages under a common API and hiding the original lower level APIs is harder (maybe even impossible).
An export system is finer grained in the sense that you can propagate the exported parts up the module hierarchy, leaving behind parts that should be private to that subtree. I'd say exports are not more cumbersome at the definition site; they are no worse than simply specifying visibility. Where exports become more cumbersome is when symbols are propagated up from submodules to the parent modules, exposing the API towards the "higher level". This is what happens in JS/TS barrel files too, which essentially describe the module API in a more explicit manner.
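To make the idea of propagating exports upward more concrete, here is a rough sketch in Rust, where a public `use` in the parent module plays a role similar to a barrel file. All names (`geometry`, `shapes`, `Circle`, `internal_scale`, `demo`) are hypothetical and only serve the illustration.

```rust
mod geometry {
    // The lower-level submodule stays private to this subtree.
    mod shapes {
        pub struct Circle {
            pub radius: f64,
        }

        pub fn area(c: &Circle) -> f64 {
            std::f64::consts::PI * c.radius * c.radius
        }

        // Usable inside the subtree, but never propagated further up.
        pub fn internal_scale(c: &Circle, k: f64) -> Circle {
            Circle { radius: c.radius * k }
        }
    }

    // The parent picks what to propagate upward, like a JS/TS barrel file.
    pub use self::shapes::{area, Circle};

    pub fn doubled_area(c: &Circle) -> f64 {
        area(&shapes::internal_scale(c, 2.0))
    }
}

/// Hypothetical consumer of the "higher level" API.
pub fn demo() -> f64 {
    let c = geometry::Circle { radius: 1.0 };
    // `geometry::shapes::internal_scale` is unreachable from here.
    geometry::doubled_area(&c) / geometry::area(&c)
}
```

The cost mentioned above shows up in the `pub use` line: every level of the hierarchy has to repeat which names it passes on.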
On more interesting module features, ML modules
TODO