c3lang / c3c

Compiler for the C3 language
https://c3-lang.org
GNU Lesser General Public License v3.0

How explicit imports could be eliminated #6

Closed: PavelVozenilek closed this 2 years ago

PavelVozenilek commented 5 years ago

I am the one who questioned the need for explicit imports on Discord. It was the first time I tried that site, and hopefully the last.

What I had in mind: explicit imports (#includes in C) are a consequence of past hardware limitations. They clutter the code while conveying very little useful information.

Similarly, separate compilation of modules (in C jargon translation units) is due to ancient memory limits. It adds complexity (intermediate object files and all the mess accompanying them) and increases overall compilation time.

A language from 2019 should not be limited by constraints from the 1970s. All source files for a project should compile as one indivisible unit. This cuts down on the chaos of stale versions and allows global optimizations. (C/C++ does allow this, with the so-called "unity build". Unfortunately it is frowned upon by people living in the glorious past when machines had 8 kB of RAM.)

This arrangement would also make it easier to eliminate explicit imports. Once the compiler has parsed all source files, it knows all the symbols and can resolve them inside modules. Only when there is an ambiguity would a symbol need a module name.
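A minimal Python sketch of that idea, not the actual c3c implementation: once every source file is parsed, a project-wide index records which modules define each symbol, and only names with more than one owner need a module prefix. The module `net` and the helper names `build_symbol_index` and `needs_qualifier` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def build_symbol_index(modules):
    """Map each symbol name to the sorted list of modules that define it."""
    index = defaultdict(list)
    for module_name, symbols in modules.items():
        for symbol in symbols:
            index[symbol].append(module_name)
    return {name: sorted(owners) for name, owners in index.items()}

def needs_qualifier(index, symbol):
    """A symbol needs a module prefix only if several modules define it."""
    return len(index[symbol]) > 1

# Hypothetical project: module name -> symbols it defines.
project = {
    "stdio": {"printf", "File", "load_file"},
    "net": {"File", "connect"},
}
index = build_symbol_index(project)
```

Here `printf` could be used bare, while `File` would have to be written with its module name because two modules define it.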


How about the ability to rename imported module to a better name?

This feature (namespace renaming) is available in C++. I have yet to see code that uses it. I perceive it as a misfeature. If a module name is really so atrocious, why not rename the file and be done with it?

How about the ability to import only some parts of a module?

IMO not needed. I cannot imagine a legitimate use for it. Another impractical me-too misfeature.

What else would make implicit imports more handy?

Simplicity. One file = one module. One module = one file. No exceptions. File name (w/o path and extension) is module name. That name is not repeated inside. Compiler already knows it, you know it too. If you use invalid characters, it is your fault. Rename the file.

This would allow easy file renaming, easy moving of source files within the project hierarchy, and easy merging or splitting of files. Try any of that in a large C/C++ project.
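The "file name is the module name" rule can be sketched in a few lines of Python. This is only an illustration: the thread does not say which characters are valid in a module name, so Python's identifier rules stand in as an assumption, and `module_name_from_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def module_name_from_path(path):
    """Return the module name: the file name without directory or extension."""
    name = PurePosixPath(path).stem
    # Assumption: Python identifier rules stand in for the language's own rules.
    if not name.isidentifier():
        raise ValueError(f"invalid module name {name!r}: rename the file")
    return name
```

Because the directory part is ignored, moving `src/net/http_client.c3` to another folder leaves the module name `http_client` unchanged, which is the property claimed above.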

lerno commented 5 years ago
  1. The reason why people would want to rename imported modules: the normal case is that modules have longer names to avoid namespace collisions. Having longer names for import and shorter aliases for manual disambiguation seems reasonable.

  2. Importing parts of a module is about importing parts of the module into the local namespace to avoid unnecessary name pollution. I am not sure that I need this for C3.

  3. I'm trying to wrap my head around what effects that would have. Presumably everything would need to be namespaced then, aggressively so, e.g.

stdio::printf("Foo\n");
stdio::File file = stdio::load_file("test.txt");

While I don't mind function namespacing, namespaced structs are something of a pain.

The idea with multi file modules is that within a module you have the automatic lookup. The module is big, it might be your whole application. So you end up with the following visibility categories:

  1. "local": which is just local to the file. No worries that this intrudes on the namespace of other files, even in the same module.
  2. default: it's visible within the module, no need to do anything.
  3. "public": this is what's actually visible outside of the module, which should be a tiny subset.

So think of a module as a very big set of code rather than a single tiny element.

You can think of the "module" declaration as meaning "import everything else that has this module declaration, and keep those imports in the local namespace".
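The three visibility categories can be modelled roughly as follows. This Python sketch is an illustration under assumptions, not how c3c implements it; the `Symbol` record and the `is_visible` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str
    module: str
    file: str
    visibility: str = "default"  # "local", "default", or "public"

def is_visible(symbol, from_module, from_file):
    """Decide whether code in (from_module, from_file) may see the symbol."""
    if symbol.visibility == "public":
        return True  # visible outside the module
    if symbol.visibility == "default":
        return symbol.module == from_module  # visible anywhere in the module
    if symbol.visibility == "local":
        # visible only in the file that declares it
        return symbol.module == from_module and symbol.file == from_file
    raise ValueError(f"unknown visibility {symbol.visibility!r}")
```

With this model, a default symbol declared in one file of a module is usable from every other file of that module without any import.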

This is different from, say, Java, where there are deep hierarchies within a single project. That's not the intention of the module in C3. The module is your app or a complete, freestanding library.

I'd be interested in hearing your further opinions on this.

PavelVozenilek commented 5 years ago

Having longer names for import and shorter aliases for manual disambiguation seems reasonable.

For this I have a handy solution. I'll describe it later, in a topic of its own.

(I have several dozen topics on my mind. I am trying not to overload you, and also it takes some time to formulate them.)

I'm trying to wrap my head around what effects that would have. Presumably everything would need to be namespaced then, aggressively

Now, this depends on whether function overloading is allowed. If not, as in C, the problem is small. If overloading is supported (I very much support this; C's way is an artifact of past hardware limitations), then there are ways to reduce conflicts. I may make this another topic.

I actually have in mind yet another desirable feature, which would, as a side effect, reduce the likelihood of name clashes. I'll describe it someday.

multi file modules

I do not like this idea, because it breaks the simplicity of the one file == one module rule. Big files are IMO not evil; a lot of small files is a much more chaotic situation. But if this is really, really needed, then it could be solved by using a certain convention. Names like:

my-module.1.c3
my-module.2.c3
my-module.3.c3

would form one module named my-module. The compiler should enforce that these "numerized" names have no gaps, etc.

Having special access for just one file from a multi-file module feels like over-thinking it. For example, it would make it harder to move part of the code from one file into another, or to merge files, or to split a large file.
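A rough Python sketch of the check such a convention would need: group the numbered files by module and reject any module whose part numbers have gaps or duplicates. The exact pattern and the assumption that numbering starts at 1 are taken from the example names above; `PART_PATTERN` and `group_module_parts` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <module>.<part number>.c3
PART_PATTERN = re.compile(r"^(?P<module>.+)\.(?P<part>\d+)\.c3$")

def group_module_parts(filenames):
    """Group numbered files by module and check that parts run 1..N without gaps."""
    parts = defaultdict(list)
    for filename in filenames:
        match = PART_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"not a numbered module file: {filename}")
        parts[match["module"]].append(int(match["part"]))
    for module, numbers in parts.items():
        if sorted(numbers) != list(range(1, len(numbers) + 1)):
            raise ValueError(
                f"module {module!r} has missing or duplicate parts: {sorted(numbers)}"
            )
    return {module: sorted(numbers) for module, numbers in parts.items()}
```

The three file names above would be accepted as one module, while dropping `my-module.2.c3` from the set would be reported as an error.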

lerno commented 5 years ago

It seems what you term "module" is different from what C3 means by "module". For C3 the module is a single compilation unit. For an app it's usually the entire app except for external libraries.

Inside of the module there is a large shared namespace, and the module is also compiled as a single unit. Thus a module in C3 lingo might consist of 50 kloc or more, spread over a number of files.

Function overloading is not allowed, but generic functions are available. https://c3lang.github.io/c3docs/generics/#generic-functions

These can actually be extended, creating a kind of overloading through macros.

lerno commented 5 years ago

Let me add that I'm positive about the ambition to remove imports, it's just that it has to work for all the corner cases as well ;)

PavelVozenilek commented 5 years ago

The terminology as I use it:

Terminology should be "standardized" in docs.

lerno commented 5 years ago

I've started to investigate how this could perhaps work, but a major problem is generic modules. Did you read about them yet?

PavelVozenilek commented 5 years ago

Not yet. I will.

PavelVozenilek commented 5 years ago

I've looked at generic modules. They seem to be a more usable alternative to generic functions (I do not like the _Generic trick). I do have a technique for using them (which is more universal but still easy to use). I'll try to create a topic about it tomorrow.

PavelVozenilek commented 5 years ago

An anecdote as an aside: the term "generic module" reminded me of how I once thought about something similarly named: "parametrised modules". The idea was that I have a file with some general-purpose structure (an associative array), and at the place of its use I would specify the patterns of its expected use (how many items it typically holds, whether it is more often read or updated, whether speed or size matters).

Then inside the module, something would happen and the result would be the data structure that best fits the given purpose (tree, hash table or sorted array). It would always have the same type, so no complications in the code.

Eventually I abandoned this idea. Such a chameleon would be complicated beyond all belief and I couldn't find a reasonable way to express my requirements.

lerno commented 5 years ago

Oh, that "parametrised module" is exactly what generic modules are.

lerno commented 5 years ago
  1. The namespace hierarchy is module::submodule. This is directly reflected in the file hierarchy, so the file path is src/module_name/submodule_name.c3. (If you want to split a submodule into several files, make the submodule a folder instead, so src/module_name/submodule_name/file1.c3, src/module_name/submodule_name/file2.c3, etc.)
  2. Typically a user-defined entity is used without any qualifiers, e.g. Foo rather than module_foo::bar::Foo. The current module shadows other modules for the unqualified name, so within module_foo, Foo used without qualifiers is always module_foo's Foo. If other modules provide a type and the name collides, then it must be used with sufficient qualifiers to make it unambiguous.
  3. Submodules share the user-defined type namespace, and shadowing between them is not allowed. Functions may shadow other functions within the same module, and are preferably used with qualifiers: bar::open_foo(...), although calls are allowed without qualifiers if not ambiguous.
  4. Libraries are all parsed just like the source files. Exclusion is done at the build file level. The compiler will look through libraries to get the matching symbols without any need for explicit imports.

Code example:

// File /foo/bar.c3
struct Bar
{ ... }

struct Foo
{ ... }

struct FooBar
{ ... }

func void testSomething() { ... }

func void testSomethingElse() { ... }

// File /foo/baz.c3
func void testModule() 
{
    bar::testSomething(); // qualifier is recommended
    Bar bar;              // qualifier not required
    bar.x = 1;
}

// File /fooa/xyz.c3
struct FooBar
{ ... }

// File /foob/test.c3
struct Foo
{ ... }

func void testModule() 
{
    Bar bar;            // Not ambiguous, fine to use.
    Foo foo;            // Automatically uses own module Foo
    bar::FooBar fooBar; // Ambiguous, so name is needed.    
}
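The lookup rule for unqualified type names in the example can be sketched in Python as follows. It is a simplified model, not the compiler's algorithm: each top-level folder (foo, fooa, foob) is assumed to be one module whose submodules share a type namespace, and `resolve_type` is a hypothetical helper.

```python
def resolve_type(name, current_module, types_by_module):
    """Return the module that an unqualified type name refers to."""
    # Rule 2: the current module shadows all other modules.
    if name in types_by_module.get(current_module, set()):
        return current_module
    owners = sorted(m for m, names in types_by_module.items() if name in names)
    if not owners:
        raise LookupError(f"unknown type {name!r}")
    if len(owners) > 1:
        raise LookupError(f"{name!r} is ambiguous; qualify it with one of {owners}")
    return owners[0]

# Types declared in the example files, keyed by top-level module.
types_by_module = {
    "foo": {"Bar", "Foo", "FooBar"},
    "fooa": {"FooBar"},
    "foob": {"Foo"},
}
```

From foob, `Bar` resolves to foo, `Foo` resolves to foob itself, and `FooBar` is rejected as ambiguous, matching the three comments in the last function of the example.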

The weakness of this shows when you already have a module foo that contains a Foo struct, which is used unqualified throughout the project. If a module bar that also has a Foo is included later, the symbol becomes ambiguous and the qualified name is required all over. Although the renaming might be simple ("replace Foo with xyz::Foo everywhere"), it is annoying and might not be desired (for example, a math library is included and the math library has a Vector2D struct, which you also have in the game framework module, and in 99.99% of the cases you want to use the game framework's Vector2D struct).

For that reason I see the need for two more features:

  1. Renaming, e.g. rename bar::baz::Foo = Foo2;
  2. Aliasing any symbol, not just types, e.g. alias bar::baz::open as bar_open;

The aliasing only adds an alias, whereas renaming actually deletes the previous name. These should probably both be available at the build level as well as the source level. Renaming is typically module-wide.
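The difference between the two operations can be shown on a plain symbol table. This Python sketch only models the semantics described above; the dictionary layout and the function bodies are assumptions, not proposed compiler internals.

```python
def alias(symbols, qualified_name, new_name):
    """Add new_name as an extra name for the same entity; the old name stays."""
    if new_name in symbols:
        raise ValueError(f"{new_name!r} is already defined")
    symbols[new_name] = symbols[qualified_name]

def rename(symbols, qualified_name, new_name):
    """Move the entity to new_name; the old name is deleted."""
    if new_name in symbols:
        raise ValueError(f"{new_name!r} is already defined")
    symbols[new_name] = symbols.pop(qualified_name)

# Hypothetical symbol table: name -> description of the entity.
symbols = {
    "bar::baz::Foo": "struct Foo from bar::baz",
    "bar::baz::open": "function open from bar::baz",
}
rename(symbols, "bar::baz::Foo", "Foo2")
alias(symbols, "bar::baz::open", "bar_open")
```

After these two calls, `bar::baz::Foo` no longer exists and only `Foo2` refers to the struct, while both `bar::baz::open` and `bar_open` refer to the same function.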

lerno commented 4 years ago

The module system that's currently adopted does not quite use the above, but something similar.

lerno commented 2 years ago

It's worth revisiting this.

lerno commented 2 years ago

@PavelVozenilek I remember we talked about a system where everything was implicitly imported?