KCL: 'import' statements

KittyCAD / modeling-app

The KittyCAD modeling app.

https://kittycad.io/modeling-app/download

MIT License

429 stars 37 forks source link

KCL: 'import' statements #4080

Open adamchalmers opened 1 month ago

adamchalmers commented 1 month ago

Problem: Currently all KCL projects have to be a single file. There's no way to reuse code across files. Solution: KCL files should be able to import each other, so that one file can use functions or constants defined in another.

Principles

One file === one module, and the directory tree (on disk) === the module tree (in code). This isn't really relevant right now but is a key principle we should stick to going forwards. I'm basing this on chats with various Rust maintainers around Rust's module system being overcomplicated and how we'd all like to simplify it.

Hygiene: importing a module should not change the behaviour of your code. This means it shouldn't overwrite your code's constants/functions and it shouldn't change your model. The only changes should happen when you actually use the code you're importing.

Design

Syntax

We add a use statement, with an optional as suffix. For example:

use "lens.kcl" as lens

This executes lens.kcl and makes its top-level identifiers (e.g. fn cube()) available under the namespace lens. Your KCL file can now invoke lens.cube() like a normal KCL function.

If you leave off the as lens part, it'll use the filename as the module name. For example, these two are equivalent:

use "lens.kcl" as lens
// Same as above but shorter
use "lens.kcl"

To make this easier, ZMA will stop letting KCL filenames contain spaces. Most CAD apps don't allow spaces in some contexts anyway SolidWorks, Siemens Sinumerik). The mechanical engineers agree with this decision.

Usage

A use statement is only valid in the main/top-level scope of the file. In other words, you cannot use it inside a function or any other expression.

While importing a file, the KCL interpreter won't let it send any engine requests. This prevents side-effects when importing. Instead, it returns an error, saying "Imported files cannot execute API requests, consider wrapping this in a function instead, so that the caller can choose when it executes."

Initially we'll only support importing files from the same project (we'll remove this limitation eventually).

Implementation

When you import a file, it'll create a namespace in the importer. So, we'll add a new kind of KclValue variant, KclValue::Namespace(HashMap<String, KclValue>). The HashMap contains items in the namespace, mapping their name to their value.

When KCL interpreter executes a use statement, it'll:

Read the file lens.kcl from the ZMA project
Parse it into a kcl_lib::types::ast::Program
Execute it
Collect all top-level declarations in the ast::Program into key-value pairs in a KclValue::Namespace

Future extensions

Let users only import some symbols, e.g. use (eyepiece) from "lens.kcl" which lets you use the eyepiece symbol without prefixing it as lens.eyepiece
Marking bindings as private/public
Module trees (e.g. use "telescope/eyepiece.kcl")

jtran commented 1 month ago

As stated, there are a few implications of the above that might be unintuitive.

The point-and-click UI will not edit the code of something in another module. Currently, we can only open one file at a time, so if you're not viewing that file, we won't be modifying it, meaning much of the UI will be (figuratively) disabled when interacting with its output.

Transitive uses are available through a.b.c.

// shared.kcl
length = 5

// lens.kcl
use "shared.kcl"

// eyepiece.kcl
use "lens.kcl"
// This evaluates to 5.
lens.shared.length

use can happen inside an if conditional or function body, meaning that if a use appears in a file, it can get executed zero or more times.
Re-useing the same file multiple times loads the file multiple times. This is true whether the use statements are in the same file or in transitively used modules. If it has side-effects like adding geometry to the scene, it creates multiple copies of the same thing on top of each other.
Circular uses don't work because they cause an infinite use loop.

To address those latter points, I think we should make this minor adjustment:

To prevent infinite use loops, we'll detect circular uses and raise an error pointing to the use line, with a message that includes the name of each module in the cycle.

(Alternatives Considered: Both Python and JavaScript allow circular imports. This is tenable because there are two phases. First, modules and their exports are found, and then code executes. In Python, that's what if __name__ == '__main__': is for. But it's difficult to understand and debug when something goes wrong, like when Python code violates the phase separation. KCL doesn't currently have this phase separation or main. Rust distinguishes between modules and crates, and circular references are treated differently between them. I think that loading a module needs to have no side-effects before we can do anything fancier.)

If not immediately, shortly afterwards, we should make this change also:

Cache uses based on their real, absolute file path. It makes no sense to render multiple copies of geometry in the same place. And since (most) things are immutable, it doesn't make sense to have multiple copies of KCL values. This way, if multiple things use the same file as different names, the modules refer to the same KCL object, and the work to load the file isn't duplicated.

benjamaan476 commented 1 month ago

This feels like a good step towards a library of standard parts that we can supply! I'm thinking various screw sizes (M3, M4, etc.).

To address @jtran concerns my suggestion would be something akin to C/C++ header guards (#pragma once or the C #ifndef) this prevents multiple imports of the same module and prevents infinite use as it just wouldn't include the recursed use the second time thus killing the loop. I'm sure there is a way to manage this without having to manually guard all of the files like in C/C++, maybe something in the file meta data or something.

I don't know if we have scope within kcl but useing a file in a conditional is odd but not totally wrong. You would just only have access to it within the block that you've used it from.

For the problem of side effects you could enforce that anything that is going to be a module can't execute anything? Only declare variables and functions. Then it becomes more of a design to make a module be a separate entity from an "executable" kcl. First line of the file has to include module or something.

I like lua's import where you specify a directory and it pulls in all of the files within the directory

lf94 commented 1 month ago

Yes it seems we should restrict to importing a module once. Multiple import looks to bring more problems than it's worth. +1 for the whole proposal!

adamchalmers commented 1 month ago

Sorry, I didn't spell it out anywhere, but I assumed that use statements would only be legal at the top level of a file. You shouldn't be able to use them in if statements or function bodies. @jtran what do you think?

jtran commented 1 month ago

It sounds nice, to only allow them at the top level. But if useing a module can have the side-effect of drawing geometry, don't I need to be able to conditionally run it?

I guess in the short-term, if you're just using your own files, it's fine. We can always add it to conditionals later if we find we need it.

(Side-effects ruin everything 😢)

jessfraz commented 1 month ago

If there is a project.toml in the same directory as the file being imported it should be executed with those units, if there is not it should be “unitless” meaning it uses whatever the current project is that’s my only request , this looks great otherwise

adamchalmers commented 1 month ago

@jessfraz plan is to only import files within the same project for now (we can remove this limitation later) wdyt

jtran commented 1 month ago

@benjamaan476, I proposed that the limitation is that a used module cannot interact with the engine. If it tries to, it's an error. This way, it can still have functions and constants that do math computations, but it can't draw. There's also no need for extra syntax.

jessfraz commented 1 month ago

I think that’s fine but I’m just saying it’s stupid that kcl-samples works with the CLI just fine because we traverse directories for the project.toml but it’s stupid. It doesn’t work with the app and it’s stupid that no one can just load it as its own project in the app but it works fine with the CLI so I’m just saying it would be easy to implement on the CLI side a.k.a. in the pure Rust code so maybe we just do it but up to you.

another thing to keep in mind when implementing this but out of scope for the first pass is, we are going to also want to do use thing.json as data and so if there’s a way to implement it, Such that future us doesn’t have pain that would be great

jessfraz commented 1 month ago

I’d like to move away from the whole project dynamic, but there’s a separate issue for that. Just things should work no matter where you open it no matter how many nested directories it has.

jessfraz commented 1 month ago

Sorry, I mostly mentioned this just so that we don’t implement this in a way such that we are shoehorned in to the whole project thing. It’s fine if it only works for projects now, but I just don’t want to implement it such that removing this whole concept fucks us later Right now the CLI a.k.a. the pure right side works really nicely wherever the fuck you are

adamchalmers commented 1 month ago

@jessfraz what do you mean "kcl-samples works with the CLI", I've never opened kcl-samples, how does it work with the CLI? I assume you mean something beyond just "you can snapshot and export KCL files from kcl-samples"?

nrc commented 1 month ago

Problem: Currently all KCL projects have to be a single file. There's no way to reuse code across files.

I'd like to understand the motivation in more depth - why do users want to use multiple files? Is it because files are so large they are unwieldy? Because they want to share code between projects? Because they want to divide a program up for the sake of modularisation (and in that case what is the motivation? Documentation? Abstraction?)? Something else?

There seems to be some assumption around the other files being secondary in some sense - that they should not be modified by the UI and so forth. Could you give me some more context on where that comes from?

With the major caveat that I don't understand the context nor the existing system well enough to really have opinions, my opinions are:

we should not specify filenames in the use statements - the mapping from abstract modules to concrete filesystem artifacts should be somewhat abstracted and extra-linguistic. There's a few reasons for that (mostly future proofing): we might want to support non-filesystem programs, it makes moving files easier (especially if we support directories and paths in the future), no special syntax for the standard library or other 'special' libraries, makes working with modules from other projects easier, side-steps the spaces in filenames issue (or at least pushes it to the tooling layer), filesystems are super-crufty and there are loads of edge cases (e.g., the character sets used for the names, / vs \, relative paths and absolute paths, symlinks (and their behaviour with relative paths), different behaviours on different operating systems, etc) of course we have to deal with these anyway, but the user will have different expectations if they are using a more abstract name for a module vs a string literal. Also, if the filename looks like a string literal, users will have expectations that they can do string-like things with it.
module systems are a huge design space. A module system is going to become widely used so back-compat will be an issue very quickly. Module systems are surprisingly complex to design and implement (primarily because it makes name resolution complicated, but also because it affects tooling, error messages, etc.). Which is all to say, I think that if we can avoid having this in the 1.0 release, we should - it's worth designing in detail (thinking about future extensions) and having lots of time to iterate with feedback from users.

jtran commented 1 month ago

why do users want to use multiple files?

My understanding is that the main motivation is users wanting to make multiple parts that share helper functions and constants, and they don't want to have to copy and paste that helper code all over.

Currently, people generally make one KCL file per part. If you put multiple parts in one file, this is nice for viewing in the app. But when you export that part, it will be multiple volumes in a single glTF, and that may or may not work in the program where you're consuming it.

The second reason is if you have multiple projects that happen to do something similar, you also want to share code between them. As projects get bigger, this is becoming more common. A simple example is implementing a function for the Zoo logo.

The third reason, which I believe is what triggered all this happening now, is that there's some design space that is hard to get experience with and explore without multiple files. In the conversation about project.toml and units #3960, for example, people had a hard time saying what they even wanted when mixed units were involved, partly because there's currently no way to use multiple files.

It was concluded that we needed a simple way to share code sooner rather than later.

Early proposals suggested doing an expedient approach of a C-style include that added the included file's definitions to the file where it was included. There are a lot of problems with this, and I pushed for some semblance of a module so that the semantics of a KCL file do not change depending on how it's used or error out due to name collisions.

There seems to be some assumption around the other files being secondary in some sense

It's not that we don't want to be able to modify used files in the UI. I think we do. But it's just more work. I'm talking about when the user interacts with the 3D scene. Should the UI open that other file and jump to the line that generated it? That's not an issue now because the one KCL file must already be open. If the geometry clicked on was from a used file, the user is probably interacting with geometry generated from KCL code in a function. Does the user intend to edit the function itself or the specific invocation that resulted in the geometry that they clicked on? There are a lot of complications that come with the the UI editor above and beyond a regular programming language, and it's all unexplored territory. There isn't much in terms of prior art. Some of it, we haven't even solved for the single-file use case.

But separate from all the above, in some ways, there is a fundamental difference between library code that is depended on and code that is run. When you run a script, you run it for its side-effects, like drawing geometry. But if you use a module, do you want its side-effects? I would argue no. It's more about bringing the module's exports into scope. This is a design decision, though, because PHP includes circa 2000 technically work.

we should not specify filenames in the use statements

👍 I think I like this, but do you have another suggestion? I thought about this issue too, and I figured that the current proposal is compatible with the future addition of something like:

use std::shapes

Because it's a bare identifier, not a quoted string, we could distinguish it. But I generally agree that despite the "one file === one module" principle, it should still be possible for a module to exist without any file, like in the web app where there is no filesystem.

As for file names in other encodings, I think it's fine to just say, sorry, we don't support that.

I think the general principle of making unintended things be errors that we maybe relax later is a good approach. The use "string_here.kcl" is a good example. We could restrict it at the parser level that it's a string literal, not an arbitrary KCL value, and I think we should do this if we go with the current proposal.

nrc commented 1 month ago

Some more thoughts on this since it is becoming higher priority. I've been thinking about design choices which give us maximal choice for future growth along with some other tweaks.

I've change my mind on using strings since I can't think of a better way to do it without adding boilerplate to project.toml (bad) or constraining the design space a lot in the future. I would say though, lets not think of the string literal as a path, rather as an identifier of a code location and filenames are currently the only location identifier we currently support. I think for the first iteration we should not support arbitrary paths (thus avoiding a bunch of naming issues) only filenames for files in the same directory (obviously we'd improve this later). And yeah, strictly just string literals, not string-typed expressions.
I prefer import to use - both are widely used elsewhere. I believe import suggests more of a feeling of bringing in code from elsewhere, whereas use could mean something local, etc. It is also symmetric with export.
I think we should only support importing individual names, e.g., import foo, bar from "other.kcl" is supported but import "other.kcl" is not. This can be extended in the future, but for now means we don't need to scope names - names are still just identifiers, not some kind of path. It also means that the filename is not converted into an identifier so we can skip any issues like spaces in filenames.
Only explicitly exported items can be imported. Following the above example export foo = ... must exist in other.kcl. This is for backwards compatibility (we could import more in the future and there won't be any errors, well not many anyway, there could be name clashes. If we import everything then introduce export, then everything breaks for users).
Only functions can be exported. This means we don't need to deal with top-level code and side-effects in files but no errors if the module file has them. And we don't need to deal with re-exports (implicit or explicit) and so paths of names via imports are not a thing.
Imports can only be at the top level (i.e., not within functions for now at least). I'm not sure if order should matter? Should the following work:
```
x = foo()
import foo from "bar.kcl"
```
Do functions need to be declared before use? I guess we should follow that logic.
Cycles of imports cause an error, e.g., if foo.kcl imports something from bar.kcl. bar.kcl can't import anything from foo.kcl (even if the imports are logically disjoint). We could consider relaxing this in the future, but it's a nice simple place to start and forwards compatible.
Multiple imports are allowed in a DAG but not in cycles. So foo.kcl could import bar.kcl and baz.kcl, and bar.kcl and baz.kcl could both import qux.kcl, or foo.kcl could also import qux.kcl. Since we importing items from a module and not a whole module, and since we're only importing function names, not executing code then this is always OK (I think) and there is no difference between having 1 or 2 or 3 copies of the items in qux.kcl.
There's only one namespace, e.g., the following is an error due to reusing the name:
```
import foo from "bar.kcl"
foo = ...
```
No renaming (as) at first, although I think this will be fairly easy to add, just going for minimalism.
I think if we only import functions then there are fewer issues with codemods/ selection? We can just make these functions totally opaque to start with.
Just ignore units for now - we need to think about the whole units issue in depth. Given we only have one unit per project at the moment that should be fine for the first iteration.

jtran commented 1 month ago

Unfortunately, a lot of the rationale for this proposal didn't make it to this GitHub issue.

I think for the first iteration we should not support arbitrary paths (thus avoiding a bunch of naming issues) only filenames for files in the same directory (obviously we'd improve this later). And yeah, strictly just string literals, not string-typed expressions.

:thumbsup:

I prefer import to use

We already have import(), and it's something else. I suppose that if we really like import for this better, and we want to make the change, now would be the time. For me, personally, it implies there's a binary export or not, instead of levels of visibility (e.g. private, public to the project, public to the world, etc.). But we have no specific plans for this, so I don't have a strong feeling if you think import is the way to go.

I think we should only support importing individual names, e.g., import foo, bar from "other.kcl" is supported but import "other.kcl" is not.

This came up also. Basically, I think that very soon, people would get tired of listing all the items. Without editor tooling, it's busywork to have to update your imported items every time you add something to other.kcl. I'd anticipate users requesting import * from "other.kcl". But then we'd have to explain to people why that's bad and that you should really do import * as other from "other.kcl". So why not just start with that?

I suppose this comes down to whether we value being able to statically determine whether a member access is defined. I definitely flip-flop back and forth on how static I think KCL should be. It's true that having the identifier listed explicitly makes static analysis simpler. But without its type, which presumably you'd need to analyze the other file anyway to get, is it really that much better? In the proposal, they'd be using other.foo, so we'd know it's something imported from other.kcl.

I also find it's easier to read code that's prefixed with the module. join(a, b) vs. path.join(a, b). open(foo) vs. file.open(foo). When using the module name, it's much clearer. I'm thinking about how many terms like "vector" are overloaded. So I think it'd be nice to at least be available as an option.

Only explicitly exported items can be imported.

I think that someone said that we could make the breaking change of requiring export later, and everyone jumped on-board. But I, personally, agree that things should be private by default. Otherwise, things that were never intended to be part of the public API leak and get depended on, which causes problems.

I like this, but I think others will reject it. One thing to consider is that KCL projects are likely going to be small for a while, and public-by-default is really only a problem when you start sharing packages and having software engineering problems. OTOH, the skeptical part of me is saying, "will we ever really make the breaking change later?" It would have to be prioritized against everything else. Why not just do it from the start?

Only functions can be exported.

This would help reduce the needed difference between a top-level script and a library and generally make things easier for us implementers.

But this is not how people write KCL today. It's common for people to define a bunch of numeric constants at the top of their file. I suppose we could tell them to just wrap it in a function. So this:

export shapeWidth = 10
export shapeHeight = 20

becomes

export fn shapeWidth = () => { return 10 }
export fn shapeHeight = () => { return 20 }

I suppose that can be a bit more concise if we implement the alternative fn syntax we discussed.

export fn shapeWidth() { return 10 }
export fn shapeHeight() { return 20 }

But now, on the calling side, people need to use shapeWidth() instead of shapeWidth. It's a bit less convenient.

Another approach is to use the object syntax.

export fn config() {
  return {
    shapeWidth: 10,
    shapeHeight: 20,
  }
}

This could be good when you have a lot of variables.

I'm not sure what to say. I want to like this, but it's a little unsatisfying, and I'm afraid others won't go for it either.

Could we have some kind of value restriction? Like a constant is possible only if it can be statically determined to be a constant? But this rules out export x = 5 * inch(), and that seems like one of the use cases we'd want to allow. I guess we could have special handling of built-in functions, but... Not sure.

Imports can only be at the top level (i.e., not within functions for now at least).

Yes, the change to this was agreed upon.

Should the following work:
x = foo()
import foo from "bar.kcl"

Functions currently need to be declared before use. I think it would be nice to relax this at some point, but we're not there yet. So I think the above should behave the same and error with foo undefined.

Cycles of imports cause an error

Yes, we agreed this would cause an error.

As a user, I absolutely hate how Go disallows circular imports. Rust's approach is so much more convenient, in my opinion. But I'm not going to argue over this on the first version. Starting out strict is generally good.

Multiple imports are allowed in a DAG but not in cycles.

That's the plan. Yes, if we required exporting only functions, it would simplify things. But I think we'd still want to do memoization to not have to reparse and reload the functions every time. But at that point, no user could ever tell the difference, which is good. It would be purely an internal optimization.

There's only one namespace

Yes

No renaming (as) at first ... going for minimalism

If we went with importing each function/item individually, not having renaming would lead to name collisions where you couldn't import some combinations of things.

I think if we only import functions then there are fewer issues with codemods/ selection?

I'm not sure I follow you here. We currently treat symbolic constants the same as a function call when doing code modifications in the sense that we treat it as fully constrained. Only a literal is treated as unconstrained.

Just ignore units for now - we need to think about the whole units issue in depth. Given we only have one unit per project at the moment that should be fine for the first iteration.

:thumbsup:

nrc commented 1 month ago

I think we mostly agree on the eventual goal, just perhaps not on the steps to get there. To be clear I'm trying to identify the absolute most minimal first iteration, not the 1.0 version - I don't know what that will look like, but I think we should make the smallest possible step first in order to have the maximum flexibility for related features and to get feedback as soon as possible (and also to maximize available time for other priorities.

We already have import(), and it's something else. I suppose that if we really like import for this better, and we want to make the change, now would be the time. For me, personally, it implies there's a binary export or not, instead of levels of visibility (e.g. private, public to the project, public to the world, etc.). But we have no specific plans for this, so I don't have a strong feeling if you think import is the way to go.

My feeling was that eventually we would use import expressions for the role of the import function (that function is extremely magic otherwise). In the meantime, it is easy enough technically (albeit a little confusing) to allow import to be used as both a keyword and an identifier (we don't want to assign to import in any case). This is very much meant to be a temporary measure though!

I'm not to married to the name though. We could use use and public or pub too, I have a preference for import/export due to the symmetry and because it moves us towards a simpler privacy system, but it's not a strong preference.

This came up also. Basically, I think that very soon, people would get tired of listing all the items. Without editor tooling, it's busywork to have to update your imported items every time you add something to other.kcl. I'd anticipate users requesting import from "other.kcl". But then we'd have to explain to people why that's bad and that you should really do import as other from "other.kcl". So why not just start with that?

I think this is just about starting with the simplest thing and seeing what people complain about, rather than guessing what they will complain about. We can add glob imports before 1.0 easily enough, but let's start as small as possible. I will say that glob imports bring a lot of problems - they're pretty much universally regretted in Rust. I think it is better to encourage individual imports or paths as names, but I'd like to see the problems real users hit before committing to a path forward. Also, we should absolutely implement editor support for imports - it's pretty easy and really useful. I'd rather prioritise that for 1.0 rather than more sophisticated import statements.

In the proposal, they'd be using other.foo, so we'd know it's something imported from other.kcl.

I also find it's easier to read code that's prefixed with the module. join(a, b) vs. path.join(a, b). open(foo) vs. file.open(foo). When using the module name, it's much clearer. I'm thinking about how many terms like "vector" are overloaded. So I think it'd be nice to at least be available as an option.

I strongly believe we should postpone implementing paths as names. Not that we shouldn't do it, just that we should think it through and iterate on the design before implementing. It's a difficult topic with subtleties. Starting from the syntax - using . implies that modules are somewhat like objects, which has implications or potential for confusing users. Then there is the overlap with support for hierarchical modules. What happens when we want to treat a module like an object for other reasons (e.g., for assemblies), and so forth. Definitely something we should do, but I really think we shouldn't do it in the first iteration of modules. (And I think it would be bad to land a prototype version - we'd move onto other priorities, people would use it, and we'd get locked-in to a potentially sub-optimal design).

But this is not how people write KCL today. It's common for people to define a bunch of numeric constants at the top of their file. I suppose we could tell them to just wrap it in a function

I agree we want to be able to import contants in the long run. But for a first iteration, I think avoiding the side effects issue makes it worthwhile to forbid it. As you point out, consts can always be made into functions (like we do in std with e and pi), so there is a trivial workaround, and I believe it is much more likely that people will want to reuse functions than libraries. Once we discuss side effects in more depth, we can come back to importing constants, probably before 1.0 but not in the first iteration of modules (IMO). (For clarity, exported functions should absolutely be able to use local consts).

Another approach is to use the object syntax.

Later, later :-) Again there's lots of overlap with other reasons to treat modules like objects, so lets design that properly before implementing for the sake of this use case.

Functions currently need to be declared before use. I think it would be nice to relax this at some point, but we're not there yet. So I think the above should behave the same and error with foo undefined.

👍 sounds good

If we went with importing each function/item individually, not having renaming would lead to name collisions where you couldn't import some combinations of things.

I'm kind of keen to see the kinds of name clashes that happen irl, it's possible that might inform our design some how? On the other hand, implementing renaming is pretty easy and I'm 99% sure we want it, so if you want to implement in the first iteration, go for it! Or it could be in the first PR after that to make the first one quicker to land.

I'm not sure I follow you here. We currently treat symbolic constants the same as a function call when doing code modifications in the sense that we treat it as fully constrained. Only a literal is treated as unconstrained.

Hmm, so maybe it won't help too much. I was mostly thinking just importing functions is easier than dealing with importing top-level geometry in terms of expectations. But I don't understand exactly what codemods can touch (I was assuming that code on the right-hand side of an assignment is more mutable some how than code in a function, but that might well be incorrect). Anyway, not too important for now :-)

jtran commented 1 month ago

Okay, you've convinced me.

Thank you, everyone, for all the input.

nrc commented 1 month ago

For future work, I started a design doc at https://github.com/KittyCAD/kcl-experiments/pull/12

jtran commented 3 weeks ago

Just wanted to update this issue that we decided to go with import instead of use. The original rationale for using use was that it conflicted with the already-existing import() function. But people mostly preferred import over use, and we realized that the two things (the import statement and import function) are more related than they might at first seem. It's a little more complicated to implement (e.g. tokenizing and parsing), but we did it. So barring some last minute major changes, we're going ahead with import instead of use.