JuliaLang / julia

Add `strict` mechanism for opting into stricter subsets of the language #54903

Open Keno opened 4 weeks ago

Keno commented 4 weeks ago

We've had a few discussions the past few weeks about a feature tentatively dubbed pragma strict after similar constructs in other languages. However, there wasn't really a cohesive writeup of the intent, so triage asked me to write one up to serve as the basis for discussion and fleshing out. I intend to edit this issue as the idea evolves.

Basic idea

The basic idea of the pragma strict feature is to have an opt-in mechanism for turning Julia programs that are semantically valid but undesirable for other reasons (e.g. using ambiguous syntax that arguably should have been disallowed, but that we can't disallow for backwards-compatibility reasons) into errors. This would be an opt-in feature for developers who have personal, organizational or regulatory requirements for stricter coding standards. A further motivation is to provide an additional vehicle for low-friction language evolution. For example, if a specific opt-in turns out to be popular across the majority of packages, a potential Julia 2.0 that made the opt-in automatic, while technically breaking, would be largely non-breaking in practice.

We are not imagining a single strict-mode opt-in here, but rather a finer-grained set of options, plus versioned collections of options for particular use cases. See the last section for an initial list of such options.

It is worth emphasizing again that this feature is only intended to disallow undesirable programs that are otherwise semantically valid. It is not intended to cause meaningful semantic differences in programs that are valid both in standard semantics and under the opt-in restrictions (i.e. turning on the restrictions may cause things to error, but if they don't, the program should behave the same).

How does the opt-in work?

One of the primary questions in this proposal is how the user expresses the opt-in. There are a few separate semantic options, each with a number of potential syntax options.

  1. Per-module opt-in, like our existing Experimental.@compiler_options
  2. Per-file opt-in (e.g. using a magic comment on the first line), popular in some other languages
  3. Per-project opt-in in Project.toml

After some discussion on triage, a Project.toml-level opt-in seems like the best option. The primary motivation here is to allow opt-ins that need to be done in the parser (e.g. whitespace requirements). We don't currently define the ordering of parsing and execution for packages, so a module-toplevel opt-in may be semantically too late (relatedly, it may be ambiguous what happens when the opt-in is placed in the middle of a disallowed parse). An additional concern is that ideally IDE tooling would be able to understand the active set of restrictions without having to look at the code.

Concrete Project.toml syntax options

One convenient option would be reusing Preferences.jl. One might imagine a julia-level preference like:

name = "MyPackage"

[preferences.julia]
strict = ["nomultiassign", "uniqueidentifiers"]

This doesn't fully mesh with the usual preferences semantics, since preferences are ordinarily uniqued per UUID, while these would be private to a particular package, but this might be ok. Alternatively, we could reserve the strict key in each individual package's preference table:

name = "MyPackage"

[preferences.MyPackage]
strict = ["nomultiassign", "nolocalshadow", "noglobalshadow"]

Alternatively, we could have a new top-level strict section:

name = "MyPackage"
[strict]
julia = ["nomultiassign", "nolocalshadow", "noglobalshadow"]
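
As a rough illustration of how tooling might read the last variant, here is a minimal sketch using the TOML standard library. The helper name `strict_options` is hypothetical, and the `[strict]` table with a `julia` key is only one of the candidate layouts above, not a settled format.

```julia
using TOML

# Hypothetical helper: return the strict options listed under the `julia` key
# of a top-level [strict] table, or an empty list if none are declared.
function strict_options(project::AbstractDict)
    section = get(project, "strict", Dict{String,Any}())
    return String[opt for opt in get(section, "julia", String[])]
end

# Parse a Project.toml given as a string (assumed layout, see lead-in).
project = TOML.parse("""
name = "MyPackage"

[strict]
julia = ["nomultiassign", "nolocalshadow", "noglobalshadow"]
""")

opts = strict_options(project)
println(opts)
```

Because the options live in plain TOML, an IDE or linter could obtain the same list without loading or parsing any of the package's Julia code.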

Initial idea list for opt-in options

In this section, I'm collecting a list of potential options that might be implemented. However, I am not at this point asking people to brainstorm all the possibilities that could be implemented. I'm also not asking for detailed discussion on what should or should not be included in a particular option. Rather, I wanted to have a place to list all the ideas that have already come up and a place to link any issues that could be addressed by this feature. Full design discussions for individual flags can be had on the PRs to implement them once the overall mechanism is in place.

Individual options

Disallows multiple assignments in the same expression without parentheses. I.e. disallows a = b, c = d, e, = f = (1, 2)

Disallows shadowing of local variables, e.g. in the following:


function foo()
    for i = 1:10
        for i = 1:10 # Error shadowing local `i`
        end
        all(1:10) do i # Error shadowing local `i`
            iszero(i)
        end
    end
end

Disallows shadowing of global variables, e.g. in the following:

function foo()
    missing = false # Error local `missing` shadows imported global `missing`
end

Stefan had proposed introducing a unique assignment operator, e.g. :=, for which there would then be a corresponding opt-in to enforce that all assignments use it.

If we implement some variant of export versioning, there could be an opt-in forbidding unversioned exports.
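
For context on the two shadowing options, the sketch below shows what such code does today under standard semantics, i.e. the behavior a strict opt-in would turn into an error rather than change. The function names are made up for illustration.

```julia
# Standard semantics: the inner `for i` introduces a fresh `i`,
# so the outer loop variable is untouched by the inner loop.
function outer_values()
    seen = Int[]
    for i = 1:2
        for i = 10:11
        end
        push!(seen, i)  # refers to the outer `i`
    end
    return seen
end

# Standard semantics: assigning to `missing` inside a function creates a
# local variable that shadows the global `missing` within that function.
function shadow_missing()
    missing = false
    return missing
end

println(outer_values())
println(shadow_missing())
println(ismissing(missing))  # the global `missing` is unaffected
```

Both functions are valid and well-defined today; the concern the options address is that the shadowing is easy to write by accident and hard to spot in review.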

Collections

The idea of collections is that users in general don't want to individually decide which opt-ins matter to them, but will likely be following a standard set by their organizations or prescribed by a style guide. To this end, there could be meta opt-ins like "basestyle", which would activate a standard collection of opt-ins. These collections should be versioned and activated based on the min-compat version of Julia. In this way, new opt-ins can be added to a collection, without automatically activating them on a Julia version upgrade.
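
A minimal sketch of how version-gated collections could resolve, assuming each opt-in records the Julia version at which it joined the collection. The `COLLECTIONS` table, its contents, and the `active_optins` helper are all invented for illustration; no such registry exists.

```julia
# Hypothetical registry: for each collection, the Julia version at which each
# opt-in was added to it. Names and versions are invented for illustration.
const COLLECTIONS = Dict(
    "basestyle" => [
        v"1.12" => "nomultiassign",
        v"1.12" => "nolocalshadow",
        v"1.13" => "noglobalshadow",
    ],
)

# Return the opt-ins of `collection` that were already part of it at the
# package's minimum compatible Julia version.
function active_optins(collection::AbstractString, min_compat::VersionNumber)
    entries = COLLECTIONS[collection]
    return [name for (since, name) in entries if since <= min_compat]
end

println(active_optins("basestyle", v"1.12"))
println(active_optins("basestyle", v"1.13"))
```

Under this scheme, a package whose compat lower bound stays at 1.12 keeps the same active set even when run on a newer Julia; it only picks up the later additions once it raises its minimum version.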

adienes commented 4 weeks ago

I would suggest that one of the individual options should be disallowing control flow in non-"statement" position, i.e. https://github.com/JuliaLang/julia/issues/50415

jariji commented 3 weeks ago

https://github.com/JuliaLang/julia/issues/51223 is my proposal for := reassignment.

jariji commented 3 weeks ago

I like (Stefan's?) idea that whitespace must match operator precedence so you can't write 2 * 3+1, you have to write 2*3+1 or 2*3 + 1 or 2 * 3 + 1.

StefanKarpinski commented 3 weeks ago

Another thing we had talked about was having a set of defaults based on a version, which you could add to or subtract from, and which might look like this:

strict = ["1.12 defaults", "no local shadow", "-explicit imports"]
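
One possible reading of such a list, as a rough sketch only: entries are applied left to right, "X.Y defaults" expands to a version's default set, a leading "-" removes an option, and anything else adds one. The `DEFAULTS` table and its contents are invented here, and the processing order is an assumption rather than part of the suggestion.

```julia
# Hypothetical default sets per Julia version; contents are invented.
const DEFAULTS = Dict(
    v"1.12" => ["no multi assign", "explicit imports"],
)

# Resolve a list of entries into the active set of strictures.
# Entries are applied in order, so a "-name" entry only removes options
# that were activated by earlier entries.
function resolve_strict(entries::AbstractVector{<:AbstractString})
    active = String[]
    for entry in entries
        m = match(r"^(\d+\.\d+) defaults$", entry)
        if m !== nothing
            append!(active, DEFAULTS[VersionNumber(String(m[1]))])
        elseif startswith(entry, "-")
            filter!(!=(entry[2:end]), active)
        else
            push!(active, entry)
        end
    end
    return unique(active)
end

resolved = resolve_strict(["1.12 defaults", "no local shadow", "-explicit imports"])
println(resolved)
```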

Would there be strictures besides the ones provided by Julia itself? Not clear on why there is a strict section and a "julia" key in your examples. Wouldn't a single strict entry with a list of values suffice?

nsajko commented 3 weeks ago

Some of the configurations should probably disallow accessing non-public names, including:

  1. Names of instance properties (ref propertynames). Wouldn't affect types defined in the same package.
  2. Names in a module (ref names). Wouldn't affect modules in the same package.

IMO this should be opt-out for all packages, but shouldn't affect the REPL.

jariji commented 3 weeks ago

What do you think about having these subsets be installable packages so users can contribute their own rules, rather than having an official set of rules?

davidanthoff commented 3 weeks ago

One question is whether this needs to be implemented in Julia itself, or whether this belongs more in a linter-like tool. At some level it strikes me that if this is a thing, we most definitely would want to implement support for this in things like the language server. And then the question is: what is gained by having two implementations?

davidanthoff commented 3 weeks ago

And another idea for potential use-cases: relative to more statically typed languages, it is really difficult to provide the kind of robust IDE experience from a language server that languages like TypeScript, Rust or C# have. But maybe there is a scenario where one could actually provide the same kind of robust IDE support if one was willing to avoid some of the more dynamic language features that Julia has. Obviously, that would be a terrible default, but I certainly have packages where I don't need many of the dynamic features of Julia and would much like to have an experience that is more statically typed. Not sure whether that is really feasible, but maybe worth exploring, and this strict type feature might be a good way to opt into a mode that gives one a statically typed IDE experience.

nsajko commented 3 weeks ago

@davidanthoff maybe I'm wrong, but I have a feeling you may have this backwards. My feeling is that the only way for the language server to become a clear win for users (currently it's quite annoying with the false positive warnings) is to plug into the Julia implementation quite directly, maybe similarly to Cthulhu.jl. So maybe the language server for Julia should be just a thin wrapper around Julia.

davidanthoff commented 3 weeks ago

@nsajko probably best to stick with Keno's suggestion to collect ideas here but not discuss or evaluate them in detail; that would presumably just distract from the topic of this issue. Having said that, if you have ideas and thoughts about the LS, please open an issue over at its repo and we can discuss there.

LilithHafner commented 3 weeks ago

I agree that per-project is the best approach.

I agree that Project.toml is the place to put this opt-in and configuration.

Concrete Project.toml syntax feedback

I think a toplevel strict is the simplest approach to avoid confusion over subtle inconsistencies with Preferences.jl's preference resolution.

name = "MyPackage"
strict = ["noreassignment", "nomisleadingwhitespace"]

Additionally, this strictness is a property tied to the package about as closely as its name and version: it's likely that a project declared without strict rules will fail to parse with them. The closest analog to this feature I know of is Rust editions. In Rust, that field is stored in the [package] table, which serves an analogous role to the toplevel table in our Project.toml files.

Reserving the strict key in each individual package's preferences is a bit breaking. It's also unclear what it means to set the strict preference of any package other than the one named by the Project.toml file. Syntax that enables this seems problematic:

name = "MyPackage"

[preferences.OtherPackage]
strict = ["nomultiassign", "nolocalshadow", "noglobalshadow"]

A toplevel [strict] section seems unnecessarily verbose compared to a toplevel strict key.

c42f commented 2 days ago

> provide an additional vehicle for low-friction language evolution

Yes! We need this for syntax evolution, which is often technically breaking but not actually very breaking at all in practice. There are so many examples of this, some being #36547 and #54915. In https://github.com/JuliaLang/julia/issues/36547#issuecomment-1449143117 I show that several bugs would be fixed by this syntax change. But the change itself is nevertheless technically breaking, and it's a really tough call to decide whether to do it.

Lilith has already mentioned Rust editions. A core part of Rust editions is that they don't bifurcate the ecosystem, because crates using different editions can work together. We should definitely do that to avoid a Python 2/3 style debacle.

> For example, if a specific opt-in turns out to be popular across the majority of packages, a potential julia 2.0 that made the opt-in automatic while technically breaking, would be largely non-breaking in practice

For a lot of minor syntax improvements, I think they'd best be expressed as "use the latest syntax as of Julia version 1.x" rather than as opt-in flags. If we're trying to improve the syntax to change/remove ambiguous or confusing syntactic constructs, we want an incentive for the whole ecosystem to drop the old syntax. For example, if we do the change @JeffBezanson mentioned here https://github.com/JuliaLang/julia/issues/54915#issuecomment-2235198794 in a Julia edition, we do want packages to drop the old syntax as soon as possible.

"Use Julia 1.x syntax edition" is great from this point of view:

So I think "syntax evolution" should not be fine grained, where at all possible - it's a bit different from the other "strict mode" things which are mentioned above, where users might want fine-grained control over opting out of certain language constructs.