JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.91k stars 5.49k forks source link

Disallow access to internals (unless overridden) #52314

Open LilithHafner opened 1 year ago

LilithHafner commented 1 year ago

Originally posted by @StefanKarpinski in https://github.com/JuliaLang/julia/issues/49973#issuecomment-1828231790

Not breaking existing code limits what we can do. However, there's a decent argument to be made that since what's breaking is not a public API, blocking access to package internals is not actually technically breaking according to semver. Of course, we still have to be cautious to prevent massive ecosystem breakage. It would not go well to just flip the "no private access" switch for the whole ecosystem at once. Here's a possible transition strategy:

  1. Allow packages to opt into preventing access to their private internals. This should probably be a flag in the project file, but should maybe also be in the registry. There are three cases for a project that depends on a package that blocks private access:
    1. Project doesn't access internals so there's no problem;
    2. Project accesses internals but can easily use a public API instead—minor fix required;
    3. Project accesses internals and can't easily use a public API instead—can't be easily fixed.
  2. To handle the last case, we also allow projects to override one of their dependencies blocking private access. A project opting into this acts as an explicit indicator that non-breaking upgrades can break.
  3. For a while make the flag for whether private access is allowed or not mandatory—you can choose yes or no but you need to have some value in the project file.
  4. Later we change to preventing private access by default. Projects can drop the flag if the value is to block private access since that's the new default and only packages that need to allow private access need to keep the flag in their project files.

I'm not sure about the implementation side. If we can control module internals access per depender that would be ideal. That suggests that private/public needs to be metadata associated with the binding itself, which is kind of gnarly. Maybe we can do something with auto-wrapping each package in a public-only wrapper that rebinds only the public bindings of the internal package. That's the simplest to implement, but feels kind of icky.

Having public/private annotations on fields in structs would also be good, but I'm not quite ready to tackle that.

LilithHafner commented 1 year ago

I agree that preventing access to private internals must be opt in at first for the package whose internals are being accessed because this is not a viable option for packages that have public unexported functionality and have not yet adopted the public keyword.

I think that we should disallow (in the general registry) packages that both declare dependence on internals and declare compatibility with an infinite set of minor versions (an infinite set of patch versions is okay IMO)

This would pair well with normalizing retroactively changing compat entries on a package when new versions of their dependants are released.

That suggests that private/public needs to be metadata associated with the binding itself, which is kind of gnarly.

It already is :)

https://github.com/JuliaLang/julia/blob/5b2fcb68800875e570d7bb8c78ed00d360b6cfd5/src/julia.h#L589-L602

ParadaCarleton commented 1 year ago

Having public/private annotations on fields in structs would also be good, but I'm not quite ready to tackle that.

I think we might not need a new language feature for this--you can overload getproperty to error on accessing a field. (Maybe a keyword that does this automatically would be enough?)

LilithHafner commented 1 year ago

For internal fields, I would want getproperty to work in my package code (and in the code of any package that depends on my internals) but not in ordinary code that is using MyPackage. a.b lowers to Base.getproperty(a, :b) (not __MODULE__.getproperty(a, :b) so IIUC there is no good way to make getproperty error only when other modules use it without another language feature.

Tokazama commented 1 year ago

This seems like the type of thing where it would be really nice to have the this/self keyword.

LilithHafner commented 1 year ago

How would that help?

Tokazama commented 1 year ago

IIUC, we're just trying to inject the call context somewhere between lowering owner.name and getglobal(owner, name) so that we can do more than just ispublic(owner, name) to test what throws an error? If there were a global context variable at the top of the module that changed how GlobalRef(owner, name) was resolved, then we could add a unique path for certain modules to go straight to getglobal instead of testing ispublic.

BioTurboNick commented 1 year ago

It would be nice if this feature was something you could turn on for a code block/on the REPL, and not just be for an entire package. Force the developer to document which specific internals they are accessing, either as a parallel to public/export in listing the symbols used (or e.g. something like import private Base:), or as a macro encasing the usage.

Tokazama commented 1 year ago

It would be nice if this feature was something you could turn on for a code block/on the REPL, and not just be for an entire package. Force the developer to document which specific internals they are accessing, either as a parallel to public/export in listing the symbols used (or e.g. something like import private Base:), or as a macro encasing the usage.

This makes more sense to me than fully supporting getproperty access to explicitly marked private symbols in a module.

PallHaraldsson commented 1 year ago

It would not go well to just flip the "no private access" switch for the whole ecosystem at once.

A. I suppose we can disallow by default. If we want to do that, then should we have a moratorium on old packages?

Maybe that's the wrong word, but I'm thinking all old (registered) packages are allowed the old rules. Packages registered from now on conform to the new default rules. Or since they've been in development for a while maybe allow registering under the old rules for a while, 1 or 2 months. From publishing this policy.

Then nobody needs to change their Project file or wherever this is marked. The marking will be added to the registry for old packages for you. Possibly you would also be able to add it explicitly (or not) for packages registered from now on.

There's a question, what to do about upgrading old packages? They probably need the old rules, forever..., at least until next major version upgrade only?

B. One other possibility is why are people accessing internals? Because the API wasn't thought out? Only allow it for 0.x? Presumably in 1.x, or at least after next major upgrade the API was thought out.

LilithHafner commented 1 year ago

@StefanKarpinski, does "Without override" mean there is no override option or that there is no access unless the user overrides it?

StefanKarpinski commented 1 year ago

I just wanted to make it clear that it's not a "there's no way to access internals" proposal, but rather that we want accessing internals to require a project-level declaration of the need to do so. I've edited the title again to hopefully make that clearer.

LilithHafner commented 10 months ago

I think it would be useful to discuss this on triage and try to lay out a roadmap.

baggepinnen commented 10 months ago

Preventing field access by modifying getproperty may not be a generally applicable solution. Explicit field access by calling the function getfield is quite common in practice for multiple reasons

LilithHafner commented 9 months ago

Ian suggested a three phase approach

  1. [x] Implement the public keyword so folks can declare public/private API
  2. [ ] Have tooling support this (e.g. docstring tooltips display a different color for private symbols, opt-in Aqua tests for private access, treat tab completion differently for public/non-public (color, presence), don't shadow-complete private names, etc.)
  3. [ ] Make a language-level enforced system (either through true enforcement, or with sort of name-spacing/sintax to bypass it)

Maybe we do 3 (this issue) eventually, maybe not. Jeff is not optimistic about doing 3, I am, but I don't know the internals as well.

Let's table this (3) until 2 is done well.