JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.49k stars 5.46k forks source link

Some sort of policy to regarding unexported functions in Base? #12647

Closed IainNZ closed 6 years ago

IainNZ commented 9 years ago

There have been a couple of times during 0.4 development where unexported functions have been removed or changed. Every time its come up we seem to go in circles a bit about what to do about it. Given the resources available, I feel we should just keep it simple:

No support or deprecation warnings, just change/remove them. Only exported functions get deprecation warnings or mitigation regarding changing arguments.

But we need consensus around this to make it stick.

timholy commented 9 years ago

Even though I've paid the price for this myself on occasion, I can't imagine any other sensible policy. :+1:

johnmyleswhite commented 9 years ago

+1

rsrock commented 9 years ago

+1

ScottPJones commented 9 years ago

Unexported things are definitely supported in Julia, and are deprecated all the time. Things like to_index and compilecache are not exported, for example. compilecache had been called compile. If that had made it into 0.4, instead of being changed before release, it would have needed to be deprecated as well. The real issue is that Julia doesn't have any way currently of declaring that a function/method/type should not be accessible outside of it's module, which would have prevented these cases of internal functions being used outside base, and there are other valid reasons for not exporting a name (to avoid name pollution).

timholy commented 9 years ago

In general user-code should not call to_index or compilecache. __precompile__ is an exported function.

kmsquire commented 9 years ago

A very minor example of a function that would be affected by this policy is Base.permute!!(a, p). It exists so that Base.permute!(a, p) can permute a in place, but leave the permutation p untouched. But if a user knows that p is okay to clobber, she may wish to call the double-bang version, to avoid copying p.

It's unexported and undocumented, and therefore not widely used outside of Base (only DataFrames that I'm aware of, probably because I added it there). It's also the only example I'm aware of in base with a '!!', which is probably why there was resistance to export it in the first place (ref: #2201).

I don't think this is anything that should prevent the policy proposed here from being put in place (these things should probably just be handled on a case-by-case basis as needed), but I thought I would mention it.

aviks commented 9 years ago

I am agnostic about this being a policy in Base . However, I think we should be careful in claiming this to be an overall Julia design guideline. (Not that anybody is suggesting that at the moment, but people tend to follow practices from base, and so I thought I should articulate this, if only for the record)

Given Julia's module features, it is sometimes good design, and at other times necessary, to not export names from some module. Every package can, and needs to, make this decision on its own.

ivarne commented 9 years ago

I think @aviks is right, and that there should be room for documenting (and deprecating) unexported names (especially in packages, but it also makes some kind of sense in Base).

Maybe it would make sense to tie deprecation/midigation policy to documentation status, instead of only exportation status?

IainNZ commented 9 years ago

Updated title to keep this focussed on Base only, I don't want to get into packages (they can do whatever the feel is appropriate).

mauro3 commented 9 years ago

One counterexample to this proposal in Base is the Profile module which only exports @profile. But quite a few methods belong to its public interface, for instance Profile.print.

I think a way to specify which unexported functions belong to the public interface would be good. Which is similar @ScottPJones's comment above about making methods inaccessible (although I wouldn't go that far).

toivoh commented 9 years ago

I agree.

tknopp commented 9 years ago

Why aren't these functions in the Profile module exported?

My mental model was that public==exported which seem to hold in 99% of the cases except where submodules are used for introducing namespaces

timholy commented 9 years ago

Profile does introduce a namespace. Notice the functions are called print, init, clear, and retrieve; odds are quite high that you don't want Profile.print instead of Base.print each time you say print. Given that we have modules, there's no reason to call them profile_print, profile_init, and profile_retrieve. Pkg behaves by similar rules.

alyst commented 9 years ago

Just a thought from a user perspective. I think in general it would be nice to let power users/package developers to take advantage from some "advanced" Base functions (e.g. permute!!()) and commit to tracking their compatibility between the releases. Such functions are not meant to be called from the REPL, so they could be safely left unexported. However, they still need to be a part of Julia documentation (i.e. have a @doc block) to be properly used. That could be their distinctive feature. If somebody wants to use something Julia developers have not documented, she/he should be really on her/his own.

aviks commented 9 years ago

Given that we have modules, there's no reason to call them profile_print, profile_init, and profile_retrieve. Pkg behaves by similar rules.

This is the kind of use I was referring to. I was thingking of JSON.write() but there clearly are lots of such uses in Base.

My mental model was that public==exported ...

I don't think that can or should be true in Julia.

But let me say, while I am debating the larger design problem, as far as the specifics of this issue goes, I am all for nuking code aggressively. Julia is still a 0.x language. If it looks like some code is un/under-used, delete it, with a good faith effort at notifying packages. At this stage in our lifecycle, that is perfectly acceptable, nay, necessary. Also (to prevent some counterarguments), I say this as someone who uses Julia in real life :).

tkelman commented 9 years ago

I don't like making this a policy, as phrased it seems to be encouraging introducing breaking changes more often without any migration options. Yes it's risky to depend on unexported functionality from base, we should write that down in some form and let it be known, and be more careful about only doing so when absolutely necessary in packages. And we need to be able to detect and highlight more clearly when it's happening, so we know about it in advance. Is this one of the things Lint.jl can or does check for? Have many established packages started using Lint.jl in their development process the way CI and coverage tools are being used?

We shouldn't be introducing frivolous breakage, even in unexported functions, without good reasons for doing so (there are plenty of good reasons as the language develops, but that's a difficult line to draw in policy terms). There are enough cases where using unexported functionality is being done and necessary, that "break things at will" as a written policy means things will be broken more often. Until we have the hardware resources to run PackageEvaluator reliably, 100% daily, and/or on pre-merged PR's, this makes working against -dev versions of Julia even less stable. We've done a really lousy job of keeping users away from -dev versions. Turns out people like new features, and releases are taking time. When users are the ones finding breakage in packages before the package developers do, and are just reporting issues instead of helping fix the problems with PR's, that's a sign that too many people are on -dev. And that we're not doing a good job of teaching enough people how to participate in fixing the bugs they find when using unstable versions of Julia.

As I see it, base has more developer resources than packages. Less effort here for more breakage in packages doesn't seem like a good tradeoff, unless we manage to change the onboarding process of who uses -dev versions of Julia and how breaking changes get fixed across the ecosystem.

StefanKarpinski commented 9 years ago

I think the issue here may be that we need separate mechanisms for expressing that something is public and expressing that it should be exported (in the current sense).

On Aug 16, 2015, at 7:48 PM, Tony Kelman notifications@github.com wrote:

I don't like making this a policy, as phrased it seems to be encouraging introducing breaking changes more often without any migration options. Yes it's risky to depend on unexported functionality from base, we should write that down in some form and let it be known, and be more careful about only doing so when absolutely necessary in packages. And we need to be able to detect and highlight more clearly when it's happening, so we know about it in advance. Is this one of the things Lint.jl can or does check for? Have many established packages started using Lint.jl in their development process the way CI and coverage tools are being used?

We shouldn't be introducing frivolous breakage, even in unexported functions, without good reasons for doing so (there are plenty of good reasons as the language develops, but that's a difficult line to draw in policy terms). There are enough cases where using unexported functionality is being done and necessary, that "break things at will" as a written policy means things will be broken more often. Until we have the hardware resources to run PackageEvaluator reliably, 100% daily, and/or on pre-merged PR's, this makes working against -dev versions of Julia even less stable. We've done a really lousy job of keeping users away from -dev versions. Turns out people like new features, and releases are taking time. When users are the ones finding breakage in packages before the package developers do, and are just reporting issues instead of helping fix the problems with PR's, that's a sign that too many people are on -dev. And that we're not doing a good job of teaching enough people ho w to participate in fixing the bugs they find when using unstable versions of Julia.

As I see it, base has more developer resources than packages. Less effort here for more breakage in packages doesn't seem like a good tradeoff, unless we manage to change the onboarding process of who uses -dev versions of Julia and how breaking changes get fixed across the ecosystem.

— Reply to this email directly or view it on GitHub.

timholy commented 9 years ago

Yep. Currently, the best indication is: is the function documented? If not, it's likely not intended as part of the public API. All the exceptions raised so far are in that camp.

ScottPJones commented 9 years ago

Even things that are not intended to be part of the public API should have internal documentation though, just so that somebody besides the original author (or the author years later) can figure out what is going on.

timholy commented 9 years ago

devdocs

ScottPJones commented 9 years ago

Having devdocs is very good, but is not a replacement for having internal documentation of functions. If you have to deal with code whose lifespan is measured in decades, it is rather important. I'd like to think that Julia will still be relevant, being updated and supported in 30 years from now.

hayd commented 9 years ago

Perhaps help could say that the method is not exported?

It's impossible to support every internal method change between dev versions (e.g. 0.4-dev) but perhaps it would be worthwhile comparing major versions. i.e. upon 0.4.0 what internal methods have changed or been removed since 0.3.

ScottPJones commented 9 years ago

I think the issue here may be that we need separate mechanisms for expressing that something is public and expressing that it should be exported (in the current sense).

This is really not essentially different from what I've been suggesting, except that it is opt-in rather than opt-out (of being a public, not exported function/method/type). Doing it opt-in, as @StefanKarpinski suggested, is probably better, since it seems that it is less common in Julia to have things public but not exported, than to have things not exported with the expectation that they are "private" in some sense.

@hayd It would be a good idea of having help display that, yes, esp. if there is a way of marking something as part of the public API, but not exported. No, you wouldn't want to try to compare internal method changes, the real problem right now is that you can't distinguish between what is really internal and what really is part of the public API.

tknopp commented 9 years ago

@StefanKarpinski, @JeffBezanson: Could you clarify if the intention of export was that exported functions form the public interface while unexported functions are private? We see that there are currently exceptions (submodules like Pkg and Profile) but it before putting to much weight on the exceptions it would be interested what the purpose of export was/is.

StefanKarpinski commented 9 years ago

export controls what is available via using – that's it at the moment. It's an open question whether we need a different notion of public/private than that. One idea is to separate the concept of "exported" from "public": exported things are made available by using whereas public names still need qualification but are visible; the implication would be that anything not public would no longer be accessible even with qualification. Of course, you would still be able to sneak around that restriction using eval, but it would make it much more obvious that you were, in fact, doing something sneaky. It would probably make sense to allow non-const public bindings to be written to externally as well. So that would mean you'd have this kind of thing:

module Foo
    xx = 0
    const ro = 1
    public rw = 2
    public rwi::Int = 3
    usable = 4
    export usable
end    

using Foo

xx                  # not defined
ro                  # not defined
rw                  # not defined
rwi                 # not defined
usable => 4

Foo.xx              # not allowed
Foo.ro => 1
Foo.rw => 2
Foo.rwi => 3
Foo.usable => 4

Foo.xx = -1         # not allowed
Foo.ro = -1         # not allowed
Foo.rw = -1
Foo.rwi = -1
Foo.usable = -1     # ???

Foo.xx = "zz"       # not allowed
Foo.ro = "zz"       # not allowed
Foo.rw = "zz"
Foo.rwi = "zz"      # not allowed
Foo.usable = "zz"   # ???

Having export and public be orthogonal may make sense. This requires some thought.

tknopp commented 9 years ago

I know that export is currently about scoping. In my opinion public and export are not orthogonal and it would complicate the system to much. We IMHO need a single mechanism to say what the (public) interface of a module is.

Can't the submodule use case be solved differently? I.e. the submodule exports the function and the mother module does not reexport the individual functions but only the scoped ones?

mauro3 commented 9 years ago

@tknopp: your last suggestion does not work in the case of Base.Profile.print as exporting print would clash with Base.print. Renaming it could work though.

ivarne commented 9 years ago

Exporting Profile.print() would work, but we'd have to change sysimg.jl to explicitly import only @profile, instead of calling importall. (or am I just totally wrong?)

hayd commented 9 years ago

See also https://github.com/JuliaLang/julia/issues/12064#issuecomment-119900260 which I think is similar (privacy of fields) and touches on this.

My thoughts are the same as on that one: what are you going to do when users want to access some private method, stop them doing it completely? If there's a workaround what's the benefit? and if there is a workaround why not just use underscore to denote private, like in python, rather than enforce it.

Does the fact that no-one has put a package together since that discussion suggest there's little appetite for this????

carnaval commented 9 years ago

Whatever the decision is, there has to be a workaround. If there is something I hate it's the computer not doing something because it "knows better". I know better, you silly pile of microwaved sand.

tkelman commented 9 years ago

you silly pile of microwaved sand

Heh. I agree, enforced privacy is annoying when debugging and doesn't seem like the right solution for Julia. What I think I'm looking for is a technical-debt-detector so people other than the immediate author of the risk-of-breaking uses of non-exported code (even if it's the same person, but revisiting code 6 months later kind of thing) have an easier way of identifying problems before they happen. Breaking things in base for good reasons helps the language move forward, but the fallout of leaving things in a broken state for weeks afterwards in packages is bad for users. Maybe Pkg.add and Pkg.clone in -dev builds of Julia should have big noisy warnings, "you are using an unstable development version of Julia, if you find bugs in packages please help fix them."

ScottPJones commented 9 years ago

Enforced privacy may be annoying when debugging, but it is a lifesaver when trying to keep your sanity over years (or decades) of maintaining a codebase. Solution: have the enforced privacy on by default, but be able to turn it off by a command-line switch, have have it off by default for debugging builds.

StefanKarpinski commented 9 years ago

Whatever the decision is, there has to be a workaround. If there is something I hate it's the computer not doing something because it "knows better". I know better, you silly pile of microwaved sand.

:heart: @carnaval

rsrock commented 9 years ago

Better trademark that quote, or it's going to end up on t-shirts and you'll miss out on a heap of cash, @carnaval!

Didn't Leah Hanson once say that one of the nice things about Julia is that it doesn't feel like you're fighting the compiler? I'm really afraid these proposals will end up with us cursing and beating the microwaved sand.

Here's what I don't like about this proposal (which seems to have veered off course, btw): 1) Not being able to use what's sitting in front of my face, because the pile of sand won't let me. 2) "public static final" hell. Throwing decorators around like that just makes code noisy and unreadable. Unfortunately there isn't a kill switch that will remove these from the code itself (I suppose I could "sed" everything if I really needed to) 3) My favorite of all, reading code written by scientists (my kin) where we try to work around all of these unnecessary restrictions. I've seen getters and setters defined in class A that will call other getters and setters in class B that then get or set something in class C. Encapsulation? We don't need no stinkin' encapsulation!

If you must have these "features", there's always Java for you.

But I'd like to try to pull this back on course. The topic here is whether the interface extends beyond the exported methods, and how to deal with things like deprecations. I had always thought that the only "public" methods and fields were the exported ones, but the Pkg and Profile examples show that public scope is wider than that. However, @timholy's point that things like Pkg and Profile methods are documented is a good one. So I'd propose that no deprecations are needed unless the item is 1) exported, or 2) documented.

When I use or modify something that's not in class 1 or 2, I always comment it as such. If it changes or goes away, my future self will understand why. That way, I'm never angry at the pile of sand in front of me, I just go and fix it.

(And if we must have something like privacy, _foo and __foo seem perfectly adequate. These should be kept to a minimum as a matter of style, to indicate "dangerous" code. Everything else public by default)

ScottPJones commented 9 years ago

At the very least, with a public keyword such as in @StefanKarpinski's example, you could: 1) make it a rule that registered packages "follow the rules" (and if they really need something to be made public, they can always submit a PR) 2) have a command line switch to turn on enforced privacy (if people prefer it off by default) 3) the very nice Lint.jl can check for violations. It would make the author's intent totally clear, and prevent nasty surprises like when I made some of the internal UTF functions more generic, and accidentally broke JSON.jl and all the packages that depend on JSON.jl.

(I do agree with @carnaval, BTW, whatever the decision, we'd need a workaround. The idea is to make developers lives easier, not annoying)

ScottPJones commented 9 years ago

@rsrock, see my comment https://github.com/JuliaLang/julia/issues/12647#issuecomment-131666031. Even totally internal functions need to be able to be documented. Not doing so just makes life hell for your future self and others who need to understand the code. To me, having to decorate things with _ is just a pain, I'd rather add 7 characters (i.e. public) in front of the very few un-exported things that I really do want as part of the public API, than make have to prefix all my functions, fields, etc. names with _, every time I use them.

hayd commented 9 years ago

Perhaps it would be good to count what things (in Base, or in external packages) could/should be internal, so that we have some idea of what we're talking about here.

(IMO that the ``-prefix is really not that bad, but I guess I think "public by default" makes more sense...)_

rsrock commented 9 years ago

Comments in the code itself would seem to cover that situation.

Also, if you're really making absolutely everything private by default, consider how difficult you're making it for folks who have to use your code. Sometimes that private stuff is actually useful. In general, it seems that folks slap the private label on stuff without considering if it's actually necessary. Is it really really critical that no one touches your uber_foo() method? Usually not.

@hayd, just to be clear, I'm advocating for "public by default", using _ only when necessary to mark "danger, private!"

ScottPJones commented 9 years ago

For example, I've seen a ton of code that accesses .data when working with strings. Problem is, that doesn't always work even now, and there will be major breakages if we try to put in better string handling. That is why just saying, "we don't care about encapsulation" gets you into serious trouble down the line, where you are prevented from making any improvements in the language, because too many people are accessing internals directly. "public by default" is the direct opposite of what most people seem to think is the case in Julia, many people have told me that if it is unexported, you don't have to worry about changing it or deprecating it. A public keyword, as @StefanKarpinski showed, would only be needed sparingly, for the few cases where unexported names really are part of the API.

tknopp commented 9 years ago

This thread is (again) hijacked about a discussion of enforcing privacy. Why can't we have a single keyword that say: "This is part of the public interface". I see that there are currently issues with submodules and scoping but can't these be solved? From a higher perspective its pretty unintuitive that submodules do not export their functions and this IMHO is some inconsistency that could/should be solved.

I am also linking #8005 where export/public were also discussed.

nalimilan commented 9 years ago

I like the way it works in R, and it looks like it would work fine in Julia: all functions in a package are private by default, and they can be accessed via pkg:::function() (ordinary functions can be called via function() when the package is attached, or pkg::function() if not).

I think it would be cool to adapt this to Julia, and make functions only available via Pkg..function when not explicitly exported or declared public. That syntax still allows using private functions without too much burden when debugging or if you're willing to take the risk, but still makes the unstability quite clear. Lint.jl would easily catch this kind of use, and it could be forbidden or discouraged for registered packages.

mauro3 commented 9 years ago

To keep on hijacking this issue: I agree, the .. would be ideal for this. (It would also work with #1974, if that comes.) Then, a keyword would be needed, probably public as suggested above. And maybe one more, say public_all, to make all methods, etc. public, which would then be equivalent to how it is now. Or maybe, similar to baremodule, we could have publicmodule.

I think this would also solve #12069.

ScottPJones commented 9 years ago

The only thing I would suggest differently from what @mauro3 just said, is to use something other than .., because that is already wanted for intervals. Maybe %%?

ScottPJones commented 9 years ago

Or .!?

rsrock commented 9 years ago

@tknopp: I think we already have a single keyword that says "this is part of the public interface". It's export. I agree with you further up that public and export are not orthogonal when it comes to defining the interface.

@ScottPJones: regarding "public by default", we have to be careful to define the two senses of "public". Because I can directly access fields and methods, I consider Julia to be a "public by default" much the same way that Python is. See https://docs.python.org/3/tutorial/classes.html?highlight=private#

In C++ terminology, normally class members (including the data members) are public (except see below Private Variables), and all member functions are virtual.

However, that's separate from defining the public interface (the real topic of this issue). I gather that you're beating yourself up over causing some breakage when you changed some exported functions. My advice: don't beat yourself up over this. If you have changed unexported and undocumented functions / fields, it's not your problem. (That said, we shouldn't go around making frivolous changes and churning the code either)

MithrandirMiles commented 9 years ago

I’m not beating myself up about anything, I just feel, from 29 years of being the principal architect (that was ANSI standardized, so I’m not to blame for the original syntax!) of a language that originally did not have any distinction between publicly accessible and not, until I managed to get that added to the language, that being able to keep things encapsulated is critical to good software engineering, and building reliable code. The bad case I had was changing en unexported / undocumented function, which broke JSON.jl, which broke a whole ton of packages. :cry: I’m not saying either that in Julia, it should not be possible to go around the system, in need, for debugging, etc. Since most people seem to feel that by default, most things that are not exported should be treated as private, the simplest case would be to have a public keyword to indicate something (method or field) is part of the public API (which would only needed for non-exported names, a rare case). Accessing them as pkg.!function() or typeinstance.!field would not be burdensome (and would be much less frequent than all the cases of having to add _ to denote “private”), and could easily be searched for, and checked by Lint.jl. Everything could remain publicly accessible by default, with just a command line switch for a strict mode, which would be useful for production code (which is what I am/will be using Julia for)

tknopp commented 9 years ago

@rsrock: Yes until this thread my mental model was also that export is what defines the public interface. So I am absolutely happy with this. Tim pointed me to the exception with submodules. And in my opinion it makes more sense to think about how to "fix" the submodule use case (i.e. these submodule have to go through some export at some point) then to introduce a new keyword which is not orthogonal.

rsrock commented 9 years ago

@StefanKarpinski

export controls what is available via using – that's it at the moment. It's an open question whether we need a different notion of public/private than that.

I'd argue that we don't need a different notion of public/private. To take your example

module Foo
    _xx = 0
    const ro = 1
    rw = 2
    rwi::Int = 3
    usable = 4
    export usable
end    

using Foo

xx                  # not defined
ro                  # not defined
rw                  # not defined
rwi                 # not defined
usable => 4

Foo.xx              # not allowed, but you can cheat with _xx
Foo.ro => 1
Foo.rw => 2
Foo.rwi => 3
Foo.usable => 4

Foo.xx = -1         # not allowed, but you can cheat with _xx
Foo.ro = -1         # not allowed b/c const
Foo.rw = -1
Foo.rwi = -1
Foo.usable = -1     # ???, should be fine

Foo.xx = "zz"       # not allowed, but you can cheat with _xx
Foo.ro = "zz"       # not allowed b/c const
Foo.rw = "zz"
Foo.rwi = "zz"      # not allowed b/c Int
Foo.usable = "zz"   # ???, should be fine

Advantages of doing things this way: 1) This is what we already do. Plenty of examples in Pkgs (~1500 in my .julia, with admittedly imperfect counting). 2) No code is required. (well, perhaps a bit. The assignments in blocks 3 and 4 don't work currently) 3) No. 2 means that it's easy to change in the future, or to add packages that define other privacy capabilities, if absolutely necessary. 4) If you want namespaces, use import. The public interface can still be defined through the exports, however.

I think all it needs is some documentation. I'd be happy to take a stab at this in a PR. Where should such a thing live, and what files should I edit? I know that the manual is in a state of flux at this precise moment.

rsrock commented 9 years ago

@tknopp

Tim pointed me to the exception with submodules. And in my opinion it makes more sense to think about how to "fix" the submodule use case (i.e. these submodule have to go through some export at some point) then to introduce a new keyword which is not orthogonal.

Agreed

jdlangs commented 9 years ago

Could export be expanded to include fully-qualified names? That way the public interface is anything exported but the author still has fine-grain control over what using brings into local scope.

module SomeMethods
pub1(x) = ...
pub2(x,y) = ...
internal1(x) = ...
internal2(x) = ...
export pub1, pub2, SomeMethods.internal, SomeMethods.internal2
end

or

export pub1, pub2
export SomeMethods: internal1, internal2