Seelengrab opened 1 year ago
I think the motivation here is solid! Well said :-)
I don't so much like the feel of the proposed `api` keyword directly in front of syntactic constructs, basically for similar reasons that I'm against the same thing for `export`, as expressed here: https://github.com/JuliaLang/julia/issues/8005#issuecomment-660077430
Also, I don't understand why `api` would make sense as a per-method thing rather than a per-function thing. It's generic functions which are named/called by the user's code, not individual methods. So the entire generic function should have the same API status.
Did you consider the alternative of having the API declared separately, in a similar way to how the `export` list currently is? Something more with the flavor of the proposal in https://github.com/JuliaLang/julia/issues/30204?
For example, we could have `scoped export`:
scoped export A, B
The idea is to export A and B as public API, but keep their names "scoped" within the namespace of the exporting module.
A nice thing about this is that it could be implemented as a contextual keyword: the `scoped` would only be a keyword when followed by `export`, in the same way that `mutable` is not a keyword unless followed by `struct`. Thus not breaking any existing code. It would also be easy to support in Compat.jl as `@scoped export A, B` (which would just return `export A, B` on older Julia versions, and would retain the usual special-case parsing rules for export lists).
Regardless of my quibbles, I think this is an important issue and it's great that you've put the effort into writing up your thoughts. Thank you!
Did you consider the alternative of having the API declared separately, in a similar way to how the export list currently is?
I explicitly haven't mentioned other proposals because I want to see how people react to this taken at face value - not to mention that I think most other ideas in this space are a bit underspecified in what exactly the desired semantics are. I think that's supported by the amount of discussion around what the various PRs & issues mean & want, which is what I attempted to give here directly. Most of the proposal follows from the requirements imposed by "what is a breaking change" when looking at functions, methods, variables etc from the POV of a user upgrading a version of a package.
My reasoning for placing `api` directly in front, rather than in an explicit list, is that it gives a small incentive/reminder to be careful when working on that code. The whole point is to be more explicit & thoughtful about what could be breaking changes, and I think having the API marker at the place of definition helps with that. It also makes it obvious when there's an undocumented API thing, which makes it extremely easy for newcomers to come in with a docstring PR: they then "only" have to figure out what a function is doing in order to document it, instead of first having to figure out that there's an undocumented thing in the first place.
Another reason is that `api` is a relatively weak marker, giving only the most necessary semantics required from the type system (in the case of methods/dispatch/structs/abstract types). For any more detail about what `api` means in the context of a given package, a docstring giving the exact surface is required - otherwise, every single bit of observable behavior could be taken as supported API, meaning you suddenly couldn't change the time complexity of a function anymore, for example.
And a third reason - if the whole function is always considered API, what about new methods added from outside of the package defining the function? Are they also automatically considered API, and who supports them? That's not something you can do with `export` - you can't export new names from a foreign module (without `eval`, but that breaks everything anyway). I think it's better to be a bit more conservative here. While the symbol is important in terms of what is directly accessible at all, often not the entire method surface is considered API, and while there could be an argument made for subtly enforcing forwards to actual internal functions, I think that's quite an unnecessary indirection: it requires possible duplication of docs to devdocs, duplication of (future) additions in permitted dispatch, taking even more care to double-check that you don't accidentally violate an invariant guaranteed by the parent caller, etc. This, to me, feels like too much additional boilerplate, when we already have dispatch to distinguish these cases from each other - so why not expose that in the API as well?
Finally, I don't think it's difficult per se to add something like
api foo, bar, baz
as a syntax for an API list, marking all of these symbols as API. The difficulty lies in the semantics - what the semantics of that `api` marker are depends strongly on what `foo`, `bar` and `baz` are, and so using such a list requires re-inspecting what `foo` etc. are, at a time when they (potentially) aren't even defined yet (and requires users to jump around in the source code while writing docs - something I'd like to avoid for dev-UX reasons). So I guess that's a fourth reason why I prefer a local `api` - unlike namespacing of symbols, which is a global property due to the symbol being unique, `api` really has local differences in meaning & requirements. Not to mention that Jeff already expressed dislike for having "more than one way" of writing the same thing when talking about `export` - maybe he feels differently about `api`, but we'll have to wait for him to chime in to know that :)
Apologies, this got longer than expected - thanks for asking about it!
A nice thing about this is that it could be implemented as a contextual keyword:
This proposal can also be done like that, no? I've given an explicit list of places where writing `api` would be legal, after all :) All of them are currently syntax errors, so adding this as a contextual keyword shouldn't break any existing code:
julia> api function foo end
ERROR: syntax: extra token "function" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api module Foo end
ERROR: syntax: extra token "module" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api const bar = 1
ERROR: syntax: extra token "const" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api abstract type MyAbstract end
ERROR: syntax: extra token "abstract" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api struct Foo end
ERROR: syntax: extra token "struct" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api foo::Int = 1
ERROR: syntax: extra token "foo" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
Thank you for taking the time to read through the proposal! I really appreciate your thoughts, especially because of your work on JuliaSyntax. If the scheme parser weren't so hard to jump into and hack on, this might have already been a draft PR instead of a proposal :)
I'm a bit skeptical about the per-method thing. At least it needs some good thinking about what would be breaking. Consider the example from above:
api function foo(arg1, arg2)
arg1 + arg2
end
A user can now happily call `foo(1, 2.0)`. But the package in a new version now adds
function foo(arg1::Int, arg2::Float64)
arg1 * arg2
end
The same call as before is now accessing an internal method.
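The dispatch change described here is reproducible in current Julia without any `api` keyword; a minimal, runnable sketch (the behavior values `3.0`/`2.0` follow directly from the two method bodies above):

```julia
# Runnable sketch of the accidental breaking change: adding a more specific
# method silently redirects an existing call, using only multiple dispatch.
foo(arg1, arg2) = arg1 + arg2

m1 = which(foo, Tuple{Int, Float64})   # the generic foo(arg1, arg2) method
v1 = foo(1, 2.0)                       # 3.0

# the "new version" of the package adds a more specific method:
foo(arg1::Int, arg2::Float64) = arg1 * arg2

m2 = which(foo, Tuple{Int, Float64})   # now the specific method
v2 = foo(1, 2.0)                       # 2.0 - same call, different behavior

println(m1 != m2)   # true - the dispatch target changed without any call-site change
```

`which` returns the method a given signature dispatches to, which is exactly the information a tool would need to detect this kind of change.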
The same call as before is now accessing an internal method.
Yes, that is a good example of an accidental breaking change, of the sort described in the proposal that really must not happen. I realize now that I've only written about the case of removing `foo(arg1, arg2)`, but of course this applies to adding new dispatches as well. I can add something like this to the list of "MUST"s, if you think that's correct:
Methods marked `api` before the addition must also be marked `api` after the addition, such that code relying on the old dispatch doesn't have to access a non-`api` method.

I think there is room for having this testable as part of a testsuite - for example, one could take the declared signatures of all `api` methods of `foo` in the last commit, and compare those against the dispatches of those signatures across all methods of `foo` in the new commit. If any signature would dispatch to a non-`api`-marked method in the new commit, you have a test failure due to an accidental breaking change. This does have the downside of requiring the testsuite to be aware of being a testsuite and to be able to check against a known-good version, but IMO that's something we should do anyway - we currently just do it (badly) in our heads when writing the code.
Another way this could be done, without checking past versions, is to have something of the sort of `@test_api foo(::Number, ::Number)` to check whether the given signature (and its subtypes) dispatches to an `api` method or not. I feel like there's also room for improvement there, but I haven't thought about that part too deeply yet (compared to the other parts of the proposal, anyway).
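Such a check is already expressible with today's reflection; a hypothetical sketch (the `api` keyword doesn't exist, so the "api surface" here is just a hand-maintained set of declared argument types, and `dispatches_to_api` is an illustrative name, not a proposed API):

```julia
# Hand-maintained stand-in for what an `api` marker would record:
const API_SIGNATURES = Set{Type}([Tuple{Number, Number}])

foo(x::Number, y::Number) = x + y
foo(x::Int, y::Int) = x * y        # internal fast path, not declared api

# Does calling `f` with `argtypes` dispatch to a declared-api method?
function dispatches_to_api(f, argtypes::Type{<:Tuple})
    m = which(f, argtypes)
    # m.sig is Tuple{typeof(f), declared argtypes...}; drop the function type
    declared = Tuple{collect(m.sig.parameters)[2:end]...}
    declared in API_SIGNATURES
end

dispatches_to_api(foo, Tuple{Float64, Float64})  # true  - hits the Number method
dispatches_to_api(foo, Tuple{Int, Int})          # false - hits the internal method
```

A `@test_api` macro would essentially wrap this lookup and turn the `false` case into a test failure.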
I don't have much insight into the merits of marking api per-method vs. per-function, but I fear that per-method would be too complex and lead many devs to just not bother with `api` because of the cognitive overhead. On the other hand, the rule for per-function seems straightforward enough: "an `api` function must not break use cases allowed by the attached docstring". And for those motivated, it seems the code can always be reorganized to make "non-api methods" unavailable by renaming them with e.g. a leading underscore.
Well, conversely, the `api function foo end` functionality would already give that exact behavior :shrug: I think a per-method annotation could be a boon, as described above, but I'm willing to drop that if others feel that it's too much mental overhead (though I'll gripe about this, since it'll inevitably lead to more accidental breakages down the road :stuck_out_tongue: the complexity of the API surface doesn't vanish by marking the whole function as `api` after all, it just hides away until it becomes an issue).
In either case, I think it's good that this discussion is happening - keep the feedback coming!
Thanks for a well-written proposal! I 100% agree that there is an issue with a lack of consensus around what makes an API of a package, and that it would be nice to have a rigid way to specify this.
However, first, in the bikeshed department, I don't like the keyword `api`. Unlike all other keywords, it's a lowercase form of an abbreviation, and I'd much rather see a keyword that isn't pronounced letter-by-letter. Or maybe this would be the motivation to start pronouncing it "ay-pai". I'm also not even sure we need a new keyword. It feels like something that could be added to the docstring for special handling. That would also nudge developers to write docstrings for things considered public API. Something like, if the docstring starts with `@API`:
"""@API
my_function(args...)
This is a nicely documented function that does what you expect.
"""
function my_function(args...) end
Second, I'm a bit worried about how the notion of an API surface composes with the genericity of Julia code. It is often very convenient to define a function like `my_public_function(::Any, ::Any)`, which then, in principle, has an API surface covering anything the user throws at it that doesn't error, but in practice, and as perhaps written in its docstring, requires some trait-like behaviour. I can definitely foresee situations where `my_public_function` would work for some arguments by accident in one version, and through a non-breaking change (according to the docstring of the function), suddenly wouldn't work in a later version, because the internal implementation changed. While it is, in a sense, problematic that the user code breaks in this case, I think a strength of Julia comes from actually being a bit fuzzy here. However, if I understand the semantics you proposed correctly, this would be considered an accidental breaking change that should "never happen", which I guess I then disagree with.
Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release.
Finally, like with `export`, I do not think it is useful to think of the API at a method level. I think there has been a fairly strong push in Julia to make sure that a given function has just one meaning, which means that methods are simply implementation details on how to implement that functionality for a given set of types. I wouldn't want to propose a convention where only specific methods are considered part of the public API. Something feels off to me about `my_function` being public if you call it with types `X` and `Y`, but internal if you call it with `Z`. This does clash a bit with declaring the API in the docstring, but that might not be important.
Thank you for your time and discussing the proposal!
However, first, in the bikeshed department, I don't like the keyword api.
The bikeshed is noted, but barring an alternative.. :shrug: I'm only partial to `api` myself because it's a ridiculously well-known concept that fits the intended meaning precisely, as well as being quick to type and unambiguous with other keywords (tab completion can then work its magic). I do agree that it being an abbreviation is kind of bad, though. Suggestions are welcome! :)
I'm also not even sure if we need a new keyword. It feels like something that could be added to the docstring for special-handling. That would also nudge developers to write docstrings for things considered public API. Something like, if the docstring starts with @API:
I originally considered proposing an API-string macro, used like so:
api"""
my_function(args...)
This is a nicely documented function that does what you expect.
"""
function my_function(args...) end
and implemented like
struct APIMarker
doc::String
end
macro api_str(s::String)
APIMarker(s)
end
which would give any documentation renderer the ability to distinguish API from non-API. The issue with that, or with placing the marker in a docstring, is that you then need to parse & process the docstring to be able to say "this is API" when doing code analysis. It's no longer a static property of the source code as written, and it can change at runtime - `Base.Docs` allows a developer to mutate documentation at runtime as they see fit, which isn't really something that should be possible for what is considered the API surface of a package in a given version. That should be static information, hence the proposal for syntax. I really do think this needs to be a language feature.
Second, I'm a bit worried about how the notion of an API surface and the genericity of Julia code composes.
That whole paragraph is a very good point, and I have thought about that! There's also a subtlety that I forgot to put into my original proposal, so let me take this opportunity to reply and add that subtlety here as well, by way of an example.
Consider a function with a single method like
function foo(arg1, arg2)
arg1 + arg2
end
that we say is API of some package. Its signature is `Tuple{typeof(foo), Any, Any}`, and taken at face value, we might think that we can call `foo` with literally any two objects. This is of course wrong - there's an implicit constraint on the types of `arg1` and `arg2` to also have a method on `+` taking their types: `Tuple{typeof(+), typeof(arg1), typeof(arg2)}`. Neither exporting a whole function name, nor marking a single method as `api`, can represent that constraint accurately, and in fact we don't even currently have ways of discovering those kinds of constraints (I did play around with abstract interpretation for this, but quickly ran into dynamic dispatch ruining my day, so I postponed that idea). This is quite a bit more powerful a requirement than even Haskell has with its typeclasses, I think, though I don't have a proof of that. While typing arguments as `Any` is tempting and can often help with interoperability, it's really unfortunate that these kinds of implicit constraints in reality constrain `Any` further, so it was never going to work with truly `Any`thing anyway. I guess the "hot take" here is that any API function that claims to take `Any` is a lie, and shouldn't be trusted (well, other than the `identity` function - at least, I haven't seen any object that couldn't be returned from a function. Sounds like a fun challenge..).
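The implicit constraint can be made concrete with a short runnable example: `foo` claims to accept any two arguments, but actually requires a `+` method for their types, and over-broad calls surface as a `MethodError`:

```julia
# An Any-typed signature that over-promises: the real constraint is on `+`.
foo(arg1, arg2) = arg1 + arg2

foo(1, 2.0)   # 3.0 - fine, +(::Int, ::Float64) exists

err = try
    foo("a", nothing)   # there is no +(::String, ::Nothing) method
    nothing
catch e
    e
end
err isa MethodError   # true - the Any signature didn't really mean "anything"
```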
So, barring the ability to express those kinds of constraints, the next best thing is to say "well, I support this signature, provided you implement functions X, Y, Z with arguments W, V, S on your type T" in a docstring. The `foo(::Any, ::Any)` example is just the most extreme case of this; the use case I had in mind when writing this was something along the lines of
api function foo(x::MyAbstract, y)
2x+3y
end
and being able to express "I only support `foo` with a first argument that obeys the constraints of `MyAbstract`", effectively transferring the constraint of `*(::Int, ::MyAbstract)` to the contract/API that `MyAbstract` provides/that subtypes of `MyAbstract` are required to provide. From my POV, an abstract type is already what we'd call an informal "interface" in other languages - they just don't compose at all, due to only being able to subtype a single abstract type, leading to the proliferation of `Any` in signatures (which ends up hiding the true required surface).
I can definitely foresee situations where my_public_function would work for some arguments by accident in one version, and through a non-breaking change, according to the docstring of the function, suddenly wouldn't work in a later version, because the internal implementation changed.
Well, I don't per se see an issue with that - not every implementation detail is part of the API of a function. While some implicit constraints on the arguments can be part of the API, in practice they may be considered an implementation detail, and if they are, it's fine if a non-breaking change (according to the docstring/additional documentation of the method) breaks some user code that relied on them. The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to `Any`.
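"Lifting the requirement to the type domain" can be sketched like this; all names (`Summable`, `Meters`, `total`) are illustrative, not from the proposal:

```julia
# Instead of accepting Any and relying on an implicit `+` constraint, require a
# dispatch type whose documented contract is that subtypes support `+`.
abstract type Summable end

struct Meters <: Summable
    value::Float64
end
Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)

# The requirement is now visible in the signature instead of being implicit:
total(x::Summable, y::Summable) = x + y

total(Meters(1.0), Meters(2.0))   # Meters(3.0)
```

Calling `total` with something that isn't `Summable` now fails at dispatch, at the api boundary, rather than deep inside the implementation.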
Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release.
I'd implement that with this proposal by having an inner module `Experimental`, like `Base.Experimental`, which isn't marked as `api` (perhaps individual functions/methods inside of that module are marked `api` though, to make clear which parts of the `Experimental` module are considered part of the experimental API and which parts are just implementation details of the experimental functionality). The reasoning for this is that experimental features, by definition, can't be relied upon under semver, as they may be removed at any time, so they must not be marked as part of the API surface of the package. Mixing experimental API into the same namespace as stable API seems to me like a way to invite more accidental breakages, through people assuming that something marked as soft-API will be stable at some point, so it's fine to rely on it (which is not the case - usage of experimental functionality needs to be guarded by version checks, just like any other use of internal functionality).
Something feels off to me to think about my_function being public if you call it with types X and Y, but internal if you call it with Z.
Well.. from my POV, that's kind of what we're doing all the time already, by having surface-level API functions like `foo` that forward their `Any`-typed arguments to `_foo` or `_foo_impl`, where the actual disambiguation through dispatch (with a `MethodError` to boot if a combination is not supported) happens. All I'm proposing is removing that indirection, by making the API method-specific. If your `_foo_impl` has an entirely different signature, to gather some more dispatchable data you don't need your users to give themselves (see an example here), it's of course still fine to forward to that internal function.
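The indirection pattern being described is the familiar one below: a loosely-typed public entry point forwarding to internal methods that do the actual dispatch (names `foo`/`_foo` as in the discussion):

```julia
# Public entry point: accepts anything, forwards to the internal function.
foo(x, y) = _foo(x, y)

# Internal methods carry the real dispatch surface:
_foo(x::Integer, y::Integer) = x + y
_foo(x::AbstractString, y::AbstractString) = x * y   # string concatenation

foo(1, 2)       # 3
foo("a", "b")   # "ab"
# foo(1, "b")   # would throw a MethodError from `_foo` - combination unsupported
```

Per-method `api` would let the two `_foo` methods themselves carry the marker, removing the need for the `foo` wrapper.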
In part, the "API per method" section of the proposal stems from my (perhaps irrational) fear that people are going to slap `api` on everything to make the warnings go away, and then be undisciplined with actually providing the guarantees they claim to give with that marker. I think this issue already exists today, but it's just masked by no one being able to consistently tell whether something is part of the API of a package or not - my gut says that a surprising amount of code runs on the "happy path" and always uses the latest version of things anyway, but the few times a big version upgrade of 1-2 year old code and the subsequent breakage of that code came up on Slack just got stuck in my mind. If people were able to consistently rely on some given methods being API, and developers were able to accurately test that their API surface doesn't change on a method-by-method basis, there'd be less of a risk of such breakage occurring. But again, that may be an irrational fear of mine :shrug:
Its signature is Tuple{typeof(foo), Any, Any}, and taken at face value, we might think that we can call foo with literally any two objects. This is of course wrong - there's an implicit constraint on the types of arg1 and arg2 to also have a method on +, taking their types - Tuple{typeof(+), typeof(arg1), typeof(arg2)}. Neither exporting a whole function name, nor marking a single method as api can represent that constraint accurately, and in fact we don't even currently have ways of discovering those kinds of constraints (I did play around with abstract interpretation for this, but quickly ran into dynamic dispatch ruining my day, so I postponed that idea).
I appreciate your point, but I disagree with the analysis. I could define a function:
"""
multiply_by_two(x)
Return `x` multiplied by two.
""""
multiply_by_two(x) = x
This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states. An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.
It's no longer a static property of the source code as written, and is something that can change at runtime - Base.Docs allows a developer mutating documentation at runtime as they see fit, which isn't really something that should be done for what is considered the API surface of a package, in a given version - that should be static information, hence the proposal for syntax. I really do think this needs to be a language feature.
As my point above, I don't see any compelling argument for why this shouldn't be allowed to change at runtime, though. As I see it, having API declarations in the code is a feature that is purely intended to help users (and potentially linters). Intentionally making it less flexible than the rest of Julia's documentation system seems to be in stark contrast to Julia's dynamic nature, and I think analysers will be able to overcome the docstring-parsing requirement.
The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to `Any`.
While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.
Well.. from my POV, that's kind of what we're doing all the time though already, by having surface-level API functions like foo that forward their Any typed arguments to _foo or _foo_impl where the actual disambiguation through dispatch (with MethodError to boot if a combination is not supported) happens. All I'm proposing is removing that indirection, by making the API method-specific. If your _foo_impl has a different signature entirely to gather some more dispatchable data you don't need your users to give themselves (see an example here), it's of course still fine to forward to that internal function.
I also think it would be really nice to have all methods exposed at the surface level of the dispatch tree, but there are a lot of practical reasons for having function indirection. Often it's with an `args...` capture (`Any`-typed), etc., and it'd be very inconvenient to have to explicitly type those in every method.
This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states.
Right, and I'm not claiming that `api` (or the compiler) could or should be a total, machine-readable description of all the invariants a function requires to be called - that would indeed be impractical and against the dynamic nature of Julia. What I am saying/proposing is that the ability for a developer to declare "`multiply_by_two` is part of the API of my package" for a given version, on a method-by-method basis, is valuable; whether the actual implementation matches the desired behavior of a function is a different matter. Checking that is the job of a testsuite, not of some declaration about what a developer sees as the public API of their package (which is the only thing I'm concerned about here - giving developers the ability to say "this is what I support", in a consistent, Base-supported manner. As such, it IMO needs to be flexible enough to accommodate granularity).
An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.
I disagree; that's exactly what a testsuite is. A docstring is the prose describing the desired behavior, and a testsuite is the code checking that the functional implementation matches the prose.
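That division of labour can be shown with the buggy `multiply_by_two` from earlier: the docstring is the prose contract, and the testsuite is the code that checks the implementation against it:

```julia
# Buggy implementation: the docstring promises `x * 2`, the code returns `x`.
multiply_by_two(x) = x

# The testsuite encodes the prose contract as an executable check:
contract_holds = multiply_by_two(3) == 6
contract_holds   # false - the testsuite catches what no api marker could
```

An `api` marker says *what* is supported; the testsuite verifies *that* it behaves as documented. The two are complementary, not in competition.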
While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.
I agree with you here; this is not currently a desirable thing to do, because there's only one possible direct abstract supertype. That's still an orthogonal concern to whether a single method is considered API, though. Maybe that's an argument for holding off on method-level API until such a time when we can better express these kinds of constraints, i.e. saying "I require the invariants these 2+ abstract types provide"?
Personally, I would not prefer the syntax be a modifier to existing definitions like `api function ...`, because then the declarations are still scattered throughout the code. The primary problem for me is discoverability, which is maximized when the interface(s) are more or less centralized, or in some very clear hierarchical structure.
An API is kind of like an "interface for a package", so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl
Mockup below, with a new syntax `interface` block operating kind of like a header file, with arbitrary "tags" for API. The only information given about the functions is the names and signatures; hopefully this should provide enough information for good tooling? Anything more substantive should probably just go in a docstring. `interface` could also support the `where` keyword, so that methods can be made api with as much specificity as desired. Multiple declarations could also be made, so the entire interface doesn't have to fit in a single block if that pattern is desired. Another benefit of declaring it this way is that things can be made api conditionally via package extensions:
module Animals
export speak, Dogs
abstract type Animal end
speak(::Animal) = "Hello, World!"
speak(::Animal, s::AbstractString) = "Hello, $s !"
module Dogs
using ..Animals
struct Dog <: Animals.Animal end
Animals.speak(::Dog) = "woof"
bark(d::Dog) = Animals.speak(d)
end
module Cats
using ..Animals
struct Cat <: Animals.Animal end
Animals.speak(::Cat) = "meow"
fight(::Cat, ::Cat) = String["hiss", "screech"]
getalong(::Cat, ::Cat) = nothing
end
end
interface Animals where T <: AbstractString
:exported => (
speak => Tuple{Animal} => String,
speak => Tuple{Animal, T} => String,
Dogs => bark => Tuple{Dog} => String
),
:stable => (
Animals => Cats => speak => Tuple{Cat} => String
)
end
interface Animals
:experimental => (
Animals => Cats => fight => Tuple{Cat, Cat} => Vector{String}
),
:deprecated => (
Animals => Cats => getalong => Tuple{Cat, Cat} => Nothing
)
end
also, possibly this discussion should be folded together with https://github.com/JuliaLang/julia/discussions/48819 ?
an API is kind of like an "interface for a package," so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl
I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all - this is purely for allowing developers to declare their existing API, without having to migrate to new concepts in terms of what they want to develop.
Also, while I can see a path for something like interfaces to be a thing, I don't think the approach in Interfaces.jl is right, in particular because it's a big disruption to the ecosystem due to the amount of "legacy" code we have. I have a different idea for how to tackle that (mostly based on type theoretic consequences/additions to the current type system), which does integrate with this proposal nicely without special cases while keeping code churn as small as possible (not larger than the code churn produced by this proposal, in fact), but that's for a different discussion.
also, possibly this discussion should be folded together with https://github.com/JuliaLang/julia/discussions/48819 ?
While that discussion is the first time I wrote down parts of the ideas contained in this proposal, it's not a concrete discussion about a concrete proposal. Thus, I think this issue is more appropriate to talk about this proposal at hand, instead of the larger discussion whether we want something like access modifiers at all. So far, the only major gripe with this proposal seems to be the per-method-API and some minor bikeshedding on the name/syntax to expose this (which I think I have argued sufficiently now why it ought to be syntax, and not dynamic information).
I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all
The only similarity is superficial. Otherwise, the "interface" is just a tuple of pairs, without introducing any new type concepts. It almost doesn't require syntax at all; if modules were types, one could just dispatch on some function `interface(m::MyModule)`. IMO, the less package developers need to do to convert their current code to an API-aware world, the better.
Where there is an API function in a package, I would expect any non-API methods of it to be an exception rather than the rule. So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. It's already pretty common, I think, to have e.g. `myfunc` (exported), `_myfunc`, and `__myfunc` for next-level internals.
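As a sketch of that underscore convention (all names here are hypothetical, chosen only for illustration):

```julia
# Hypothetical sketch: only `myfunc` is the supported entry point.
module MyPkg

export myfunc

"""API entry point; normalizes input, then defers to internals."""
myfunc(x::Real) = _myfunc(float(x))

# Internal: assumes `x` is already a Float64; may change without notice.
_myfunc(x::Float64) = __myfunc_kernel(x) + 1.0

# Next-level internal.
__myfunc_kernel(x::Float64) = x * x

end # module

MyPkg.myfunc(2)  # == 5.0
```

The convention carries no enforcement; it only signals intent through the leading underscores.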
The argument that a keyword will help people be mindful is undermined by having an all-method construct; at that point it's just as easy to define the API methods in a central location and rely on tooling to raise awareness of which functions are API. And a central location is a more natural extension of what we do with `export`.
As far as the scenario where Package B adds a method to an API function (say, `getindex` on one of its types) goes, I think you could say that `Base.getindex(T::MyType, i...)` is API of Package B if and only if `MyType` is an API type of Package B. And non-API types shouldn't be returned by API methods.
I have not given this anywhere near the amount of thought you have, @Seelengrab, but I am also a bit skeptical about the per-method thing. I kind of like @c42f's `scoped export A, B`, which would be fairly minimal intervention needed. I understand your points about proximity, but developers can always add this locally:

```julia
scoped export f
f(args...) = ...
```

if they want that, too. (`export`s don't all have to go at the top of a module.)
As a way of ensuring that detached `scoped export`s don't lead to violations, we could add Aqua testing for your API rules.
Trying to draw inspiration from how other languages do this, I'm reminded of how method overrides are handled in languages with classes. If an abstract method is marked as public, any overrides must be public as well. I'm less familiar with the space of languages with traits/protocols, but the couple I do know of have similar behaviour when it comes to implementing them (visibility of the implementer must be >= visibility of the trait/protocol methods). Not all languages use SemVer, so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal.
I've segmented the responses according to who I've quoted, so it's easier to follow :)
@BioTurboNick
So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. Already pretty common I think to have e.g., myfunc (exported) and _myfunc and __myfunc for next-level internals, for example.
Explicitly marking things as non-API has the issue that it's easy to forget - I mentioned that somewhere, I think. You really don't want to end up in the situation of having to do a technically breaking change in a non-breaking release (by SemVer) just because you forgot to add the non-API marker. No marker -> private/internal should be the default, rather than the exception. Rust goes as far as not even allowing you to access things that haven't been explicitly made available (which we don't want, and also can't do before 2.0 (but more like 10.0), because it's massively breaking).
As far as the scenario where Package B adds a method to an API function (say, getindex on one of its types) goes, I think you could say that Base.getindex(T::MyType, i...) is an API of Package B if and only if MyType is API type of Package B. And non-API types shouldn't be returned by API methods.
That takes away agency from the developer of Package B, in that it doesn't allow the developer to have a public type without having `getindex` be part of the API of that type. In a sense, requiring this kind of proliferation means that every time you extend the API function/method `foo` of some other package with an API type of your package, you automatically become responsible for how that package uses `foo` internally, and for it not breaking. This feels a bit hard to do when you don't have control over that. Selecting API on a per-method basis circumvents that issue, because it allows the developer of Package B to say "no, I want to use this internally, but I don't really want to support this publicly".
To be fair, the example is a bit abstract, but I think the only reason something like this hasn't been a problem so far is because we really don't know what the API surface of any given thing is, so we somewhat expect things to break. It's also hard to opt into someone else's code, since we currently only have single abstract subtyping, and the most successful attempt at having such a wide interface package, Tables.jl, doesn't use abstract types for signaling dispatches at all (there are trait functions though). I think that's a missing feature in Julia, but that's for another day and another proposal :)
@timholy
I kind of like @c42f's scoped export A, B, which would be fairly minimal intervention needed. I understand your points about proximity but developers can always add this locally:
I'm strongly opposed to calling this a `scoped export`, because it does nothing of the sort. It is NOT exporting a symbol, and I think conflating the meaning of "this is OK to access, I support this" with `export` is what brought us into this mess in the first place. It's a subtlety that I think is best to highlight by giving it a different name, so that people think of "this is my API" when using `api` (or whatever we end up calling it) and "this is what I want you to have directly available after `using`" when using `export`.
As a way of ensuring that detached scoped exports don't lead to violations, we could add Aqua testing for your API rules.
I'm not sure Aqua rules are enough for this - it's not a linting thing after all. I think something like `@test_api` is good to have in Base, since we do have the issue of fuzziness of API in Base too (and it's much more prevalent there than in some parts of the ecosystem :grimacing:).
@ToucheSir
Not all languages use SemVer so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal.
This is no coincidence - the rules laid out by the Rust guidelines, as well as the ones I've described in the proposal, are purely a consequence of the type system and what it means to change something from version A to B. It's mostly a type-theoretic argument that leads to the interpretation that it ought to be possible to mark individual methods as API. I feel that to make this point more clear, I'll have to write some more about what I (and, I think, the wider programming language design community at large) consider a "type". So little time though... Taking that lens, as well as thinking about what SemVer means, leads to a very natural derivation of these rules.
On the flipside, it's of course possible to take a coarser approach and mark whole functions as API (that's also described in the proposal after all; that's `api function foo end`), and if that's the direction we want to go in, great! As I said elsewhere (and too often), anything to mark "this symbol is intended API, even if not `export`ed" is better than the nothing we have now.
If dropping per-method-API gets more support for this syntax proposal, good - let's do it. We can always add it back later if we feel that we do want that granularity after all, since it's always allowed to widen your API ;)
I'm strongly opposed to calling this a `scoped export`

I guess there are two aspects: (1) what you name it (I'm happy to drop that aspect of the proposal), and (2) where you put it. If we drop the per-method idea, then there is no reason you couldn't put these annotations near all your `export`s. Personally I like that design; it makes it clearer that it applies to the function rather than the method.
To be clear, my point was that Rust appears to support many of the same rules about methods and API compat despite not allowing for private implementations of public methods. That's not to say allowing method-level granularity when specifying API surface area is a bad thing, but it suggests to me that these two pieces are somewhat orthogonal.
I agree that consensus on what "API" means is good and should probably be documented. Thanks for thinking about this and making a concrete proposal. One of the advantages of a concrete proposal is that I can point to concrete and hopefully-easy-to-fix problems.
in the example above [MyStruct is API and AbstractFoo is not], the full chain `MyStruct{T,S} <: AbstractFoo <: Any` is considered API under semver

I think this is too strict. A better requirement is that any subtyping relation between two API types should be preserved. For example, `_InternalAbstractType` can be removed in `MyStruct <: _InternalAbstractType <: Any`.
This is okay because folks who are dispatching based on `_InternalAbstractType` are already using internals, and if the package defines an API function or method declared `api public_method(::_InternalAbstractType) = 0`, that method accepts `::MyStruct`, so it must continue to accept `::MyStruct` even after `MyStruct <: _InternalAbstractType` no longer holds.
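To illustrate that compatibility argument with plain code (the `api` keyword doesn't exist, so the annotation appears only as a comment):

```julia
# Version 1: the public struct subtypes an internal abstract type.
abstract type _InternalAbstractType end
struct MyStruct <: _InternalAbstractType end

# api public_method(::_InternalAbstractType) = 0   # hypothetical marker
public_method(::_InternalAbstractType) = 0

@assert public_method(MyStruct()) == 0

# Version 2 (a hypothetical later non-breaking release) may drop the internal
# supertype, but must then keep the promise "public_method accepts MyStruct":
#     struct MyStruct end
#     public_method(::MyStruct) = 0
```

The internal supertype can disappear precisely because the surviving API promise is stated in terms of `MyStruct` itself.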
For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API ... The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.
This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error.
As for how to mark things as API or not API, I concur with some others in this thread that marking API similarly to export is a good idea.
| | enters namespace with `using` | does not enter namespace with `using` |
|---|---|---|
| part of API | `export my_function, my_constant, MyType` | `foo_bar my_function, my_constant, MyType`* |
| not part of API | don't do this | default |

*`foo_bar` could be `scoped export`, `api`, `export scoped=true`, etc.
@LilithHafner along those lines, I enjoyed the proposal from a few years ago here https://github.com/JuliaLang/julia/issues/39235#issuecomment-958539052 where in this case `foo_bar` might be a semicolon separator.
This is okay because folks who are dispatching based on _InternalAbstractType are already using internals and if the package defines an API function or method declared api public_method(::_InternalAbstractType) = 0, that method accepts ::MyStruct, so it must continue to accept ::MyStruct even after MyStruct <: _InternalAbstractType no longer holds.
That's a good point, and an even better one for the need for `@test_api`! What your version effectively means is that abstract types give some guarantees (existence of fields, dispatches of functions, invariants..) and subtyping an abstract type means saying "I give those same guarantees" (as well as dispatching like them, because you give those same guarantees). Viewed in that light, having a non-API supertype just means that you don't give the guarantees of that supertype! That's quite elegant - the only way it could be even more elegant is if we were able to subtype more than one abstract type, which would effectively mean saying "I give those guarantees too" (you can, of course, get in hot water if you claim e.g. both to be an even number and not to be an even number, should such abstract types representing these concepts exist).
I like the idea of only keeping the guarantees you've actually promised to provide.
This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error.
That's a predicament then - I originally wanted to allow widening of the type to one that gives fewer guarantees, but that obviously doesn't work because then reads may fail in dispatch (you can't take away guarantees you've given in a previous version). Taken together, these two imply that types of API globals must not change, at least without releasing a breaking version. Not related to the syntax proposal per se, but quite unexpected nonetheless!
Since I'm not sure it's quite clear: my motivation for having `api` markers on a per-method basis is that it makes it abundantly clear what is API; only marking the name API makes it (in my eyes) much more likely to forget to add a test for the API surface. If it's marked on a per-method basis, it's quite trivial to take that exact function signature, put it into `@test_api foo(bar, baz)`, and you're safe from accidental API regressions. That's just not the case when doing a name-wide API declaration.
it's quite trivial to take that exact function signature, put it into @test_api foo(bar, baz) and you're safe from accidental API regressions.
I agree that having specific signatures be part of the official API makes sense, but I think that might be orthogonal to the Methods themselves. `@test_api foo(::Number, ::Number)` is something that can be implemented (sort of) independently of whether there is an actual method with that signature.

I say "sort of" because `@test_api` evidently depends on type inference, and that presumably works only up to runtime-dispatch boundaries. Unless one is then going to check all potentially-applicable methods (for the callee of that runtime dispatch), in which case one is once again out of the land of specific method signatures.
So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees.
@test_api foo(::Number, ::Number) is something that can be implemented (sort of) independent of whether there is an actual method with that signature.
How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking `(::Number, ::Number)` has to exist; the semantic meaning of `@test_api` is that no `MethodError` happens when calling `foo`. This doesn't mean that there can't be a `MethodError` further down the callchain of course, as a dynamic dispatch isn't generally penetrable to inference, as you rightly point out.

After all, if no method `foo(::Number, ::Number)` (or wider, as `foo(::Any, ::Any)` also works) exists, it must be a failing test case, since the signature is statically known not to dispatch (barring `eval`, but that throws a wrench into everything).
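That dispatch-only check can be sketched with `hasmethod`, which answers exactly "does some method cover this signature?" without running inference. The macro below is a hypothetical toy, not the proposed `@test_api`:

```julia
using Test

foo(::Any, ::Any) = 0  # a deliberately wide method

# Hypothetical sketch of @test_api: pass iff the given signature dispatches
# to some existing method; it says nothing about calls further down the chain.
macro test_api(call)
    f, argtypes = call.args[1], call.args[2:end]
    :(@test hasmethod($(esc(f)), Tuple{$(map(esc, argtypes)...)}))
end

@test_api foo(Number, Number)  # passes: covered by foo(::Any, ::Any)
```

Note that, as in the quoted argument, the wide `foo(::Any, ::Any)` method satisfies the narrower `(Number, Number)` signature.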
So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees.
I'm sympathetic to that idea, but I consider it out of scope for this particular proposal. I think the idea "declare api" extends to other modifications we might want to do with the type system later on quite well; the semantic of "I support these additional guarantees" ought to be possible to tack onto almost anything after all, even after the fact.
EDIT: To be clear, I consider the formalization of type signatures in the type system distinct from methods out of scope; not tying API to signature, as that's exactly what I want to do with per-method API :)
How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking (::Number, ::Number) has to exist; the semantic meaning of @test_api is that no MethodError happens when calling foo.
You can implement

```julia
foo(::Real, ::Real) # method 1
foo(::Any, ::Any)   # method 2
```

and that signature is fully covered. The coverage is nominally wider than what we're guaranteeing to be API, but that's OK. In fact it's more than OK: as soon as you think about the implications of trying to actually write `@test_api`, this issue is going to come up at every runtime-dispatch boundary. So allowing this at the "entry point" is not really any different than allowing it at runtime-dispatch boundaries.

Again, the point is that you cannot implement `@test_api` without relying on inference, and when inference fails at runtime dispatch then you have to consider the universe of all methods that might be applicable. Otherwise you haven't proven that you fully support the API you claim to support.
https://timholy.github.io/SnoopCompile.jl/stable/jet/ provides a concrete example, if this is still unclear.
I am concerned that inference failures are going to mean that we'll frequently run up against `bar(::Any)`, but we don't propose to make `bar(::Any)` guaranteed API. Presumably we can only apply the API guarantee to things that have no (or limited) inference failures. So loading a package seems like it could potentially break the API guarantee, but unless that package tests the universe of all API-guaranteed objects, you might not catch this in the package's tests.
I am inclined to prefer demarcating API at the level of symbols because it is easier to tell a human `sort!` is API and `ht_getindex` is not, than to tell a human `sort!(::AbstractVector)` is API and `sort!(::AbstractVector, ::Int, ::Int, ::Base.Sort.Algorithm, ::Base.Order.Ordering)` is not. Further, it would be really nice for discoverability if `names(Module)` returned exactly all the symbols that are part of the public API.
Nevertheless, looking for an example where more specificity is needed, let's say I have a method
```julia
foo(x, y) = x + y # version 1
```
and I think I might eventually want to change it to
```julia
foo(x, y) = x - (-y) # version 2
```
which has slightly different behavior for some input types (e.g. `foo(1, typemin(Int16))`). I want to guarantee the behavior on `foo(::Int, ::Int)` now and leave the rest as internal. Can I do this & how?
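For the curious, the two versions diverge because `-y` wraps around in `Int16` arithmetic before promotion kicks in; a quick check under standard Julia integer semantics:

```julia
x, y = 1, typemin(Int16)   # y == Int16(-32768)

v1 = x + y      # promotes first: 1 + (-32768) == -32767
v2 = x - (-y)   # -y wraps in Int16: -(-32768) == -32768, so 1 - (-32768) == 32769

(v1, v2)  # (-32767, 32769): same inputs, different results
```

So the behavioral difference is confined to inputs where negation overflows, which is exactly why one might want to guarantee only the `Int` case.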
With per-function API this would look like
```julia
api foo(x::Int, y::Int) = _foo(x, y)
_foo(x, y) = x + y
```
One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.
Extended - what's the concrete benefit of having API and internal methods of the same function, over giving the internal functions a different name? Arguably the latter leads to a cleaner design for both developers and users to reason about.
@timholy
as soon as you think about the implications of trying to actually write @test_api , this issue is going to come up at every runtime-dispatch boundary. So allowing this at the "entry point" is not really any different than allowing it at runtime dispatch boundaries.
I'm not talking about `@test_api foo(::Int, ::Int)` running inference though - it's only checking which signatures are in the method table for `foo` and checking that against being able to dispatch `(Int, Int)` to it. Not whether the underlying call doesn't error further down the callchain, which is in line with the dynamic nature of Julia.
Otherwise you haven't proven that you fully support the API you claim to support.
Well yes, `@test_api` is not supposed to test whether the entire implementation conforms to the contract. It's not a frontend to a full-stack static analyzer - its only purpose is to check whether the initial dispatch to `foo` still works, to prevent regressions in dispatch API. There may be `MethodError`s thrown from other calls down the chain, as I said above, and that is fine.
There isn't anything about `@test_api` that needs to touch inference - it's only about statically knowable information, i.e. which methods of the surface call are available (ignoring `eval`), not about statically inferring that there are no `MethodError`s in the call stack later on. I'm not trying to rebuild JET.jl here.
@LilithHafner
I am inclined to prefer demarcating API at the level of symbols because it is easier to tell a human sort! is API and ht_getindex is not, than to tell a human sort!(::AbstractVector) is API and sort!(::AbstractVector, ::Int, ::Int, ::Base.Sort.Algorithm, ::Base.Order.Ordering) is not. Further, it would be really nice for discoverability if names(Module) returned exactly all the symbols that are a part of the public API.
That is a good point, yes - I haven't thought of `names` being what lots of people use for API discovery.
I want to guarantee the behavior on foo(::Int, ::Int) now and leave the rest as internal. Can I do this & how?
If you want to guarantee something on a specific set of arguments, there's no way around explicitly giving that guarantee; that's irrespective of whether a single method is `api` or not. You have to write `foo(x::Int, y::Int)` either way, and give a docstring pointing out the guarantees for both implementations, otherwise people are going to be confused about which argument type combinations have which guarantees. With per-method API, that would look like
"""
foo(::Int, ::Int)
Specialized implementation of `foo` for `Int`. Only relies on addition.
"""
foo(x::Int, y::Int) = x + y
"""
foo(x, y)
Fallback definition of `Foo`. Doesn't assume addition, only requires negation & subtraction.
!!! warning "Mixed Integer Behavior"
Due to julias integer types being asymmetric, this implementation may not always give the same result as same-type `foo`, e.g. `foo(1, typemin(Int16))` differs from `foo(one(Int16), typemin(Int16))`.
"""
foo(x, y) = x - (-y)
If you don't want to provide a fallback definition yet, that's just fine - there's no need to give one if you don't have any guarantees to provide for it yet.
@BioTurboNick
One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.
That's a very good point, but I think per-function API has the same issue: you'll just get the `MethodError` on an internal function instead of the API function the user actually tried to call. Worse, through too-permissive dispatch on the internal function, you may get a longer stacktrace that's harder to debug compared to getting that `MethodError` (or an API warning) right at the beginning. I think it's been a point in the past how `MethodError`s are hard to debug because they can bubble up from way down some callchain.
Extended - what's the concrete benefit of having API and internal methods of the same function, over giving the internal functions a different name? Arguably the latter leads to a cleaner design for both developers and users to reason about.
I think that's only true if both API and internal functions are strict in what they expect - which is explicitly counter to leaving signatures `::Any`-typed to allow dynamism and let specialization just do its thing. If you are stricter with the types in your signatures, whether the function or the method is API begins to lose its meaning :shrug:
A lot of people, myself among them, have objected to per-method API declarations. Does anyone have an example of a function in the wild for which it is best to declare a per-method API?
Here're 5 random functions, for inspiration
```julia
julia> rand(names(Base), 5)
5-element Vector{Symbol}:
 :isreadonly
 :retry
 :sin
 :isone
 :map
```
Does anyone have an example of a function in the wild for which it is best to declare a per-method API?
This is not an actual example in the wild, but I feel this is something that can come up when extending functions defined in other packages. For example, let's say I want to define a new number type, called `MyNewNumberType`, that I consider public API, on which I have defined:

```julia
function Base.sin(::MyNewNumberType)
    ...
end
```

However, because `Base.sin` is public API from `Base`, have I now inadvertently declared this method API by extension? In many cases, this would probably be desired, but maybe I only defined this method for internal use, to pass the object through generic functions for some intermediate calculations. Because it was only defined for internal use, I don't want to be required to maintain the implementation long term, but there isn't a way for me to opt out of this if the "API surface" is determined by the source code. This could perhaps be solved by requiring my package to re-declare `Base.sin` as API before my method is actually considered as such, but if I define two types, one for which I want to guarantee support and one for which I don't, then I'm not sure there'd be a way to disambiguate that without per-method support.
I'd like to clarify that my issue with the per-method API was not about the granularity of the feature but rather about the mechanical expression of the syntax being scattered throughout the code, in that I think this is really something that should be centralized similar to export lists.
@BioTurboNick One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.
Isn't this already the state of affairs? I don't think having a method-granular API changes anything; it only exposes what's already true. Plenty of code is written generically enough, and with users ambitious enough, such that package code is often composed in ways that the authors likely did not expect or intend. I think feature-wise the full signature of the method must be considered, otherwise how are packages ever supposed to import/extend external functions to their API?
@LilithHafner
Does anyone have an example of a function in the wild for which it is best to declare a per-method API?
I think we don't have such examples "in the wild" because it's really, REALLY difficult to extend the API of another package in a meaningful way at the moment if you're not already a developer of that package. I feel that this is because what is considered API is currently not well defined at all, so there just aren't examples of this because no one does this in Julia (because it's really, REALLY hard without breaking something). My goal is to make this sort of extension easier, by giving a place to say "this is where you can extend".
@adienes
Isn't this already the state of affairs? I don't think having a method-granular API changes anything; it only exposes what's already true.
Yes, exactly - the issue is that this API surface is not well defined, and that's all I want to tackle with this proposal. By tackling that sort of ambiguity and exposing what's already true programmatically, we can become much more effective at extending the code of other people.
I think feature-wise the full signature of the method must be considered, otherwise how are packages ever supposed to import/extend external functions to their API?
I'd like to clarify that my issue with the per-method API was not about the granularity of the feature but rather about the mechanical expression of the syntax being scattered throughout the code, in that I think this is really something that should be centralized similar to export lists.
Absolutely! While I personally prefer writing per-method API close to the definition, actually doing that is not as important as being able to declare per-method API in the first place, as well as being able to test for accidental regressions in that.
I would still be curious to hear more detailed feedback for https://github.com/JuliaLang/julia/issues/49973#issuecomment-1569093036 . Is the main concern that it is too verbose / unwieldy? From my POV, it does seem to provide
the `interface` keyword there only exists to handle the `where` clause and is probably redundant. This could almost as easily be a magic file `__api__.jl` containing a `NamedTuple`.
is the main concern that it is too verbose / unwieldy?
Not necessarily, it's just that I have another concrete proposal for that kind of thing already in the works (parts of which certainly shone through in some of my comments above). On top of this being a bit orthogonal to the whole "how do I mark something as supported" discussion this proposal is about (which, as mentioned above, should be perfectly compatible with whatever type-theoretic additions we might want to do later on), I didn't want to derail this discussion with another long-form text.

As such a change is much larger than the one discussed here, I think this proposal has better chances of being accepted (one way or another) without trying to solve everything at once. That's why I've structured it in such a way as to be easily extendable for any new language/type-level concept we may want to add on top of this.
I certainly did not intend to derail the thread; it might have been more appropriate to continue this thread as a discussion on your concrete proposal and use a separate one for brainstorming, my apologies.
I do want to confirm / be clear on something though: what I had in mind doesn't change or extend the type system at all, right? Possibly the word `interface` was a poor choice there, but I only meant that as an analogy. Ultimately the proposal is more or less "just" a more structured export list, where the exports can be scoped, tagged with `experimental` etc., and given the full method signature (within the confines of the existing type system).
All good - my assumption was that users would be able to dispatch on these kinds of interfaces; if they can't do that, what advantage over current trait implementations/compiler guarantees/diagnostics do we hope to gain from them? After all, if we want to be able to say "this is the interface I support", it would be nice to dispatch on that fact as well, à la "this is the interface this function requires for this argument". This requires extension of the type system though, which is where things can get quite tricky (and is why I've deliberately left the grouping of functions/methods & structs into one unit out of this proposal - this is purely about "I want to give some stability guarantee for some thing" ;) ).
I do have thoughts/angles at solutions for these, but those ideas are not quite ripe for sharing yet, I don't think.
I'm strongly opposed to calling this a `scoped export`, because it does nothing of the sort.

This seems like a pretty strong interpretation of the word export, to mean exactly what it already means in Julia in terms of symbol visibility. But if you look up the more general definitions (https://en.wiktionary.org/wiki/export#Verb) you'll find things like
We're allowed to change the meaning of words by adding adverbs like `scoped` if that's a good analogy and users understand what it means. I think `scoped export` is clear enough from this point of view, and it reuses an existing keyword, which is nice.
I think scoped export is clear enough from this point of view and it reuses an existing keyword which is nice.
This is where I disagree - reusing an existing keyword (`export`) with a clear meaning in the context of Julia ("make this symbol available without explicitly importing it") to mean something entirely different ("this symbol is supported as API"), even if modified with an adverb, is a bad thing. I really don't think the other general meanings of the word are relevant here; all of them are about moving something from place A to place B, which "this symbol is supported as API" just doesn't do.
because it's really, REALLY difficult to extend the API of another package in a meaningful way at the moment, if you're not already a developer of that package.
I would like to hear a concrete description of why this is hard at the moment, because this has not been my experience. What has been my experience, though, is that there is a lack of consensus on how to mark something as supported/API, which I think is a very good issue to tackle. I also think that for me this issue boils down to how things are marked and how they can be discovered programmatically (e.g. via `Base.names`).

The hard part I'm referring to is not making things work somehow (that's usually fairly easy - just overload things until you don't get a `MethodError` anymore and/or you get a result that looks right), but making it work without accidentally relying on internals or undocumented behavior, as well as actually following the semantics/guarantees the interface requires.
My pet example for this is the Random stdlib. While it's good that there's documentation for the API surface, I rarely see the API used correctly; that is, people don't write a method for `Random.rand(rng::AbstractRNG, ::Random.SamplerType{MyType})` for creating random instances of their type (or even define a custom sampler type..), but rather write a method like `Random.rand(rng::AbstractRNG, ::Type{<:MyType})`, which is not at all supported by the interface; it just happens to work because the surface-level API is the same. This extends to the ecosystem, where e.g. (without wanting to pick on anyone in particular) RandomNumbers.jl gets it wrong (others get it right, like StableRNGs). Distributions.jl gets it half-right - they do define their own sampler objects, but then don't hook into the existing API by subtyping `Base.Sampler`, instead punning on the `rand` name to mean "sample from this distribution" in their own type hierarchy. I've also had some extensive discussion with @cscherrer about this particular thing on Slack & in one of his packages, but I can't remember which it was (sorry!).
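To make the difference concrete, here is a minimal sketch of the supported extension path for a hypothetical `MyType` (the commonly seen but unsupported overload is shown commented out):

```julia
using Random

struct MyType
    x::Float64
end

# Supported: hook into the Random API via `Random.SamplerType`.
Random.rand(rng::AbstractRNG, ::Random.SamplerType{MyType}) = MyType(rand(rng, Float64))

# Often seen, but NOT part of the documented interface; it merely happens
# to present the same surface-level call:
# Random.rand(rng::AbstractRNG, ::Type{<:MyType}) = MyType(rand(rng))

rand(MyType)     # a single instance, through the sampler machinery
rand(MyType, 3)  # the array methods come for free with the supported path
```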
Similar issues can occur in other codebases, where implementing an API is more complicated than simply overloading a single function.
Things like that, where `rand` is (in general) API but specific implementations are not, are what I think is relevant here :shrug:
> but making it work without accidentally relying on internals or undocumented behavior
I am struggling to understand the scope of this proposal. On one hand it seems to be close-to-trivial, just a way to "tag" things essentially as an adjunct to documentation. That has some merit on its own. On the other hand it seems to be an attempt to provide machine-checkable guarantees on dependencies. The latter seems quite exciting but I worry that it gets into serious technical difficulties, because I don't know how you provide such guarantees without diving into the implementation of the API using inference. And lots and lots of Julia code has barriers to inference.
Most likely, I'm just having a hard time seeing your vision. But both the OP and the discussion have focused a lot on what we name things and what kinds of objects can receive that name. I'm far more interested in the "Uses" and "Required Implementation Steps" in the OP, but those seem far less fleshed out and haven't gotten as much discussion. My concern is that all this discussion about naming is a bit premature because it's not obvious to me that technical issues won't end up dominating the actual benefits (or lack thereof) from this proposal.
If this discussion is getting somewhere, by all means continue it. But personally I am not seeing something that looks like progress. If this were my proposal, at this point I'd start writing `@test_api`, and once I'd made substantial inroads I'd share the draft and report back. You could presumably generate test cases using a simple "cheat": any object that isn't API could get a `notapi` somewhere in its name, and thus you could fake your api-check without yet having gone to the trouble of modifying Julia's internals. That would let you develop the hard parts of this proposal while deferring the painful ones; we'll all have a clearer vision for what you can do with this new tagging system before having to invest huge effort in it.
> On the other hand it seems to be an attempt to provide machine-checkable guarantees on dependencies.
Let me reiterate then: I do NOT want to propose a machine-checkable guarantee on dependencies. All I want is for tools like Documenter or the REPL to be able to consistently say "this thing is considered API by the package developer. Read the docstring for more information!". Nowhere have I mentioned wanting to do machine-checkable proofs of callstacks with this proposal; I actively consider that out of scope - it's something that's been thrust onto this proposal, but not by me. The reason I've stated the guarantees & rules in the proposal is because that's what needs to be followed at minimum by a developer to not release a breaking change on something that they consider API. How they check that this is the case is their concern - one possibility is the addition of that `@test_api` macro.
If my goal was to have machine-checkable proofs, I would have written that in the `Uses` section, as I'm well aware how much work that is. This proposal, as I keep repeating, is purely about marking things a developer says "I want to support this in SemVer". No part of this proposal is about the semantics of that support (though the rules contained in the OP are the bare minimum requirements such that a change in functionality is not a breaking change, whether with this proposal or without it).
I'm by no means suggesting that this will solve API misuses or incompatibility issues - not at all! But it WILL make it easier for a user to know what the actual API surface is, as well as give developers a consistent avenue to say "No, sorry, your use is out of scope/support".
Great, that helps! But I still don't understand how you get some of the benefits you claim, notably:
> Tree-shaking for Pkg images/static compilation, to only make `api` bindings available in the final image/binary/shared object
> But I still don't understand how you get some of the benefits you claim, notably:
That is written primarily with (future) static compilation to a shared object in mind - if a function/method/global variable is marked as `api`, we can take that as a sign that the name/symbol should not be mangled, as it's intended to be part of the API of the codebase.
That's why I keep going on and on about the intent of a developer - by giving developers the possibility to signal their intent, in a consistent manner on the language level, we can use that intent in lots of places that touch the API surface, or are related to handling the API surface.
I do realise that this is a bit of a "head in the clouds, maybe future stuff" approach, by daring to imagine what could be done with this :shrug: I have no illusions that machine-checkable proofs of API surface mean anything when crossing an FFI surface.
Ah, this is just about naming in the shared library! OK, I thought it was about inclusion/exclusion of compiled code, which is a much harder problem.
Yeah, in hindsight, I could have worded that better in the original post :joy: I'll edit it in!
Another clarification: for linting hints, what is your proposal for ambiguous cases? Is it better to have false positives (warn people that `foo(x, y)` might not be API if a subset of 2-arg methods of `foo` aren't marked `api`) or false negatives (assume that because at least one 2-arg `api`-tagged method exists, this is probably OK)?
If you're not using inference, per-method annotation seems to require that you plan for one of these outcomes.
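To illustrate the ambiguity with a hypothetical `foo` (names invented here; `api` is the proposed syntax, not current Julia):

```julia
api foo(x::Int, y::Int) = x + y  # marked as API
foo(x, y) = (x, y)               # unmarked fallback method

# A call site like `foo(1.0, 2.0)` dispatches to the unmarked method, but
# without inference a linter only knows "foo has 2-arg methods, some of
# which are API" - it must pick a policy: warn (false positives for calls
# that hit API methods) or stay silent (false negatives for calls that don't).
```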
Since linting julia code in terms of types always requires inference anyway to avoid erroneously warning about potential `MethodError`s, I'm inclined to have false positives, following the behavior for `MethodError`. If using inference is not an option, I'd err on the side of caution and rather have false positives - but this may just be my personal preference to get some form of type inference back as soon as possible, even in type unstable code.
The general goal of this proposal is to formalize what it means to be considered API in julia and to give a framework for talking about what even constitutes a "breaking" change, making it easier for developers to provide more documentation for their packages as well as to avoid breakage in the first place. The reasoning in this document comes from the current practices in use in the ecosystem, as well as from type theoretic requirements of the type system, though having a background in type theory is not required to participate in the discussion.
Please do reply if you have something to add/comment on!
## Motivation
The current stance of Base and much of the package ecosystem on what is considered to be API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring, it's API", "If it's exported, it's API", "If it's in the Internal submodule, it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:

- `@deprecate` is hard to use correctly and often not used at all.

This proposal has three parts: first, it presents a full-stack solution to both the Discoverability and Maintainability issues described above; second, it proposes a small list of things that could be done with the proposed feature to make Reviewability easier; and third, it gives a (preliminary, nonexhaustive) "How do we get there" list of things that would need to be implemented for this proposal.
There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design.
## The `api` keyword

The main proposal for user-facing interaction with declaring whether an object is API is the new `api` keyword. The keyword can be placed in front of definitions in global scope, like `struct`, `abstract type`, `module`, `function` and `const`/type annotated global variables. Using `api` declares the intent of a developer about which parts of the accessible symbols they consider API under semver and plan to support in newer versions.

When a project/environment importing a package wants to access a symbol not marked as API (if it is not the same project/environment originally defining the symbol), a warning is displayed, making the user aware of the unsupported access without otherwise hindering it. This behavior should be, to an extent, configurable, to support legitimate accesses of internals (insofar as those exist). There is the caveat that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or to encourage a user to start a discussion with the developer to provide an API for what they would like to do. The result of such a discussion can be a "No, we won't support this"! That, however, is a far more desirable outcome than accessing internals and fearing breakage later on, if it could have been avoided by such a discussion.
The following sections explain how `api` interacts with various uses, what the interactions ought to mean semantically, as well as the reasoning for choosing those semantics.

### `function`
Consider this example:
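The example itself isn't preserved here; given the description that follows, it plausibly looked something like this (the body is illustrative, and `api` is the proposed keyword, not current Julia syntax):

```julia
api function foo(a, b)
    return (a, b)
end
```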
This declares the function `foo`, with a single method taking two arguments of any type. The `api` keyword, when written in front of such a method definition, declares only the given method as API. If we were to later on define a new method taking different arguments, like so:

the new method would not be considered API under semver. The reasoning for this is simple - once a method (any object, really) is included in a new release as API, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.
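A sketch of such a later, unmarked definition (illustrative body, proposed syntax):

```julia
function foo(a, b, c)   # no `api` in front: not part of the supported API
    return (a, b, c)
end
```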
Being declared on a per-method basis means the following:

- A method marked as `api` MAY NOT be removed in a non-breaking release without splitting the existing method into multiple definitions that are able to fully take on the existing dispatches of the previous single method. In type parlance, this means that the type union of the signatures of the replacement methods MUST be at least as specific as the original method, but MAY be less specific. This is to prevent introducing accidental `MethodError`s where there were none before.
- A method marked as `api` MAY NOT introduce an error where there was none before, without that being done in a breaking release.
- A method marked as `api` MAY change the return type of a given set of input arguments, even in a non-breaking release. Developers are free to strengthen this to MAY NOT if they feel it appropriate for their function/method.
- A method marked as `api` MAY remove an error and introduce a non-throwing result.
- A method marked as `api` MAY change one error type to another, but developers are free to strengthen this to MAY NOT if they feel it appropriate for their function.

This is not enforced in the compiler (it can't do so without versioned consistency checking between executions/compilations, though some third-party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline to make it possible to anticipate breaking changes and to allow developers to plan around and test for them more easily. The exact semantics a method marked as API must obey to be considered API, apart from its signature and the above points, are up to the developers of a function.
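For instance, the first rule permits splitting one API method into several, as long as their signatures jointly cover the original (hypothetical example, proposed syntax):

```julia
# Before: a single API method covering Tuple{Real,Real}.
api area(w::Real, h::Real) = w * h

# After, in a non-breaking release: an integer special case plus a
# fallback, whose union of signatures still covers Tuple{Real,Real}.
api area(w::Integer, h::Integer) = w * h
api area(w::Real, h::Real) = float(w) * float(h)
```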
Depending on the use case (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax
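presumably a zero-argument definition along these lines (proposed syntax; the name `bar` is taken from the surrounding text):

```julia
api function bar end
```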
declares ALL methods of `bar` to be public API as an escape hatch - i.e., the above syntax declares the function `bar` to be API, not just individual methods. In Base-internal parlance, the `api` keyword on a single method only marks an entry in the method table as API, while its use on a zero-arg definition marks the whole method table as API. An API mark on the whole method table trumps a nonexistent mark on a single method - it effectively acts as if there were a method taking a sole `::Vararg{Any}` argument that is marked as API.

### `struct`
The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.
Consider this example:
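Going by the description below, the example presumably resembled the following (field names, type parameters and supertype taken from the surrounding text; `api` is proposed syntax):

```julia
api struct MyStruct{T,S} <: AbstractFoo
    a::T
    b::S
end
```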
A struct like the above annotated with `api` guarantees that the default constructor methods are marked as `api`. The subtyping relationship is considered API under semver, up to and including `Any`. The existence of the fields `a` and `b` is considered API, as well as their relationship to the type parameters `T` and `S`.

In the example above, the full chain `MyStruct{T,S} <: AbstractFoo <: Any` is considered API under semver, which means that methods declared as taking either an `AbstractFoo` or an `Any` argument must continue to also take objects of type `MyStruct`. This means that changing a definition like the one above into one that drops `AbstractFoo` from the subtype chain is a breaking change under semver. It is however legal to do the following:
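Per the chain spelled out next, the legal change inserts a new abstract type below `AbstractFoo` rather than removing it (a sketch, proposed syntax):

```julia
abstract type AbstractBar <: AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
```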
because the old subtyping chain `MyStruct{T,S} <: AbstractFoo <: Any` is a subchain of the new chain `MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any`. That is, it is legal to grow the subtyping chain downwards.

Notably, making `MyStruct` API does not mean that `AbstractFoo` itself is API, i.e. adding new subtypes to `AbstractBar` is not supported and is not considered API purely by annotating a subtype as API.

Since the new type in a changing release must be usable in all places where the old type was used, the only additional restriction placed on `MyStruct` as defined above is that no type parameters may be removed. Due to the way dispatch is lazy in terms of matching type parameters, it is legal to add more type parameters without making a breaking change (even if this makes uses of things like `MyStruct{S,T}` in structs containing objects of this type type-unstable).

In regards to whether field access is considered API or not, it is possible to annotate individual fields as `api`:
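A sketch of such field-level annotation, matching the fields discussed in the surrounding text (proposed syntax; only `b` is marked):

```julia
api struct MyStruct{T,S} <: AbstractFoo
    a::T
    api b::S
end
```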
This requires the main struct to be annotated as `api` as well - annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type `MyStruct` via `getfield(::MyStruct, :b)` or `getproperty(::MyStruct, :b)` is covered under semver and considered API. The same is not true of the field `a`, its type or its connection to the first type parameter, the layout of `MyStruct`, or the internal padding bytes that may be inserted into instances of `MyStruct`.

### `abstract type`
`abstract type` behaves similarly to `struct`, in that it is illegal to remove a type from a subtype chain, while it is legal to extend the chain downwards or to introduce new supertypes in the supertype chain.

Consider this example:
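Based on the names used further down, the example presumably declared an API abstract type like (proposed syntax):

```julia
api abstract type MyAbstract end
```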
The following changes do not require a breaking version:
The following changes require a new breaking version:
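Neither list survives here; going by the rules just stated, the split is presumably along these lines (sketch, proposed syntax, assuming a hypothetical original `api abstract type MyAbstract <: SomeSuper end`):

```julia
# Non-breaking: grow the chain downwards or upwards.
abstract type MySpecialAbstract <: MyAbstract end       # new subtype below
# api abstract type MyAbstract <: EvenNewerSuper end    # new supertype above

# Breaking: remove a type from the existing chain.
# api abstract type MyAbstract end                      # drops SomeSuper
```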
What the `api` keyword used on abstract types effectively means for the users of a package is that it is considered API to subtype the abstract type, to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like `api function foo(a::MyAbstract)` are expected to work with new objects `MyConcrete <: MyAbstract` defined by a user, but methods like `function bar(a::MyAbstract)` (note the lack of `api`) are not.

At the same time, a lack of `api` can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.

### type annotated/`const` global variables

Marking a global `const` or type annotated variable as `api` means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API, while for `const` global variables only reading is considered API; writing is not.

The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving fewer guarantees) are expected to continue to work.
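As an illustration (hypothetical names, proposed syntax):

```julia
api const MAX_RETRIES = 3   # `const` global: reading is API, writing is not
api LIMIT::Real = 1.0       # type annotated global: reading and writing are API;
                            # `Real` may later be narrowed to e.g. `Float64`
                            # in a non-breaking release
```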
Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question, and any implicit assumptions about the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write `api Bar::Any = ...`.

It should be noted that it is the variable binding that is considered API, not the object the variable refers to. It is legal to document additional guarantees or requirements of an object being referred to through a binding marked as `api`.

### `module`

Annotating an entire `module` expression with `api` means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module, and accessing them should return an object compatible with the type that same binding had in the previous version.

Consider this example:
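Matching the names discussed next, the example plausibly looked like this (proposed syntax; bodies illustrative):

```julia
api module Foo

f() = 1          # first-level binding in an `api` module: API

module Bar       # also first-level in Foo, hence API
api baz() = 2    # explicitly marked: API
bak() = 3        # unmarked: not API
end

end
```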
In this example, `Foo`, `Foo.f`, `Foo.Bar` and `Foo.Bar.baz` are considered API, while `Foo.Bar.bak` is not.

Consider this other example:
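Again inferring from the names below, this one plausibly marked only some bindings and the inner module (proposed syntax):

```julia
module Foo

f() = 1           # unmarked in a non-`api` module: not API
api g() = 2       # marked: API

api module Bar    # marked module: Bar and all its first-level names are API
baz() = 3
bak() = 4
end

end
```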
In this example, `Foo.g`, `Foo.Bar`, `Foo.Bar.baz` and `Foo.Bar.bak` are considered API, while `Foo` and `Foo.f` are not.

Consider this third example:
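A plausible third variant, with a single marked binding (proposed syntax):

```julia
module Foo

f() = 1

module Bar
api baz() = 2    # the only `api` mark anywhere
bak() = 3
end

end
```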
Only `Foo.Bar.baz` is considered API; the other names in and from `Foo` and `Foo.Bar` are not.

## Uses
This is a list of imagined uses of this functionality:

- Tree-shaking for Pkg images/static compilation, to only make `api` bindings available in the final image/binary/shared object. The `api` marker could be used for not mangling the names of julia functions when compiling an `.so`, as currently all names are mangled by default (unless marked as `@ccallable`, if I'm not mistaken, which is limited to taking C-compatible types in a C-style ABI/calling convention).

Do you have ideas? Mention them and I'll edit them in here!
## Required Implementation Steps

- Make the parser recognize the `api` keyword in the correct places and produce an expression the compiler can use later on.
- Add `api` marker handling to methods and the method table implementation, as well as to binding lookup in modules.
- Make the `help>` mode aware of `api` tags.
- `api`
## Known Difficulties with the proposal

- `Expr(:function)` does not have space in its first two arguments for additional metadata, so this would need to be added either to a third argument, or by creating a new `Expr(:api_function)`. Analogous issues exist for `Expr(:struct)`, `Expr(:=)`, `Expr(:module)` etc. Both approaches potentially require macros to be changed, to be aware of `api` in their expansion.

## FAQ
**Why not a macro?** A macro would have to interact with `Method` and other (internal?) objects of `Base`; exposing this as a macro would also mean exposing this as an API to the runtime, even though this `api` distinction is not about dynamicness - the `api` surface of a package really ought to be fixed in a given version, and not change dynamically at runtime.

**Why not `private` instead?** Marking things `private`, `internal` or similar instead of `api` means that any time a developer accidentally forgets to add that modifier, adding it later is a technically breaking change in a release. The whole point of this proposal is to avoid this kind of breakage.

**Why not `public`?** `public` overloads this already overloaded term in the wider programming community too much. `public`/`private` are commonly associated with access modifiers, which is decidedly not what this proposal is about. Hence `api`, which makes its intent very clear. It would also be prudent to have that discussion after we've come to a compromise about the desired semantics.

**Why not `export`?** `export` is a bit tricky, since it doesn't distinguish between methods the way `api` does. I think it could work to mark all `export`ed symbols with `api` as well (this is certainly not without its own pitfalls..), though I also think that `export` is a bit of an orthogonal concept to `api`, due to the former being about namespacing and the latter being exclusively about what is considered to be actually supported. I think a good example is the way `save`/`load` are implemented with FileIO.jl. While the parent interface package exports `save` and `load`, packages wishing to register a new file format define new, private functions for these and register those on loading with FileIO (or FileIO calls into them if they're in the environment). This means that `MyPkg.save` is not exported from `MyPkg`, but is nevertheless a supported API provided by `MyPkg`. The intention is to support these kinds of use cases, where `export` is undesirable for various reasons, while still wishing to provide a documented/supported API surface for a package.

I hope this proposal leads to at least some discussion around the issues we face or, failing to get implemented directly, hopefully to some other version of more formalized API semantics being merged at some point.