JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.54k stars 5.47k forks source link

Taking API surface seriously: Proposing Syntax for declaring API #49973

Open Seelengrab opened 1 year ago

Seelengrab commented 1 year ago

The general goal of this proposal is to formalize what it means to be considered API in julia, to give a framework to talk about what even is a "breaking" change to make it easier for developers to provide more documentation for their packages as well as avoiding breakage in the first place. The reasoning in this document comes from the current practices in use in the ecosystem, as well as type theoretic requirements of the type system, though having a background in type theory is not required to participate in the discussion.

Please do reply if you have something to add/comment on!

Motivation

The current stance of Base and much of the package ecosystem of what is considered to be API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring it's API", "If it's exported, it's API", "If it's in the Internal submodule it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:

This proposal has three parts - first, presenting a full-stack solution to both the Discoverability and Maintainability issues described above, and second, proposing a small list of things that could be done with the proposed feature to make Reviewability easier. Finally, the third part is focused on a (preliminary, nonexhaustive) "How do we get there" list of things required to be implemented for this proposal.

There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design.

The api keyword

The main proposal for user-facing interactions with declaring whether an object is the new api keyword. The keyword can be placed in front of definitions in global scope, like struct, abstract type, module, function and const/type annotated global variables. Using api is declaring the intent of a developer, about which parts of the accessible symbols they consider API under semver and plan to support in newer versions.

When a project/environment importing a package wants to access a symbol not marked as API (if it is not the same project/environment originally defining the symbol), a warning is displayed, making the user aware of the unsupported access but doesn't otherwise hinder it. This behavior should be, to an extent, configurable, to support legitimate accesses of internals (insofar those exist). There is the caveat that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or encourage a user to start a discussion with the developer to provide an API for what they would like to do. The result of such a discussion can be a "No, we won't support this"! This, however, is a far more desirable outcome to accessing internals and fearing breakage later on, if it would have been avoided by such a discussion.

The following sections explain how api interacts with various uses, what the interactions ought to mean semantically as well as the reasoning for choosing the semantics to be so.

function

Consider this example:

api function foo(arg1, arg2)
   arg1 + arg2
end

# equivalently:
# api foo(arg1, arg2) = arg1 + arg2

This declares the function foo, with a single method taking two arguments of any type. The api keyword, when written in front of such a method definition, only declares the given method as API. If we were to later on define a new method, taking different arguments like so

function foo(arg1::Int, arg2::Float64)
  arg1 * arg2
end

the new method would not be considered API under semver. The reasoning for this is as simple - once a method (any object, really) is included in a new release as API, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.

Being declared on a per-method case means the following:

This is not enforced in the compiler (it can't do so without versioned consistency checking between executions/compilations, though some third party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline to be able to anticipate breaking changes and allow developers to plan around and test for them easier. The exact semantics a method that is marked as API must obey to be considered API, apart of its signature and the above points, are up to the developers of a function.

Depending on usecase (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax

api function bar end

declares ALL methods of bar to be public API as an escape hatch - i.e., the above syntax declares the function bar to be API, not just individual methods. In Base-internal parlance, the api keyword on a single method only marks an entry in the method table as API, while the use on a zero-arg definition marks the whole method table as API. An API mark on the whole method table trumps a nonexistent mark on a single method - it effectively acts as if there were a method taking a sole ::Vararg{Any} argument and marking that as API.

struct

The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.

Consider this example:

abstract type AbstractFoo end

api struct MyStruct{T, S} <: AbstractFoo
    a::T
    b::S
end

A struct like the above annotated with api guarantees that the default constructor methods are marked as api. The subtyping relationship is considered API under semver, up to and including Any. The existence of the fields a and b are considered API, as well as their relationship to the type parameters T and S.

In the example above, the full chain MyStruct{T,S} <: AbstractFoo <: Any is considered API under semver, which means that methods declared as taking either an AbstractFoo or an Any argument must continue to also take objects of type MyStruct. This means that changing a definition like the one above into one like this

abstract type AbstractBar end
abstract type AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

is a breaking change under semver. It is however legal to do the following:

abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

because the old subtyping chain MyStruct{T,S} <: AbstractFoo <: Any is a subchain of the new chain MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any. That is, it is legal to grow the subtyping chain downwards.

Notably, making MyStruct API does not mean that AbstractFoo itself is API, i.e. adding new subtypes to AbstractBar is not supported and is not considered API purely by annotating a subtype as API.

Since the new type in a changing release must be useable in all places where the old type was used, the only additional restriction placed on MyStruct as defined above is that no type parameters may be removed. Due to the way dispatch is lazy in terms of matching type parameters, it is legal to add more type parameters without making a breaking change (even if this makes uses of things like MyStruct{S,T} in structs containing objects of this type type unstable).

In regards to whether field access is considered API or not, it is possible to annotate individual fields as api:

api struct MyStruct{T,S}
    a::T
    api b::S
end

This requires the main struct to be annotated as api as well - annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type MyStruct via getfield(::MyStruct, :b) or getproperty(::MyStruct, :b) is covered under semver and considered API. The same is not true of the field a, its type or the connection to the first type parameter, the layout of MyStruct or the internal padding bytes that may be inserted into instances of MyStruct.

abstract type

abstract type behaves similarly to struct, in that it is illegal to remove a type from a subtype chain while it being legal to extend the chain downwards or to introduce new supertypes in the supertype chain.

Consider this example:

abstract type AbstractBar end
api abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any

The following changes do not require a breaking version:

# introducing a supertype
abstract type AbstractBar end
abstract type AbstractFoo <: AbstractBar end
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractFoo <: AbstractBar <: Any

The following changes require a new breaking version:

# removing a supertype
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: <: Any
# removing the `api` keyword
abstract type AbstractBar end
abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractBar <: Any

What the api keyword used on abstract types effectively means for the users of a package is that it is considered API to subtype the abstract type, to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like api function foo(a::MyAbstract) are expected to work with new objects MyConcrete <: MyAbstract defined by a user, but methods like function bar(a::MyAbstract) (note the lack of api) are not.

At the same time, a lack of api can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.

type annotated/const global variables

api MyFoo::Foo = ...
api const MyBar = ...

Marking a global const or type annotated variable as api means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API, while for const global variables only reading is considered API, writing is not API.

The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.

Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question and any implicit assumptions of the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write api Bar::Any = ....

It should be noted that it is the variable binding that is considered API, not the the variable refers to itself. It is legal to document additional guarantees or requirements of an object being referred to through a binding marked as api.

module

Annotating an entire module expression with api means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module and accessing them should return an object compatible with the type that same binding had in the previous version.

Consider this example:

api module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end

In this example, Foo, Foo.f, Foo.Bar, Foo.Bar.baz are considered API, while Foo.Bar.bak is not.

Consider this other example:

module Foo
    api module Bar
      const baz = 1
      const bak = 42
    end

    f() = "hello"
    api g() = 
end

In this example, Foo.g, Foo.Bar, Foo.Bar.baz and Foo.Bar.bak are considered API, while Foo and Foo.f are not.

Consider this third example:

module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end

Only Foo.Bar.baz is considered API, the other names in and from Foo and Foo.Bar are not.

Uses

This is a list of imagined uses of this functionality:

Do you have ideas? Mention them and I'll edit them in here!

Required Implementation Steps

Known Difficulties with the proposal

FAQ


I hope this proposal leads to at least some discussion around the issues we face or, failing to get implemented directly, hopefully some other version of more formalized API semantics being merged at some point.

c42f commented 1 year ago

I think the motivation here is solid! Well said :-)

I don't so much like the feel like the proposed api keyword directly in front of syntactic constructs. Basically for similar reasons that I'm against the same thing for export, as expressed here https://github.com/JuliaLang/julia/issues/8005#issuecomment-660077430

Also I don't understand why api would make sense as a per-method thing rather than a per-function thing. It's generic functions which are named/called by the user's code, not individual methods. So the entire generic function should have the same API status.

Did you consider the alternative of having the API declared separately, in a similar way to how the export list currently is? Something more with the flavor of the proposal in https://github.com/JuliaLang/julia/issues/30204?

For example, we could have scoped export:

scoped export A, B

The idea is to export A and B as public API, but keep their names "scoped" within the namespace of the exporting module.

A nice thing about this is that it could be implemented as a contextual keyword: the scoped would only be a keyword when followed by export in the same way that mutable is not a keyword unless followed by struct. Thus not breaking any existing code. It would also be easy to support in Compat.jl as @scoped export A, B (which would just return export A, B on older Julia versions, and would retain the usual special case parsing rules for export lists).

Regardless of my quibbles, I think this is an important issue and it's great that you've put the effort into writing up your thoughts. Thank you!

Seelengrab commented 1 year ago

Did you consider the alternative of having the API declared separately, in a similar way to how the export list currently is?

I explicitly haven't mentioned other proposals because I want to see how people react to this taken at face value - not to mention that I think most other ideas in this space are a bit underspecified in what exactly the desired semantics are. I think that's supported by the amount of discussion around what the various PRs & issues mean & want, which is what I attempted to give here directly. Most of the proposal follows from the requirements imposed by "what is a breaking change" when looking at functions, methods, variables etc from the POV of a user upgrading a version of a package.

My reasoning for placing api directly in front, rather than as an explicit list, is that it gives a small incentive/reminder to be careful when working on that code. The whole point is to be more explicit & thoughtful with what could be breaking changes, and I think having the API marker at the place of definition helps with that (it also makes it obvious that there's an undocumented API thing, which makes it extremely easy for newcomers to come in with a docstring PR, since they then "only" have to figure out what a function is doing to be able to document it, instead of having to figure out that there's an undocumented thing in the first place).

Another reason is that api is a relatively weak marker, giving only the most necessary semantics required from the type system (in case of methods/dispatch/structs/abstract types). For any more details what api means in the context of a given package, a docstring giving the exact surface is required - otherwise, every single bit of observable behavior could be taken as supported API, meaning you suddenly couldn't change the time complexity of a function anymore, for example.

And a third reason - if the whole function is always considered API, what about new methods added from outside of the package defining the function? Are they also automatically considered API, and who supports them? That's not something you can do with export - you can't export new names from a foreign module (without eval, but that breaks everything anyway). I think it's better to be a bit more conservative here - while the symbol is important in terms of what is directly accessible at all, often not the entire method surface is considered API, and while there could be a an argument made for subtly enforcing forwards to actual internal functions, I think that's quite an unnecessary indirection, requiring possible duplication of docs to devdocs, duplication of (future) additions in permitted dispatch, taking even more care to double check that you don't accidentally violate an invariant guaranteed by the parent caller etc. This, to me, feels like too much additional boilerplate, when we already have dispatch to distinguish these cases from each other - so why not expose that in API as well?

Finally, I don't think it's difficult per se to add something like

api foo, bar, baz

as a syntax for an API list, marking all of these symbols as API. The difficulty lies in the semantics - what the semantics of that api marker are depends strongly on what foo, bar and baz are, and so using such a list requires re-inspection into what foo etc are, at a time when they (potentially) aren't even defined yet (and requires users to jump around in the source code while writing docs - something I'd like to avoid for dev-UX reasons). So I guess that's a fourth reason why I prefer local api - unlike namespacing of symbols, which is a global property due to the symbol being unique, api really has local differences in meaning & requirements. Not to mention that Jeff already expressed dislike for having "more than one way" of writing the same thing when talking about export - maybe he feels differently about api, but we'll have to wait for him to chime in to know that :)

Apologies, this got longer than expected - thanks for asking about it!

A nice thing about this is that it could be implemented as a contextual keyword:

This proposal can also be done like that, no? I've given an explicit list of places where writing api would be legal, after all :) All of them are currently syntax errors, so adding this as a contextual keyword shouldn't break any existing code:

julia> api function foo end
ERROR: syntax: extra token "function" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api module Foo end
ERROR: syntax: extra token "module" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api const bar = 1
ERROR: syntax: extra token "const" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api abstract type MyAbstract end
ERROR: syntax: extra token "abstract" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api struct Foo end
ERROR: syntax: extra token "struct" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api foo::Int = 1
ERROR: syntax: extra token "foo" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

Thank you for taking the time to read through the proposal! I really appreciate your thoughts, especially because of your work on JuliaSyntax. If the scheme parser weren't so hard to jump into and hack on, this might have already been a draft PR instead of a proposal :)

martinholters commented 1 year ago

I'm a bit skeptical about the per-method thing. At least it needs some good thinking about what would be breaking. Coming from above example

api function foo(arg1, arg2)
   arg1 + arg2
end

A user can now happily do foo(1, 2.0). But the package in a new version now adds

function foo(arg1::Int, arg2::Float64)
  arg1 * arg2
end

The same call as before is now accessing an internal method.

Seelengrab commented 1 year ago

The same call as before is now accessing an internal method.

Yes, that is a good example for an accidental breaking change, of the sort described in the proposal that really must not happen. I realize now that I've only written about the case of removing foo(arg1, arg2), but of course this applies to adding new dispatches as well. I can add something like this to the list of "MUST"s, if you think that's correct:

I think there is room for having this testable as part of a testsuite - for example, one could take the declared signature of all api methods of foo in the last commit, and compare that against the dispatches of those signatures in all methods of foo in the new commit. If any signature would dispatch to a non-api marked method in the new commit, you have a testfailure due to an accidental breaking change. This does have the downside of requiring the testsuite to have awareness of being a testsuite and being able to check against a known-good version, but IMO that's something we should do anyway - we currently just do it (badly) in our heads when writing the code.

Another way this could be done without checking past versions is have something of the sort of

@test_api foo(::Number, ::Number)

to check whether the given signature (and its subtypes) dispatches to an api method or not. I feel like there's also room for improvement there, but I haven't thought about that part too deeply yet (compared to the other parts of the proposal anyway).

rfourquet commented 1 year ago

I don't have much insignt into the merits of marking api per-method vs per-function, but I fear that per-method would be too complex and lead many devs to just don't bother with api because of cognitive overhead. On the other hand, the rule for per-function seems straightforward enough: "an api function must not break use cases allowed by the attached docstring". And for those motivated, it seems the code can always be re-organized to make "non api methods" unavailable by renaming them with e.g. a leading underscore.

Seelengrab commented 1 year ago

Well conversely, the api function foo end functionality would already give that exact behavior :shrug: I think a per-method annotation could be a boon, as described above, but I'm willing to drop that if others feel that it's too much mental overhead (though I'll gripe about this, since it'll inevitably lead to more accidental breakages down the road :stuck_out_tongue: the complexity of API surface doesn't vanish by marking the whole function as api after all, it just hides it away until it becomes an issue).

In either case, I think it's good that this discussion is happening - keep the feedback coming!

dalum commented 1 year ago

Thanks for a well-written proposal! I 100% agree that there is an issue with a lack of consensus around what makes an API of a package, and that it would be nice to have a rigid way to specify this.

However, first, in the bikeshed department, I don't like the keyword api. Unlike all other keywords, it's a lowercase form of an abbreviation, and I'd much rather see a keyword that isn't pronounced letter-by-letter. Or maybe this would be the motivation to start pronouncing it "ay-pai". I'm also not even sure if we need a new keyword. It feels like something that could be added to the docstring for special-handling. That would also nudge developers to write docstrings for things considered public API. Something like, if the docstring starts with @API:

"""@API
    my_function(args...)

This is a nicely documented function that does what you expect.
"""
function my_function(args...) end

Second, I'm a bit worried about how the notion of an API surface and the genericity of Julia code composes. It is often very convenient to define a function like my_public_function(::Any, ::Any), which then, in principle, has an API surface covering anything the user throws at it that doesn't error, but in practice, and as perhaps written in its docstring, it requires some trait-like behaviour. I can definitely foresee situations where my_public_function would work for some arguments by accident in one version, and through a non-breaking change, according to the docstring of the function, suddenly wouldn't work in a later version, because the internal implementation changed. While it is, in a sense, problematic that the user code breaks in this case, I think a strength of Julia comes from actually being a bit fuzzy here. However, if I understand the semantics you proposed correctly, this would be considered an accidental breaking change that should "never happen", which I guess I then disagree with.

Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release.

Finally, like with export, I do not think it is useful to think of the API at a method level. I think there has been a fairly strong push in Julia to make sure that a given function has just one meaning, which means that methods are simply implementation details on how to implement that functionality for a given set of types. I wouldn't want to propose a convention where only specific methods are considered part of the public API. Something feels off to me to think about my_function being public if you call it with types X and Y, but internal if you call it with Z. This does clash a bit with declaring the API in the docstring, but that might not be important.

Seelengrab commented 1 year ago

Thank you for your time and discussing the proposal!

However, first, in the bikeshed department, I don't like the keyword api.

The bikeshed is noted, but barring an alternative.. :shrug: I'm only partial to api myself because it's a ridicolously well-known concept that fits the intended meaning precisely, as well as being quick to type and unambiguous with other keywords (tab completion can then work its magic wonders). I do agree that it being an abbreviation is kind of bad though. Suggestions are welcome! :)

I'm also not even sure if we need a new keyword. It feels like something that could be added to the docstring for special-handling. That would also nudge developers to write docstrings for things considered public API. Something like, if the docstring starts with @API:

I originally considered proposing an API-string macro, used like so:

api"""
    my_function(args...)

This is a nicely documented function that does what you expect.
"""
function my_function(args...) end

and implemented like

struct APIMarker
    doc::String
end

macro api_str(s::String)
     APIMarker(s)
end

which would give any documentation rendering the ability to distinguish API from non-API. The issue with that or with placing the marker in a docstring is that you now need to parse & process the docstring to be able to say "this is API" for doing code analysis. It's no longer a static property of the source code as written, and is something that can change at runtime - Base.Docs allows a developer mutating documentation at runtime as they see fit, which isn't really something that should be done for what is considered the API surface of a package, in a given version - that should be static information, hence the proposal for syntax. I really do think this needs to be a language feature.

Second, I'm a bit worried about how the notion of an API surface and the genericity of Julia code composes.

That whole paragraph is a very good point, and I have thought about that! There's also a subtlety that I forgot to put into my original proposal, so let me take this opportunity to reply and add that subtlety here as well, by lieu of an example.

Consider a function with a single method like

function foo(arg1, arg2)
   arg1 + arg2
end

that we say is API of some package. It's signature is Tuple{typeof(foo), Any, Any}, and taken at face value, we might think that we can call foo with literally any two objects. This is of course wrong - there's an implicit constraint on the types of arg1 and arg2 to also have a method on +, taking their types - Tuple{typeof(+), typeof(arg1), typeof(arg2)}. Neither exporting a whole function name, nor marking a single method as api can represent that constraint accurately, and in fact we don't even currently have ways of discovering those kinds of constrains (I did play around with abstract interpretation for this, but quickly ran into dynamic dispatch ruining my day, so I postponed that idea). This is quite a bit more powerful a requirement than even Haskell does with its typeclasses, I think, though I don't have a proof of that. While typing arguments as Any is tempting and can often help with interoperability, it's really unfortunate that these kinds of implicit constraints in reality constrain Any further, so it wasn't ever going to work with truly Anything anyway. I guess the "hot take" here is that any API function that claims to take Any is a lie, and shouldn't be trusted (well, other than the identity function - at least, I haven't seen any object that couldn't be returned from a function. Sounds like a fun challenge..).

So, barring being able to express those kinds of constraints, the next best thing is to say "well, I support this signature, provided you implement functions X, Y, Z with arguments W, V, S on your type T" in a docstring. The foo(::Any, ::Any) example is just the most extreme case of this; the use case I had in mind when writing this was something along the lines of

api function foo(x::MyAbstract, y)
    2x+3y
end

and being able to express "I only support foo with a first argument that's obeying the constraints of MyAbstract", effectively transferring the constraint of *(::Int, ::MyAbstract) to the contract/api that MyAbstract provides/that subtypes of MyAbstract are required to provide. From my POV, an abstract type is already what we'd call an informal "interface" in other languages - they just don't compose at all, due to only being able to subtype a single abstract type, leading to the proliferation of Any in signatures (which ends up hiding the true required surface).

I can definitely foresee situations where my_public_function would work for some arguments by accident in one version, and through a non-breaking change, according to the docstring of the function, suddenly wouldn't work in a later version, because the internal implementation changed.

Well, I don't per se see an issue with that - not every implementation detail is part of the API of a function. While some implicit constraints on the arguments can be, in practice those may be considered an implementation detail, and if they are, it's fine if non-breaking change (according to the docstring/additional documentation of the method) the breaks some user code that relied on them. The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to Any.

Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release.

I'd implement that with this proposal by having an inner module Experimental, like Base.Experimental, which isn't marked as api (perhaps individual functions/methods inside of that module are marked API though, to make clear which part of the Experimental module is considered part of the experimental API and which parts are just implementation details of the experimental functionality). The reasoning for this is that experimental features, by definition, can't be relied upon under semver, as they may be removed at any time, so they must not be marked as part of the API surface of the package. Mixing experimental API into the same namespace as stable API seems to me like a way to more accidental breakages, through people assuming that something being marked as soft-API will be stable at some point, so it's fine to rely on it (which is not the case - usage of experimental functionality needs to be guarded by version checks, just like any other use of internal functionality).

Something feels off to me to think about my_function being public if you call it with types X and Y, but internal if you call it with Z.

Well.. from my POV, that's kind of what we're doing all the time though already, by having surface-level API functions like foo that forward their Any typed arguments to _foo or _foo_impl where the actual disambiguation through dispatch (with MethodError to boot if a combination is not supported) happens. All I'm proposing is removing that indirection, by making the API method-specific. If your _foo_impl has a different signature entirely to gather some more dispatchable data you don't need your users to give themselves (see an example here), it's of course still fine to forward to that internal function.

In part, the "API per method" section of the proposal stems from my (perhaps irrational) fear that people are going to slap api on everything to make the warnings go away, and then being undisciplined with actually providing the guarantees they claim to give with that marker. I think this issue already exists today, but it's just masked by noone being able to consistently tell whether something is part of the API of a package or not - my gut says that a surprising amount of code runs on the "happy path" and always uses the latest version of things anyway, but the few times a big upgrade in a version from 1-2 year old code and subsequent breakage of that code came up on slack just got stuck in my mind. If people were able to consistently rely on some given methods being API and developers were able to actually accurately test that their API surface on a method-by-method basis doesn't change, there'd be less of a risk of such breakage occuring. But again, that may be an irrational fear of mine :shrug:

dalum commented 1 year ago

It's signature is Tuple{typeof(foo), Any, Any}, and taken at face value, we might think that we can call foo with literally any two objects. This is of course wrong - there's an implicit constraint on the types of arg1 and arg2 to also have a method on +, taking their types - Tuple{typeof(+), typeof(arg1), typeof(arg2)}. Neither exporting a whole function name, nor marking a single method as api can represent that constraint accurately, and in fact we don't even currently have ways of discovering those kinds of constrains (I did play around with abstract interpretation for this, but quickly ran into dynamic dispatch ruining my day, so I postponed that idea).

I appreciate your point, but I disagree with the analysis. I could define a function:

"""
    multiply_by_two(x)

Return `x` multiplied by two.
""""
multiply_by_two(x) = x

This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states. An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.

It's no longer a static property of the source code as written, and is something that can change at runtime - Base.Docs allows a developer mutating documentation at runtime as they see fit, which isn't really something that should be done for what is considered the API surface of a package, in a given version - that should be static information, hence the proposal for syntax. I really do think this needs to be a language feature.

As my point above, I don't see any compelling argument for why this shouldn't be allowed to change at runtime, though. As I see it, having API declarations in the code is a feature that is purely intended to help users (and potentially linters). Intentionally making it less flexible than the rest of Julia's documentation system seems to be in stark contrast to Julia's dynamic nature, and I think analysers will be able to overcome the docstring-parsing requirement.

The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to Any.

While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.

Well.. from my POV, that's kind of what we're doing all the time though already, by having surface-level API functions like foo that forward their Any typed arguments to _foo or _foo_impl where the actual disambiguation through dispatch (with MethodError to boot if a combination is not supported) happens. All I'm proposing is removing that indirection, by making the API method-specific. If your _foo_impl has a different signature entirely to gather some more dispatchable data you don't need your users to give themselves (see an example here), it's of course still fine to forward to that internal function.

I also think it would be really nice to have all methods exposed at the surface-level of the dispatch tree, but there are a lot of practical reasons for having function indirection. Often it's with an args... capture (Any-typed), etc., and it'd be very inconvenient to have to explicitly type those in every method.

Seelengrab commented 1 year ago

This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states.

Right, and I'm not claiming that api (or the compiler) could or should be a total, machine readable description of all the invariants a function requires to be called - that would indeed be impractical and against the dynamic nature of julia. What I am saying/proposing is that the ability for a developer to declare "multiply_by_two is part of the API of my package" for a given version on a method-by-method basis is valuable; whether the actual implementation matches the desired behavior for a function is a different matter. Checking that is the job of a testsuite, not some declaration about what a developers sees as public API for their package (which is the only thing I'm concerned about here - giving developers the ability to say "this is what I support", in a consistent, Base supported manner. As such, it IMO needs to be flexible enough to accomodate granularity).

An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.

I disagree; that's exactly what a testsuite is. A docstring is the prose describing the desired behavior, and a testsuite is the code checking that the functional implementation matches the prose.

While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.

I agree with you here, this is not currently a desirable thing to do because there's only one possible direct abstract supertype. That's still an orthogonal concern to whether or not a single method is considered API or not though. Maybe that's an argument for holding off on method-level-API until such a time when we can express these kinds of constraints, of saying "I require the invariants these 2+ abstract types provide", better?

adienes commented 1 year ago

personally, I would not prefer the syntax be a modifier to existing definitions like api function ... because then the declarations are still scattered throughout the code. the primary problem for me is discoverability, which is maximized when the interface(s) are more or less centralized, or in some very clear hierarchical structure

an API is kind of like an "interface for a package," so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl

Mockup below with a new syntax interface block operating kind of like a header file, with arbitrary "tags" for API. the only information given about the functions is the names and signatures. Hopefully this should provide enough information for good tooling ? anything more substantive should probably just go in a docstring. interface could also support the where keyword so that methods can be made api with as much specificity as desired. also multiple declarations could be made so the entire interface doesn't have to fit in a single format if that pattern is desired. Another benefit of declaring it this way is that things can be made api conditionally via package extensions

module Animals
    export speak, Dogs

    abstract type Animal end
    speak(::Animal) = "Hello, World!"
    speak(::Animal, s <: AbstractString) = "Hello, $s !"

    module Dogs
        using ..Animals

        struct Dog <: Animals.Animal end
        Animals.speak(::Dog) = "woof"

        bark(d::Dog) = Animals.speak(d)
    end

    module Cats
        using ..Animals

        struct Cat <: Animals.Animal end

        Animals.speak(::Cat) = "meow"
        fight(::Cat, ::Cat) = String["hiss", "screech"]
        getalong(::Cat, ::Cat) = nothing
    end
end

interface Animals where T <: AbstractString
    :exported => (
        speak => Tuple{Animal} => String,
        speak => Tuple{Animal, T} => String,
        Dogs => bark => Tuple{Dog} => String
    ),
    :stable => (
        Animals => Cats => speak => Tuple{Cat} => String
    )
end

interface Animals
    :experimental => (
        Animals => Cats => fight => Tuple{Cat, Cat} => Vector{String}
    ),
    :deprecated => (
        Animals => Cats => getalong => Tuple{Cat, Cat} => Nothing
    )
end
adienes commented 1 year ago

also, possibly this discussion should be folded together with https://github.com/JuliaLang/julia/discussions/48819 ?

Seelengrab commented 1 year ago

an API is kind of like an "interface for a package," so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl

I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all - this is purely for allowing developers to declare their existing API, without having to migrate to new concepts in terms of what they want to develop.

Also, while I can see a path for something like interfaces to be a thing, I don't think the approach in Interfaces.jl is right, in particular because it's a big disruption to the ecosystem due to the amount of "legacy" code we have. I have a different idea for how to tackle that (mostly based on type theoretic consequences/additions to the current type system), which does integrate with this proposal nicely without special cases while keeping code churn as small as possible (not larger than the code churn produced by this proposal, in fact), but that's for a different discussion.

also, possibly this discussion should be folded together with https://github.com/JuliaLang/julia/discussions/48819 ?

While that discussion is the first time I wrote down parts of the ideas contained in this proposal, it's not a concrete discussion about a concrete proposal. Thus, I think this issue is more appropriate to talk about this proposal at hand, instead of the larger discussion whether we want something like access modifiers at all. So far, the only major gripe with this proposal seems to be the per-method-API and some minor bikeshedding on the name/syntax to expose this (which I think I have argued sufficiently now why it ought to be syntax, and not dynamic information).

adienes commented 1 year ago

I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all

The only similarity is superficial. Otherwise, the "interface" is just a tuple of pairs without introducing any new type concepts. It almost doesn't require syntax at all; if modules were types one could just dispatch on some function interface(m::MyModule)

BioTurboNick commented 1 year ago

IMO the less needed by package developers to convert the current code to an API-aware world is better.

Where there is an API function in a package, I would expect any non-API methods of it to be an exception rather than the rule. So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. Already pretty common I think to have e.g., myfunc (exported) and _myfunc and __myfunc for next-level internals, for example.

The argument that a keyword will help people be mindful is undermined by having an all-method construct; at that point it's just as easy to define the API methods in a central location and rely on tooling to raise awareness of which functions are API. And a central location is a more natural extension of what we do with export.

As far as the scenario where Package B adds a method to an API function (say, getindex on one of its types) goes, I think you could say that Base.getindex(T::MyType, i...) is an API of Package B if and only if MyType is API type of Package B. And non-API types shouldn't be returned by API methods.

timholy commented 1 year ago

I have not given this anywhere near the amount of thought you have, @Seelengrab, but I am also a bit skeptical about the per-method thing. I kind of like @c42f's scoped export A, B, which would be fairly minimal intervention needed. I understand your points about proximity but developers can always add this locally:

scoped export f
f(args...) = ...

if they want that, too. (exports don't all have to go at the top of a module.)

As a way of ensuring that detached scoped exports don't lead to violations, we could add Aqua testing for your API rules.

ToucheSir commented 1 year ago

Trying to draw inspiration from how other languages do this, I'm reminded of how method overrides are handled in languages with classes. If an abstract method is marked as public, any overrides must be public as well. Less familiar with the space of languages with traits/protocols, but the couple I do know of have similar behaviour when it comes to implementing them (visibility of implementer must be >= visibility of trait/protocol methods). Not all languages use SemVer so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal.

Seelengrab commented 1 year ago

I've segmented the responses according to who I've quoted, so it's easier to follow :)


@BioTurboNick

So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. Already pretty common I think to have e.g., myfunc (exported) and _myfunc and __myfunc for next-level internals, for example.

Explicitly marking things as non-API has the issue that it's easy to forget - I mentioned that somewhere, I think. You really don't want to end up in the situation of having to do a technically breaking change in a non-breaking release (by semver) just because you forgot to add the non-API marker. No marker -> private/internal should be the default, rather than the exception. Rust goes as far as not even allowing you to access things that haven't been explicitly made available (which we don't want, and also can't do before 2.0 (but more like 10.0) because it's massively breaking).

As far as the scenario where Package B adds a method to an API function (say, getindex on one of its types) goes, I think you could say that Base.getindex(T::MyType, i...) is an API of Package B if and only if MyType is API type of Package B. And non-API types shouldn't be returned by API methods.

That takes away agency from the developer of Package B, in that it doesn't allow the developer to have a public type, but without having getindex be part of the API of that type. In a sense, requiring this kind of proliferation means that every time you extend the API-function/method foo of some other package with an API-type of your package means that you are automatically now responsible for how the package uses foo internally, and that it doesn't break. This feels a bit hard to do, when you don't have control over that. Selecting API on a per-method basis circumvents that issue, because it allows the developer of package B to say "no, I want to use this internally, but I don't really want to support this publicly".

To be fair, the example is a bit abstract, but I think the only reason something like this hasn't been a problem so far is because we really don't know what the API surface of any given thing is, so we somewhat expect things to break. It's also hard to opt into someone elses code, since we currently only have single abstract subtyping and the most succesful attempt at having such a wide interface package, Tables.jl, doesn't use abstract types for signaling dispatches at all (there are trait functions though). I think that's a missing feature in julia, but that's for another day and another proposal :)


@timholy

I kind of like @c42f's scoped export A, B, which would be fairly minimal intervention needed. I understand your points about proximity but developers can always add this locally:

I'm strongly opposed to calling this a scoped export, because it does nothing of the sort. It is NOT exporting a symbol, and I think conflating the meaning of "this is OK to access, I support this" with export is what brought us into this mess in the first place. It's a subtlety that I think is best to highlight by giving it a different name, so that people think of "This is my API" when using api (or whatever we end up calling it) and "this is what I want you to have directly available after using" when using export.

As a way of ensuring that detached scoped exports don't lead to violations, we could add Aqua testing for your API rules.

I'm not sure Aqua rules are enough for this - it's not a linting thing after all. I think something like @test_api is good to have in Base, since we do have the issue of fuzziness of API in Base too (and it's much more prevalent than in some parts of the ecosystem :grimacing: )


@ToucheSir

Not all languages use SemVer so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal.

This is no coincidence - the rules laid out both by the Rust guidelines as well as the ones I've described in the proposal are purely a consequence of the type system and what it means to change something from version A to B. It's mostly a type theoretic argument that leads to the interpretation that it ought to be possible to mark individual methods as API. I feel though that to make this point more clear, I'll have to write some more about what I (and I think the wider programming language design community at large) considers a "type". So little time though... Taking that lense, as well as thinking about what SemVer means, leads to a very natural derivation of these rules.

On the flipside, it's of course possible to take a coarser approach and mark whole functions as API (that's also described in the proposal after all, that's api function foo end) and if that's the direction we want to go in, great! As I said elsewhere (and too often), anything to mark "this symbol is intended API, even if not exported" is better than the nothing we have now.

If dropping per-method-API gets more support for this syntax proposal, good - let's do it. We can always add it back later if we feel that we do want that granularity after all, since it's always allowed to widen your API ;)

timholy commented 1 year ago

I'm strongly opposed to calling this a scoped export

I guess there are two aspects: (1) what you name it (I'm happy to drop that aspect of the proposal), and (2) where you put it. If we drop the per-method idea, then there is no reason you couldn't put these annotations near all your exports. Personally I like that design, it makes it clearer it applies to the function rather than the method.

ToucheSir commented 1 year ago

To be clear, my point was that Rust appears to support many of the same rules about methods and API compat despite not allowing for private implementations of public methods. That's not to say allowing method-level granularity when specifying API surface area is a bad thing, but it suggests to me that these two pieces are somewhat orthogonal.

LilithHafner commented 1 year ago

I agree that consensus on what "API" means is good and should probably be documented. Thanks for thinking about this and making a concrete proposal. One of the advantages of a concrete proposal is that I can point to concrete and hopefully-easy-to-fix problems.

in the example above [MyStruct is API and AbstractFoo is not], the full chain MyStruct{T,S} <: AbstractFoo <: Any is considered API under semver

I think this is too strict. A better requirement is that any subtyping relation between two API types should be preserved. For example, _InternalAbstractType can be removed in MyStruct <: _InternalAbstractType <: Any.

This is okay because folks who are dispatching based on _InternalAbstractType are already using internals and if the package defines an API function or method declared api public_method(::_InternalAbstractType) = 0, that method accepts ::MyStruct, so it must continue to accept ::MyStruct even after MyStruct <: _InternalAbstractType no longer holds.

For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API ... The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.

This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error.


As for how to mark things as API or not API, I concur with some others in this thread that marking API similarly to export is a good idea.

enters namespace with using does not enter namespace with using
part of API export my_function, my_constant, MyType foo_bar my_function, my_constant, MyType
not part of API don't do this default

*foo_bar could be scoped export, api, export scoped=true etc.

adienes commented 1 year ago

@LilithHafner along those lines, I enjoyed the proposal from a few years ago here https://github.com/JuliaLang/julia/issues/39235#issuecomment-958539052

where in this case foo_bar might be a semicolon separator

Seelengrab commented 1 year ago

This is okay because folks who are dispatching based on _InternalAbstractType are already using internals and if the package defines an API function or method declared api public_method(::_InternalAbstractType) = 0, that method accepts ::MyStruct, so it must continue to accept ::MyStruct even after MyStruct <: _InternalAbstractType no longer holds.

That's a good point, and an even better one for the need for @test_api! What your version effectively means is that abstract types give some guarantees (existence of fields, dispatches of functions, invariants..) and subtyping an abstract type means saying "I give those same guarantees" (as well as dispatching like them, because you give those same guarantees). Viewed under that light, having a non-API supertype just means that you don't give the guarantees of that supertype! That's quite elegant - the only way it could be even more elegant is if we'd be able to subtype more than one abstract type, which would effectively mean saying "I give those guarantees too" (you can, of course, get in hot water if you claim to e.g. both be an even number, as well as not being an even number, should such abstract types representing these concepts exist).

I like the idea of only keeping the guarantees you've actually promised to provide.

This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error.

That's a predicament then - I originally wanted to allow widening of the type to one that gives fewer guarantees, but that obviously doesn't work because then reads may fail in dispatch (you can't take away guarantees you've given in a previous version). Taken together, these two imply that types of API globals must not change, at least without releasing a breaking version. Not related to the syntax proposal per se, but quite unexpected nonetheless!


Since I'm not sure it's quite clear, my motivation for having api markers on a per-method basis is that it makes it abundently clear what is API; only marking the name API makes it (in my eyes) much more likely to forget to add a test for the API surface - if it's marked on a per-method basis, it's quite trivial to take that exact function signature, put it into @test_api foo(bar, baz) and you're safe from accidental API regressions. That's just not the case when doing a name-wide API declaration.

timholy commented 1 year ago

it's quite trivial to take that exact function signature, put it into @test_api foo(bar, baz) and you're safe from accidental API regressions.

I agree that having specific signatures be part of the official API makes sense, but I think that might be orthogonal from the Methods themselves. @test_api foo(::Number, ::Number) is something that can be implemented (sort of) independent of whether there is an actual method with that signature.

I say "sort of" because @test_api evidently depends on type-inference, and that presumably works only up to runtime dispatch boundaries. Unless one is then going to check all potentially-applicable methods (for the callee of that runtime dispatch), in which case is once again out of the land of specific method signatures.

So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees.

Seelengrab commented 1 year ago

@test_api foo(::Number, ::Number) is something that can be implemented (sort of) independent of whether there is an actual method with that signature.

How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking (::Number, ::Number) has to exist; the semantic meaning of @test_api is that no MethodError happens when calling foo. This doesn't mean that there can't be a MethodError further down the callchain of course, as a dynamic dispatch isn't generally penetrable to inference, as you rightly point out.

After all, if no method foo(::Number, ::Number) (or wider, as foo(::Any, ::Any) also works) exists, it must be a failing test case, since the signature is statically known not to dispatch (barring eval, but that throws a wrench into everything).

So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees.

I'm sympathetic to that idea, but I consider it out of scope for this particular proposal. I think the idea "declare api" extends to other modifications we might want to do with the type system later on quite well; the semantic of "I support these additional guarantees" ought to be possible to tack onto almost anything after all, even after the fact.

EDIT: To be clear, I consider the formalization of type signatures in the type system distinct from methods out of scope; not tying API to signature, as that's exactly what I want to do with per-method API :)

timholy commented 1 year ago

How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking (::Number, ::Number) has to exist; the semantic meaning of @test_api is that no MethodError happens when calling foo.

You can implement

foo(::Real, ::Real)   # method 1
foo(::Any, ::Any)    # method 2

and that signature is fully covered. The coverage is nominally wider than what we're guaranteeing to be API, but that's OK. In fact it's more than OK: as soon as you think about the implications of trying to actually write @test_api , this issue is going to come up at every runtime-dispatch boundary. So allowing this at the "entry point" is not really any different than allowing it at runtime dispatch boundaries.

Again, the point is that you cannot implement @test_api without relying on inference, and when inference fails at runtime dispatch then you have to consider the universe of all methods that might be applicable. Otherwise you haven't proven that you fully support the API you claim to support.

https://timholy.github.io/SnoopCompile.jl/stable/jet/ provides a concrete example, if this is still still unclear.

I am concerned that inference failures are going to mean that we'll frequently run up against bar(::Any) but we don't propose to make bar(::Any) guaranteed API. Presumably we can only apply the API guarantee to things that have no (or limited) inference failures. So loading a package seems like it could potentially break the API guarantee, but unless that package tests the universe of all API-guaranteed objects, you might not catch this in the package's tests.

LilithHafner commented 1 year ago

I am inclined to prefer demarcating API at the level of symbols because it is easier to tell a human sort! is API and ht_getindex is not, than to tell a human sort!(::AbstractVector) is API and sort!(::AbstractVector, ::Int, ::Int, ::Base.Sort.Algorithm, ::Base.Order.Ordering) is not. Further, it would be really nice for discoverability if names(Module) returned exactly all the symbols that are a part of the public API.

Nevertheless, looking for an example where more specificity is needed, let's say I have a method

foo(x, y) = x + y    # version 1

and I think I might eventually want to change it to

foo(x, y) = x - (-y) # version 2

Which has slightly different behavior for some input types (e.g. foo(1, typemin(Int16))). I want to guarantee the behavior on foo(::Int, ::Int) now and leave the rest as internal. Can I do this & how?

With per-function API this would look like

api foo(x::Int, y::Int) = _foo(x, y)
_foo(x, y) = x + y
BioTurboNick commented 1 year ago

One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.

Extended - what's the concrete benefit of having API and internal methods of the same function, over giving the internal functions a different name? Arguably the latter leads to a cleaner design for both developers and users to reason about.

Seelengrab commented 1 year ago

@timholy

as soon as you think about the implications of trying to actually write @test_api , this issue is going to come up at every runtime-dispatch boundary. So allowing this at the "entry point" is not really any different than allowing it at runtime dispatch boundaries.

I'm not talking about @test_api foo(::Int, ::Int) running inference though - it's only checking which signatures are in the method table for foo and checking that against being able to dispatch (Int, Int) to it. Not whether the underlying call doesn't error at all further down the callchain, which is in line with the dynamic nature of julia.

Otherwise you haven't proven that you fully support the API you claim to support.

Well yes, @test_api is not supposed to test whether the entire implementation conforms to the contract. It's not a frontend to a full-stack static analyzer - it's only purpose is to to check whether the initial dispatch to foo still works, to prevent regressions in dispatch API. There may be MethodErrors thrown from other calls down the chain, as I said above, and that is fine.

There isn't anything about @test_api that needs to touch inference - it's only about statically knowable information, which methods of the surface call are available (ignoring eval), not about statically inferring that there are no MethodErrors in the call stack later on. I'm not trying to rebuild JET.jl here.


@LilithHafner

I am inclined to prefer demarcating API at the level of symbols because it is easier to tell a human sort! is API and ht_getindex is not, than to tell a human sort!(::AbstractVector) is API and sort!(::AbstractVector, ::Int, ::Int, ::Base.Sort.Algorithm, ::Base.Order.Ordering) is not. Further, it would be really nice for discoverability if names(Module) returned exactly all the symbols that are a part of the public API.

That is a good point, yes - I haven't thought of names being what lots of people use for API discovery.

I want to guarantee the behavior on foo(::Int, ::Int) now and leave the rest as internal. Can I do this & how?

If you want to guarantee something on a specific set of arguments, there's no way around explicitly giving that guarantee; that's irrespective of whether a single method is api or not. You have to write foo(x::Int, y::Int) either way, and give a docstring pointing out the guarantees for both implementations, otherwise people are going to be confused about which argument type combinations have which guarantees. With per-method API, that would look like

"""
     foo(::Int, ::Int)

Specialized implementation of `foo` for `Int`. Only relies on addition.
"""
foo(x::Int, y::Int) = x + y

"""
   foo(x, y)

Fallback definition of `Foo`. Doesn't assume addition, only requires negation & subtraction.

!!! warning "Mixed Integer Behavior"
    Due to julias integer types being asymmetric, this implementation may not always give the same result as same-type `foo`, e.g. `foo(1, typemin(Int16))` differs from `foo(one(Int16), typemin(Int16))`.
"""
foo(x, y) = x - (-y)

If you don't want to provide a fallback definition yet, that's just fine - there's no need to give one if you don't have any guarantees to provide for it yet.


@BioTurboNick

One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.

That's a very good point, but I think per-function API has the same issue: you'll just get the MethodError on an internal function instead of the API function the user actually tried to call. Worse, through too-permissive dispatch on the internal function, you may get a longer stacktrace that's harder to debug compared to if you got that MethodError (or an API warning) right at the beginning. I think it's been a point in the past, how MethodError is hard to debug because of the fact that they can bubble up from way down some callchain.

Extended - what's the concrete benefit of having API and internal methods of the same function, over giving the internal functions a different name? Arguably the latter leads to a cleaner design for both developers and users to reason about.

I think that's only true if both API and internal functions are strict in what they expect - which is explicitly counter to typing signatures Any to allow dynamicness and specialization just do its thing. If you are stricter with types in your signatures, whether the function or method is API begins to loose it's meaning :shrug:

LilithHafner commented 1 year ago

A lot of people, myself among them, have objected to per-method API declarations. Does anyone have an example of a function in the wild for which it is best to declare a per-method API?

Here're 5 random functions, for inspiration

julia> rand(names(Base), 5)
5-element Vector{Symbol}:
 :isreadonly
 :retry
 :sin
 :isone
 :map
dalum commented 1 year ago

Does anyone have an example of a function in the wild for which it is best to declare a per-method API?

This is not an actual example in the wild, but I feel this is something that can come up when extending functions defined in other packages. For example, let's say I want to define a new number type, called MyNewNumberType, that I consider public API, on which I have defined:

function Base.sin(::MyNewNumberType)
    ...
end

However, because Base.sin is public API from Base, have I now inadvertently declared this method API by extension? In many cases, this would probably be desired, but maybe I only defined this method for internal use, to pass the object through generic functions for some intermediate calculations. Because it was only defined for internal use, I don't want to be required to maintain the implementation long term, but there isn't a way for me to opt-out of this, if the "API surface" is determined by the source code. This could perhaps be solved by requiring my package to re-declare Base.sin as API, before my method is actually considered as such, but if I define two types, one for which I want to guarantee support, and one which I don't, then I'm not sure there'd be a way to disambiguate that without per-method support.

adienes commented 1 year ago

I'd like to clarify that my issue with the per-method API was not about the granularity of the feature but rather about the mechanical expression of the syntax being scattered throughout the code, in that I think this is really something that should be centralized similar to export lists.

@BioTurboNick One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.

Isn't this already the state of affairs? I don't think having a method-granular API changes anything; it only exposes what's already true. Plenty of code is written generically enough, and with users ambitious enough, such that package code is often composed in ways that the authors likely did not expect or intend. I think feature-wise the full signature of the method must be considered, otherwise how are packages ever supposed to import/extend external functions to their API?

Seelengrab commented 1 year ago

@LilithHafner

Does anyone have an example of a function in the wild for which it is best to declare a per-method API?

I think we don't have such examples "in the wild" because it's really, REALLY difficult to extend the API of another package in a meaningful way at the moment, if you're not already a developer of that package. I feel that this is because what is considered API is currently not well defined at all, so there just aren't examples for this because noone does this in julia (because it's really, REALLY hard without breaking something). My goal is to make this sort of extension easier, by giving a place to say "this is where you can extend".


@adienes

Isn't this already the state of affairs? I don't think having a method-granular API changes anything; it only exposes what's already true.

Yes, exactly - the issue is that this API surface is not well defined, and that's all I want to tackle with this proposal. By tackling that sort of ambiguity and exposing what's already true programmatically, we can become much more effective at extending the code of other people.

I think feature-wise the full signature of the method must be considered, otherwise how are packages ever supposed to import/extend external functions to their API?

I'd like to clarify that my issue with the per-method API was not about the granularity of the feature but rather about the mechanical expression of the syntax being scattered throughout the code, in that I think this is really something that should be centralized similar to export lists.

Absolutely! While I personally prefer writing per-method API close to the definition, actually doing that is not as important as being able to declare per-method API in the first place, as well as being able to test for accidental regressions in that.

adienes commented 1 year ago

I would still be curious to hear more detailed feedback for https://github.com/JuliaLang/julia/issues/49973#issuecomment-1569093036 . is the main concern that it is too verbose / unwieldy? from my POV, it does seem to provide

Seelengrab commented 1 year ago

is the main concern that it is too verbose / unwieldy?

Not necessarily, it's just that I have another concrete proposal for that kind of thing already in the works (parts of which certainly shone through in some of my comments above). On top of this being a bit orthogonal to the whole "How do I mark something as supported" discussion this proposal is about (which, as mentioned above, should be perfectly compatible with whatever type-theoretic additions we might want to do later on), I didn't want to derail this discussion with another long-form text.

As such a change is much larger than the one discussed here, I think this proposal has better chances of being accepted (one way or another) without trying to solve everything at once. That's why I've structured it in such a way as to be easily extendeable for any new language/type-level concept we may want to do in addition to this.

adienes commented 1 year ago

I certainly did not intend to derail the thread; it might have been more appropriate to continue this thread as a discussion on your concrete proposal and use a separate one for brainstorming, my apologies.

I do want to confirm / be clear on something though: what I had in mind doesn't change or extend the type system at all, right? Possibly the word interface was a poor choice there, but I only meant that as an analogy. Ultimately the proposal is more or less "just" a more structured export list, where the exports can be scoped, tagged with experimental etc, and given the full method signature (within the confines of the existing type system)

Seelengrab commented 1 year ago

All good - my assumption was that users would be able to dispatch on these kinds of interfaces; if they can't do that, what advantage over current trait implementations/compiler guarantees/diagnostics do we hope to gain from them? After all, if we want to be able to say "This is the interface I support", it would be nice to dispatch on that fact as well, ala "This is the interface this function requires for this argument". This requires extension of the type system though, which is where things can get quite tricky (and is why I've deliberately left the grouping of functions/methods & structs into one unit out of this proposal - this is purely about "I want to give some stability guarantee for some thing" ;) ).

I do have thoughts/angles at solutions for these, but those ideas are not quite ripe for sharing yet, I don't think.

c42f commented 1 year ago

I'm strongly opposed to calling this a scoped export, because it does nothing of the sort.

This seems like a pretty strong interpretation of the word export to mean exactly what it already means in Julia in terms of symbol visibility.

But if you look up the more general definitions (https://en.wiktionary.org/wiki/export#Verb) you'll things like

  1. To carry away.
  2. To sell (goods) to a foreign country.
  3. To cause to spread in another part of the world.
  4. To send (data) from one program to another.

We're allowed to change the meaning of words by adding adverbs like scoped if that's a good analogy and users understand what it means. I think scoped export is clear enough from this point of view and it reuses an existing keyword which is nice.

Seelengrab commented 1 year ago

I think scoped export is clear enough from this point of view and it reuses an existing keyword which is nice.

This is where I disagree - reusing an existing keyword (export) with a clear meaning in the context of julia ("make this symbol available without explicitly importing it") to mean something entirely different ("This symbol is supported as API"), even if modified with an adverb, is a bad thing. I really don't think the other general meanings of the word are relevant there; all of them are about moving something from place A to place B, which "This symbol is supported as API" just doesn't do.

dalum commented 1 year ago

because it's really, REALLY difficult to extend the API of another package in a meaningful way at the moment, if you're not already a developer of that package.

I would like to hear a concrete description of why this is hard at the moment, because this has not been my experience. What has been my experience, though, is that there is a lack of consensus on how to mark something as supported/API, which I think is a very good issue to tackle. I also think that this issue for me boils down to:

Seelengrab commented 1 year ago

The hard part I'm referring to is not making things work somehow (that's usually fairly easy - just overload things until you don't get MethodError anymore and/or you get a result that looks right), but making it work without accidentally relying on internals or undocumented behavior, as well as actually following the semantics/guarantees the interface requires.

My pet example for this is the Random stdlib. While it's good that there's documentation for the API surface, I rarely see the API used correctly; that is, people don't write a method for Random.rand(rng::AbstractRNG, ::Random.SamplerType{MyType}) for creating random instances of their type (or even defining a custom sampler type..), but rather write a method like Random.rand(rng::AbstractRNG, ::Type{<:MyType}), which is not at all supported by the interface; it just happens to work because the surface-level API is the same. This extends to the ecosystem, where e.g. (without wanting to pick on anyone in particular) RandomNumbers.jl gets it wrong (others get it right, like StableRNGs). Distributions.jl gets it half-right - they do define their own sampler objects, but then don't hook into the existing API by subtyping Base.Sampler, but rather punning on the rand name to mean "sample from this distribution" in their own type hierarchy. I've also had some extensive discussion with @cscherrer about this particular thing on Slack & in one of his packages, but I can't remember which it was (sorry!).

Similar issues can occur in other codebases, where implementing an API is more complicated than simply overloading a single function.

Things like that, where rand is (in general) API, but specific implementations are not, is what I think are relevant here :shrug:

timholy commented 1 year ago

but making it work without accidentally relying on internals or undocumented behavior

I am struggling to understand the scope of this proposal. On one hand it seems to be close-to-trivial, just a way to "tag" things essentially as an adjunct to documentation. That has some merit on its own. On the other hand it seems to be an attempt to provide machine-checkable guarantees on dependencies. The latter seems quite exciting but I worry that it gets into serious technical difficulties, because I don't know how you provide such guarantees without diving into the implementation of the API using inference. And lots and lots of Julia code has barriers to inference.

Most likely, I'm just having a hard time seeing your vision. But both the OP and the discussion have focused a lot on what we name things and what kinds of objects can receive that name. I'm far more interested in the "Uses" and "Required Implementation Steps" in the OP, but those seem far less fleshed out and haven't gotten as much discussion. My concern is that all this discussion about naming is a bit premature because it's not obvious to me that technical issues won't end up dominating the actual benefits (or lack thereof) from this proposal.

If this discussion is getting somewhere, by all means continue it. But personally I am not seeing something that looks like progress. If this were my proposal, at this point I'd start writing @test_api, and once I'd made substantial inroads I'd share the draft and report back. You could presumably generate test cases using a simple "cheat": any object that isn't API could get a notapi somewhere in its name, and thus you could fake your api-check without yet having gone to the trouble of modifying Julia's internals. That will let you develop the hard parts of this proposal while deferring the painful parts of this proposal; we'll all have a clearer vision for what you can do with this new tagging system before having to invest huge effort in it.

Seelengrab commented 1 year ago

On the other hand it seems to be an attempt to provide machine-checkable guarantees on dependencies.

Let me reiterate then: I do NOT want to propose a machine checkable guarantee on dependencies. All I want is to have tools like Documenter or the REPL being able to consistently say "this thing is considered API by the package developer. Read the docstring for more information!". Nowhere have I mentioned wanting to do machine checkable proofs of callstacks with this proposal; I actively consider that out of scope - it's something that's been thrust onto this proposal, but not by me. The reason I've stated the guarantees & rules in the proposal is because that's what needs to be followed at minimum by a developer to not release a breaking change on something that they consider API. How they check that this is the case is their concern - one possibility is the addition of that @test_api macro.

If my goal was to have machine checkable proofs, I would have written that in the Uses section, as I'm well aware how much work that is. This proposal, as I keep repeating, is purely about marking things a developer says "I want to support this in SemVer". No part of this proposal is about the semantics of that support (though the rules contained in the OP are the bare minimum requirements such that a change in functionality is not a breaking change, whether with this proposal or without it).

I'm by no means suggesting that this will solve API misuses or incompatibility issues - not at all! But it WILL make it easier for a user to know what the actual API surface is, as well as give developers a consistent avenue to say "No, sorry, your use is out of scope/support".

timholy commented 1 year ago

Great, that helps! But I still don't understand how you get some of the benefits you claim, notably:

Tree-shaking for Pkg images/static compilation, to only make api bindings available in the final image/binary/shared object

Seelengrab commented 1 year ago

But I still don't understand how you get some of the benefits you claim, notably:

That is written primarily with (future) static compilation to a shared object in mind - if a function/method/global variable is marked as api, we can take that as a sign that the name/symbol should not be mangled, as it's intended to be part of the API of the codebase.

That's why I keep going on and on about the intent of a developer - by giving developers the possibility to signal their intent, in a consistent manner on the language level, we can use that intent in lots of places that touch the API surface, or are related to handling the API surface.

I do realise that this is a bit of a "head in the clouds, maybe future stuff" approach, by daring to imagine what could be done with this :shrug: I have no illusions that machine-checkable proofs of API surface mean anything when crossing an FFI surface.

timholy commented 1 year ago

Ah, this is just about naming in the shared library! OK, I thought it was about inclusion/exclusion of compiled code, which is a much harder problem.

Seelengrab commented 1 year ago

Yeah, in hindsight, I could have worded that better in the original post :joy: I'll edit it in!

timholy commented 1 year ago

Another clarification: for linting hints, what is your proposal for ambiguous cases? Is it better to have false positives (warn people that foo(x, y) might not be API if a subset of 2-arg methods of foo aren't marked api) or false negatives (assume because at least one 2-arg api-tagged method exists that this is probably OK)?

If you're not using inference, per-method annotation seems to require that you plan for one of these outcomes.

Seelengrab commented 1 year ago

Since linting julia code in terms of types always requires inference to not warn about potential MethodErrors erronously anyway, I'm inclined to have false positives, following the behavior for MethodError. If using inference is not an option, I'd err on the side of caution and rather have false positives - but this may be just my personal preference to get some form of type inference back as soon as possible, even in type unstable code.

cscherrer commented 1 year ago

I've also had some extensive discussion with @cscherrer about this particular thing on Slack & in one of his packages

That was here, though most of that was in the linked Zulip discussion