astropy / astropy-APEs

A repository storing the Astropy Proposals for Enhancement.

APE 22: Public API #85

Open nstarman opened 1 year ago

nstarman commented 1 year ago

Up for discussion!

~Very much a work in progress. Hopefully refined by discussion at the upcoming conference.~

astrofrog commented 1 year ago

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

saimn commented 1 year ago

The problem with __all__ is that it only affects import * (and for a long time that was its only meaning; the PEP 8 section about public/private interfaces was added later, https://github.com/python/peps/commit/7dba60ceea8842f00641893295d22338d99cc958). So it doesn't prevent importing from a module, and autocompletion doesn't use it. So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.
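
To make the point about import * concrete, here is a minimal sketch. The module name `demo_core` and the function `helper` are hypothetical and exist only for this illustration; the module is built in memory so the snippet is self-contained.

```python
import sys
import types

# Build a throwaway in-memory module; "demo_core" is a hypothetical name.
demo_core = types.ModuleType("demo_core")
exec(
    "__all__ = ['Foo']\n"
    "class Foo: ...\n"
    "def helper(): return 'not exported'\n",
    demo_core.__dict__,
)
sys.modules["demo_core"] = demo_core

# A star import copies only the names listed in __all__ ...
star_namespace = {}
exec("from demo_core import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))

# ... but an explicit import of an unlisted name still succeeds.
from demo_core import helper
```

Here `star_names` contains only `Foo`, while `helper` remains importable by name, which is why __all__ on its own cannot stop anyone from importing from a module.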

saimn commented 1 year ago

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

astrofrog commented 1 year ago

Maybe in that case .core should technically be private? (._core)

nstarman commented 1 year ago

Maybe in that case .core should technically be private? (._core)

👍. That is the suggestion of PEP 8 -- and that .core should also have a blank __all__ = [].

nstarman commented 1 year ago

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

So this would be one of the largest changes to Astropy. PEP8 and thus Scipy and static typing all say that __all__ refers to what is public in that module. With this in mind, doing

# __init__.py
# No __all__ is defined
from .core import *
# core.py
__all__ = ["Foo"]

class Foo: ...

means that Foo is public in astropy.convolution.core and private in astropy.convolution. This is contrary to what Astropy intends: we are saying that __all__ in astropy.convolution.core means that Foo is actually private in astropy.convolution.core and public in astropy.convolution. This is confusing for many reasons. If we were to adopt PEP8 (as suggested in this draft APE) then the previous example would look like

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined elsewhere.
from .core import Foo
# core.py
__all__ = []  # nothing is public in this module. Please look elsewhere.

class Foo: ...

Essentially we need to move the contents of the various __all__ lists to where the code is actually public, leaving behind an empty __all__ to indicate where no code is public. Caveat: a private module defining a non-empty __all__ is fine. This enables *-imports in the public modules. Thanks @astrofrog for the clarification, which is now detailed in the APE.

Update

The better option is to rename core.py to _core.py and add an __all__ to __init__ like so.

# __init__.py
from . import _core, ...
from ._core import *
...

__all__ = [] + _core.__all__ + ...
# _core.py (formerly core.py)
__all__ = ["Foo"]

class Foo: ...

This still supports * imports if you want them and retains 100% unambiguity about what is public and where.
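
A runnable sketch of that layout, assuming a throwaway package written to a temporary directory. The package name `demo_pkg` and the function `unlisted` are hypothetical; only the `_core` / `__init__` pattern comes from the example above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny package to a temp directory; "demo_pkg" is a hypothetical name.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()

core_source = "\n".join([
    '__all__ = ["Foo"]',
    "class Foo: ...",
    "def unlisted(): ...",  # defined in _core but not exported
])
init_source = "\n".join([
    "from . import _core",
    "from ._core import *",
    "__all__ = [] + _core.__all__",
])
(pkg / "_core.py").write_text(core_source + "\n")
(pkg / "__init__.py").write_text(init_source + "\n")

sys.path.insert(0, str(root))
demo_pkg = importlib.import_module("demo_pkg")

# Names in the package namespace that do not start with an underscore.
public_names = [name for name in dir(demo_pkg) if not name.startswith("_")]
```

In this sketch the only underscore-free name in the package namespace is `Foo`, and it matches `demo_pkg.__all__`; `unlisted` stays reachable only through the private `_core` module.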

nstarman commented 1 year ago

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

I think Yes. To make sure we're on the same page, I think communicating it this way to users should be the logical consequence of the deeper rules:

Having all these rules means that a user can only get to public symbols through other public symbols -- that "anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere" and "anything in the API docs" (unless explicitly stated) is public. It's not the source or our definition of public API, it's the consequence.
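
The user-facing consequence ("no _ prefixes anywhere") can be written down as a one-line check. This is only a sketch of that rule of thumb, not of the deeper __all__ rules, and the function name `is_public_path` is hypothetical.

```python
def is_public_path(dotted_path: str) -> bool:
    """Return True if no component of the dotted path starts with an underscore."""
    return not any(part.startswith("_") for part in dotted_path.split("."))


# The same symbol reached through a public and through a private path.
public_route = is_public_path("astropy.units.Quantity")
private_route = is_public_path("astropy.units._quantity.Quantity")
```

The function only inspects the string, so it says nothing about whether the path actually exists; it just encodes what a user would read off the name.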

nstarman commented 1 year ago

The problem with __all__ is that it only affects import * (and for a long time that was its only meaning; the PEP 8 section about public/private interfaces was added later, https://github.com/python/peps/commit/7dba60ceea8842f00641893295d22338d99cc958). So it doesn't prevent importing from a module, and autocompletion doesn't use it. So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.

I agree, __all__ is not enough to prevent autocomplete, though some autocomplete does use __all__: see https://ipython.readthedocs.io/en/stable/config/options/terminal.html#configtrait-IPCompleter.limit_to__all__. I also agree we should rename modules with an underscore, as part of adhering to PEP 8. It should be noted that adding underscores does not actually "enforce" public/private API as Python does not have true language-level features for public vs internal interfaces. Like __all__, single underscores are convention and Scipy says that __all__ takes precedence over underscores. In this APE I propose that we adopt a Scipy-like rule set where __all__ takes precedence over underscores and we use both according to PEP 8.

saimn commented 1 year ago

@astrofrog - Maybe in that case .core should technically be private? (._core)

If we want to control strictly what's public and private yes, basically all submodules should be private (renamed with underscore) and public API exported in the subpackages' __init__.py. That's what Scipy did.

@nstarman - So this would be one of the largest changes to Astropy.

With this solution yes, this would require a lot of changes, and may be painful to do. So I don't like this solution. But you also seem to agree with renaming with underscores, though I think those are two different solutions.

To summarize:

I don't like option 1 because it moves the list of exported functions from the module itself to another place, and requires a lot of changes which can be prone to errors. Option 2 is more reasonable.

Then as you say, the underscore prefix is also just a convention, but that's the closest thing to a private scope in Python land. And autocomplete respects it, so when users browse the functions in their shell they will see only public API. As a user I have never checked the content of a package's __all__; I use autocompletion and the docs.

[1] They also kept module.py with deprecation warnings because there is a lot of code importing from e.g. scipy.ndimage.morphology instead of scipy.ndimage. We may want to do that in specific cases, but I don't think we would need to do it in a systematic way.

eteq commented 1 year ago

(Some of this developed from discussion at the coordination meeting (including @nstarman, @saimn , @astrofrog , @tepickering, @pllim, @nden, @williamjamieson), although I don't think I can say that all of my points above are consensus of those folks, it's some mix of that and just straight up my own opinion.)

I very much agree with @astrofrog's point here:

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

Up until this point the (uncodified) understanding has been, I think, that whatever the docs say is the public API. So I think it makes sense to codify that as the "true answer". @nstarman's point was that if we follow the rules here, the two are the same, and that it's only an aberration if they differ. But I was/am concerned about the inevitable state when something isn't working right. So I think what we settled on is that we start by saying that, as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs just reflecting the same thing as these rules.

I still personally think we should have it true that the final source of authority is the documentation, because that's more user-facing of a contract, as @astrofrog says. But I think if we say that's the reality now, and we might re-visit it after this APEs plan is implemented, that's a reasonable compromise.

Two more opinions to offer:

And one question:

pllim commented 1 year ago

Does this apply to coordinated packages

I would say not. Even for things like removing astropy-helpers, it was a tedious campaign with writing up a transition guide and opening some PRs downstream. For things like this that would break API, it is a non-starter.

nstarman commented 1 year ago

@astrofrog @eteq @saimn @pllim, I've updated this APE based on the discussions we had at the conference and in this thread. LMK what you think!

nstarman commented 1 year ago

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

@saimn I've been working on this over at astropy.cosmology. We've successfully transitioned .utils -> ._utils, .io -> ._io, and I'm working on the rest. The code is clearer from a user's perspective since there's only one obvious place to import things from and all the hidden modules aren't tab-completion discoverable.

nstarman commented 1 year ago
  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import * -> from ._module import *). That's what Scipy did. [1]

I also like Option 2 a lot. It works because _module is not made public in __init__, so even though _module defines an __all__ and makes its contents locally / contextually public, that is within a private module. It's impossible to publicly navigate to the contents of _module, only to what is exported to __init__.

With this as the template, the example from https://github.com/astropy/astropy-APEs/pull/85#issuecomment-1529152807 becomes

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined in `_core`, which is private.
from ._core import Foo
# _core.py
__all__ = ["Foo"]  # Foo is "public" in this module, but this module is private.

class Foo: ...

Another consequence is that the actual public API __all__s are in different files than the thing-to-be-made-public. E.g., Quantity would be in astropy/quantity/__init__.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which __all__ to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

@eteq, I believe @astrofrog's comment largely answers this question.

nstarman commented 1 year ago

So I think what we settled on is that we start by saying as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs just reflecting the same thing as these rules.

@eteq, I agree.

But I was/am concerned about the inevitable state when something isn't working right.

I added a section on a pre-commit CI check. It actually looks to be fairly simple to check that a public module has corresponding documentation since we have docs/api that collects our documented objects. I believe we can go further and make a two-way check to also check that something in docs/api is also in __all__. Given all this, we can make it 🤞 impossible for the docs to not reflect the public API as defined in the code.
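
A minimal sketch of what such a two-way check could look like. It assumes the documented names have already been collected into a list; how they would be read out of docs/api is not shown here, and `compare_api` and `demo_mod` are hypothetical names.

```python
import types


def compare_api(module, documented_names):
    """Return (exported but undocumented, documented but not exported)."""
    exported = set(getattr(module, "__all__", ()))
    documented = set(documented_names)
    return sorted(exported - documented), sorted(documented - exported)


# Stand-in module and documentation listing, for illustration only.
demo_mod = types.ModuleType("demo_mod")
demo_mod.__all__ = ["Foo", "bar"]

undocumented, unexported = compare_api(demo_mod, ["Foo", "baz"])
```

A check along these lines would fail whenever either list is non-empty: here `bar` is exported without documentation and `baz` is documented without being exported.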

mhvk commented 1 year ago

Thanks for writing this! I think it is good to describe what formalizing the status quo would entail:

  1. Any module from which users can import must have an __all__ (i.e., all subpackages, and some of the sub-modules of astropy.utils [which are all documented to be only semi-public]).
  2. Any module not listed in the __all__ above is implicitly private.

Implementing this is very little work and would give consistency, without breaking anything.

A few more general points:

  1. One should not needlessly break user code, even if that code "incorrectly" imports from nominally private files. I try to heed @taldcroft's advice to really avoid that. Does being "more correct" outweigh this?
  2. For developers there is also considerable value to not changing things, because one knows by heart how to get to given files -- I open them by typing (of course) and what is the benefit of me having to retrain and add underscores that do not help tab-completion?
  3. If I think of subpackages I maintain, like astropy.time and astropy.table, essentially all files would start with underscores. In astropy.units, very little except for the unit-defining modules would be public. I find astropy/units/_quantity.py, astropy/units/_function/_logarithmic.py and astropy/units/_quantity_helpers/_function_helpers.py needlessly complicated.
  4. We need to be realistic of the work involved. Conservatively, including reviews, I'd estimate 1 full month to actually do it (excluding time for this discussion, etc.). Plus an unknown amount of time of users dealing with broken scripts/packages that used to work.
  5. While scipy and numpy may be moving (partially because they had real historical baggage), others do not seem to (e.g., pandas).

Overall, I'm fairly strongly against this. But being consistent within astropy is more important, so if the consensus is to move forward, I'll do my bit for the transition.

nstarman commented 1 year ago
  1. Any module from which users can import must have an __all__ (i.e., all subpackages, and some of the sub-modules of astropy.utils [which are all documented to be only semi-public]).
  2. Any module not listed in the __all__ above is implicitly private. Implementing this is very little work and would give consistency, without breaking anything.

@mhvk, if I understand this point correctly, this is similar to point 2 of the implementation section.

2. **Add / update** ``__all__``. The ``__all__`` in each module will be updated
   to reflect phase 1. Any modules' missing ``__all__`` will have one added.

I think where this APE differs is that it aims to be explicit everywhere and not have anything be implicitly private (or public). For both users and maintainers IMO explicit is better than implicit. While the transition to this explicit state is somewhat arduous, as you noted in https://github.com/astropy/astropy/issues/15169 and in the time estimate below, once accomplished, this APE attempts to make remaining in that state very easy: through clear rules and CI checks. (Comments appreciated to make this APE more clear / have better CI checks.)
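
As one illustration of a CI-style check for "every module has an explicit __all__", here is a hedged sketch that walks a package and lists modules lacking one. The function `modules_missing_all` and the package `demo_check_pkg` are hypothetical, and the sketch imports every submodule, which a real check might want to avoid.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def modules_missing_all(package):
    """List the package and submodules that do not define __all__."""
    missing = []
    if not hasattr(package, "__all__"):
        missing.append(package.__name__)
    prefix = package.__name__ + "."
    for info in pkgutil.walk_packages(package.__path__, prefix=prefix):
        module = importlib.import_module(info.name)
        if not hasattr(module, "__all__"):
            missing.append(info.name)
    return sorted(missing)


# Throwaway package used only to exercise the check.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_check_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("__all__ = []\n")
(pkg / "_with_all.py").write_text("__all__ = ['Foo']\nclass Foo: ...\n")
(pkg / "_without_all.py").write_text("def helper(): ...\n")

sys.path.insert(0, str(root))
demo_check_pkg = importlib.import_module("demo_check_pkg")
missing = modules_missing_all(demo_check_pkg)
```

Only `demo_check_pkg._without_all` is reported, since it is the one module that leaves its public surface implicit.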

A few more general points:

  1. One should not needlessly break user code, even if that code "incorrectly" imports from nominally private files. I try to heed @taldcroft's advice to really avoid that. Does being "more correct" outweigh this?

This is definitely true in general. For me, the benefits outweigh the costs for 3 reasons:

  1. In the user code it's easy to switch the import from the incorrect private file to the correct public location since the public location is guaranteed to exist (it's guaranteed to exist because the user is using a public symbol, just importing it from the wrong location), e.g. units.quantity.Quantity -> units.Quantity.
  2. For select items, or for everything, we can support the nominally private files using __getattr__. See https://github.com/astropy/astropy/blob/main/astropy/cosmology/utils.py for an example. This allows for a deprecation period.
  3. Maintainers do move around private files, as is their right and prerogative. Sometimes that breaks people's workflows. That sucks, so we try to minimize the damage. What if public vs private were obvious? Then we'd never break anyone's workflow when we refactored private code (so long as they used public API). No more damage. The very point that this APE would mildly break people's code now is proof positive that we should make the change for the future.
  1. For developers there is also considerable value to not changing things, because one knows by heart how to get to given files -- I open them by typing (of course) and what is the benefit of me having to retrain and add underscores that do not help tab-completion?

True. Python's PEP-8 recommendations for how to structure a module to signal public vs private API are not wonderful for people using tab-completion in a terminal-based text editor. However, the tab-completion problem for developers is actually a feature for users, because they won't see private API. Also, while VIM, nano, emacs, etc. are great, the limitation you mention does not apply to IDEs like Sublime Text, Nova, VSCode, etc. We should aim to support many ways to develop, of course, but...

(Is there a way to set up tab-completion aliases so that the old paths might point to the new ones on a local machine? A quick google found https://www.gnu.org/software/bash/manual/html_node/Programmable-Completion.html, indicating such things are possible and there might be convenient tools to set this up for devs that use terminal-based editors.)

  1. If I think of subpackages I maintain, like astropy.time and astropy.table, essentially all files would start with underscores. In astropy.units, very little except for the unit-defining modules would be public. I find astropy/units/_quantity.py, astropy/units/_function/_logarithmic.py and astropy/units/_quantity_helpers/_function_helpers.py needlessly complicated.

So astropy/units/_function/_logarithmic.py would actually be astropy/units/_function/logarithmic.py (note logarithmic is not underscored). Likewise astropy/units/_quantity_helpers/_function_helpers.py -> astropy/units/_quantity_helpers/function_helpers.py.

To the broader point, top-level modules, e.g. astropy.time with flat structures would have lots of underscores.

astropy/module/
    file1.py
    file2.py
    file3.py
    file4.py
    private_submodule/
        subfile1.py

becomes

astropy/module/
    _file1.py
    _file2.py
    _file3.py
    _file4.py
    _private_submodule/
        subfile1.py

There is an alternative, which is to group related components into sub-modules.

astropy/module/
    _file1.py
    _a_logical_grouping/
        file2.py
        file3.py
        file4.py
    _private_submodule/
        subfile1.py

Whether this happens is of course up to the maintainers of each module. Personally I like hierarchical organization as it conveys information about a file by dint of its location.

  1. We need to be realistic of the work involved. Conservatively, including reviews, I'd estimate 1 full month to actually do it (excluding time for this discussion, etc.). Plus an unknown amount of time of users dealing with broken scripts/packages that used to work.

Sounds like a reasonable estimate.

  1. While scipy and numpy may be moving (partially because they had real historical baggage), others do not seem to (e.g., pandas).

We are more closely aligned with scipy and numpy than pandas, but your point about historical baggage stands. Arguably we have historical baggage, as you mention in point 1, since we occasionally break people's code when they are using public code but importing it from a private location.

Overall, I'm fairly strongly against this. But being consistent within astropy is more important, so if the consensus is to move forward, I'll do my bit for the transition.

Fair enough! Thanks for the open mind and valuable discussion!

astrofrog commented 1 year ago

So just to put a radically different idea out there, at least to consider as an alternative, what if we considered that public API was defined solely as objects defined in __all__ and only in a subset of the __init__.py files - we could then add something to the effect of:

warnings.warn('This module is private API and should not be relied on', PrivateAPIWarning)

at the top of most files in the code base, while in __init__.py files defining any public API we would not emit this warning and instead do e.g.:

# This file defines public API

__all__ = ['SomeClass', 'a_public_utility']

with ignore_private_api_warnings():
    from .core import SomeClass
    from .utils import a_public_utility
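
A self-contained sketch of how the two hypothetical pieces named above, PrivateAPIWarning and ignore_private_api_warnings, might be implemented. Neither exists in Astropy as far as this thread shows, and `import_private_module` is a stand-in for importing a module whose first statement is the warning.

```python
import contextlib
import warnings


class PrivateAPIWarning(UserWarning):
    """Hypothetical warning category for imports of private modules."""


@contextlib.contextmanager
def ignore_private_api_warnings():
    """Suppress PrivateAPIWarning inside the with-block only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", PrivateAPIWarning)
        yield


def import_private_module():
    """Stand-in for importing a module that warns at import time."""
    warnings.warn(
        "This module is private API and should not be relied on",
        PrivateAPIWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with ignore_private_api_warnings():
        import_private_module()  # what a public __init__.py would do
    suppressed_count = len(caught)
    import_private_module()  # what a direct user import would trigger
    direct_count = len(caught)
```

Inside the context manager nothing is recorded, while the direct call produces exactly one PrivateAPIWarning, which is the behaviour the proposal describes for public `__init__.py` files versus direct imports of private modules.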

The advantage of this approach is that

We could likely use pre-commit to ensure that modules either include a comment such as the one above (This module defines public API) or include a PrivateAPIWarning.

The main disadvantages I can see with this approach are:

I'm not strongly attached to this idea, but I do think it merits consideration.

mhvk commented 1 year ago

@astrofrog - I certainly like the simplicity and the much smaller amount of work involved!

An even less intrusive version would of course be to have a warning in the module docstring. But the advantage of your version is that the warning could include the suggestion to raise an issue if a user believes the piece of code they are importing should really be part of the public API.

If we edit docstrings, one advantage is that it would show up in sphinx, for those modules that we have typeset just because it keeps the organization more logical (e.g., https://docs.astropy.org/en/latest/units/ref_api.html#astropy-units-equivalencies-module). Obviously, though, this does not preclude adding the warning! (Or, indeed, having module names starting with an underscore)

pllim commented 1 year ago

I am a little 👎 on module level warning. It is going to be a pain to filter it out everywhere internally.

astrofrog commented 1 year ago

I am a little 👎 on module level warning. It is going to be a pain to filter it out everywhere internally.

We wouldn't need to filter warnings everywhere, just in the few __init__.py files that would contain public API, because then at the end of the day, whatever the user imports from the public API, any private API warning would be hidden.

If tests test non-public API then they would see warnings, but arguably we could even simply ignore the private API warnings globally for all tests in the pytest config in setup.cfg.

WilliamJamieson commented 1 year ago

I am a little 👎 on module level warning. It is going to be a pain to filter it out everywhere internally.

I am with @pllim on this. I would rather see us follow the same direction as numpy and scipy by prefixing modules and subpackages where appropriate to separate out the private API (there is also a clear path here for formally deprecating things too). Adding a warning like this will make it so that we are constantly having to catch that warning internally when accessing the private API. E.g., if I create a _utils module to collect some useful utilities for a given subpackage that are used throughout but should not be public, this would mean I would have to catch the warning every time I used something from _utils because there would be no reason to ever add its functionality to an __init__.py.

astrofrog commented 1 year ago

this would mean I would have to catch the warning every time I used something from _utils because there would be no reason to ever add its functionality to an __init__.py.

I'm happy to also follow the numpy/scipy route but I just want to answer the above statement - it isn't correct that you would have to filter the warning everywhere internally. It doesn't matter what private methods/functions call what private methods/functions, at the end of the day all a user will do is import public API and we just need to filter the warning in the __init__.py file exposing that public API. Everything that happens internally is irrelevant. The only exception to this is if functions/methods dynamically import private modules at run-time rather than import time (which arguably we should minimize the use of).

nstarman commented 1 year ago

I do think it merits consideration.

It's good to circle back to this, as it was where this proposal started before the Astropy conference! I guess it's a sign of how complex this issue can be. Were we to start writing Astropy from scratch today, I don't believe we'd be having this debate as we'd follow PEP-8 and typing practices and protect users' tab-autocomplete, etc. The issue is that we are not starting tabula rasa, and given our current muddled situation, how do we proceed? I guess philosophically I'm always inclined to go where we would if starting tabula rasa, and to try to chart a path of minimal pain, even if there is some pain.

Here are some of the reasons the proposal evolved from the conference.

what if we considered that public API was defined solely as objects defined in __all__

That is the central gist of this APE. __all__ is the "ground truth" from which we derive our API. However, while __all__ has the immense benefits of being very explicit and introspectable by developers, it is severely lacking for the needs of users. Users have 2 primary means to discover functionality within a library: by external channels (documentation, examples, tutorials, word of mouth) and by inspecting the code, most easily accomplished by tab autocomplete. I don't think it controversial to say that many of us tab-filter in IDEs to find a relevant function or even module. One notable issue with __all__ is that since it starts with an underscore it is not found by tab completion. And even if it were, how many would consult it when tab-autocomplete gives a full list of available objects?

So while for developers __all__ is excellent, for users we need to make their means of finding and understanding public API match __all__. How do we do that? I'm happy we agree on the documentation, that we can create CI checks that ensure __all__ matches the docs. But what about simple exploration of the library? Is astropy.units.quantity public? To a user it very reasonably appears to be. Like us, numpy and scipy have this problem in spades (numpy.core is ostensibly private, but if it's so visibly public, is it really private?).

Using __all__ is important. It's step 2 of this APE! But alone it is a half measure. Warnings are one way to try to fix this, but I think we can implement our solution at a deeper level, where users will not see private API in the first place, rather than be warned after they try to use it.

at the top of most files in the code base,

I'm unsure of this. Any use of warnings.simplefilter is liable to trip every single file in Astropy.

we wouldn't need to worry about cluttering the code base with underscores

We'd have the same number of files, just with a single underscore added. I guess I don't agree that qualifies as clutter.

  • we wouldn't need to break things for users at any point, but they would know they are using API that can break at any time (in a sense it is a bit like a permanent deprecation phase)
  • we wouldn't need to have a deprecation phase where we need to have both modules with underscores and compatibility ones without underscores, which will temporarily cause a lot of clutter in the code base

To address two points simultaneously: with deprecated compatibility modules we also wouldn't need to break things for users at any point during the deprecation period. I agree that deprecation modules add clutter. It would be nice to avoid that, but at least the cleanup when the deprecation period elapses will be easy and only require deleting whole files.

this approach would arguably be more explicit and easier to learn for contributors rather than learning the rules about underscores and so on

For three reasons, I actually think the opposite.

  1. The first is that contributors rarely make new modules, so will not themselves need to deal with underscores.
  2. The second is that contributors overwhelmingly will have used terminal editors or IDEs with tab-autocomplete. On all systems I'm aware of tab-autocomplete filters underscore-prefixes (unless the user specifically starts with an underscore before pressing tab). In effect contributors will have had years of practical experience. (not that they'll need it, because of point 1).
  3. The third reason isn't a new point as you brought it up in the list of disadvantages. To reiterate then, in this APE we would be adopting a common standard, not making a new one. With a common standard everyone who has contributed to any other code base that uses it will already know the rules. With a new standard, no one will have prior familiarity.

permanent warnings for users relying on private API.

This part I like! But from an above point, any use of warnings.simplefilter is liable to trip every single file in Astropy.

mhvk commented 1 year ago

But what about simple exploration of the library? Is astropy.units.quantity public? To a user it very reasonably appears to be. Like us, numpy and scipy have this problem in spades (numpy.core is ostensibly private, but if it's so visibly public, is it really private?).

This to me is the strongest argument for change, but it would seem to require only one part of what is in this APE: have __all__ in __init__.py. I checked by comparing astropy/units/__init__.py (which does not have it) and astropy/coordinates/representations/__init__.py (which does have it -- thanks to you!): I find that (in ipython):

import astropy.units as u, astropy.coordinates as coord
u.qu<TAB>  # Indeed, I see `quantity`, which a user does not have to see
coord.representations.s<TAB>  # I do not see `spherical`

So, we can use __all__ in __init__.py to make clear to the user what is private; this part I'm fine with (even though for units it may be a bit of a pain, given how many entries it has -- arguably making the probability really small that someone will actually discover things by chance by tab-completion and then import something that should be private and which we actually change).

But this lessens the argument for the rename: We're now left with people exploring in editors, and those we can just as easily get with documentation (module-level docstrings would be good to have regardless, and now that the module is already "private", the docstring can actually be aimed at developers, as I've hoped would happen a long time ago: #8930).

So, I remain unconvinced about the APE as a whole, a lot of effort for a problem that, in practice, just has not occurred much (for numpy, it also has not happened all that much -- though more than for us; I think the driver there is mostly that they really wanted to reorganize their structure, which has lots of historical baggage, and thus in a way they are starting from a more or less blank slate -- note this is happening for numpy 2.0, where they are allowing API breaks!).

I am much happier with just formalizing the existing policy, by documenting it and adding __all__ to every subpackage's __init__.py, and adding notes to modules. This seems to get us 99% of the way, will cost substantially less review time (the real effort in any of this) and will have no cost to finger memory of existing maintainers.

Since making __all__ in each __init__.py seems less controversial, perhaps we can just get started with that?

nstarman commented 1 year ago

I find that (in ipython):

Depends strongly on the editor. PEP-8 offers multiple means by which to define public versus private API, which is annoying and why this APE isn't just the sentence "let's do that!". In your findings, IPython command line appears to use __all__. In this screenshot, Jupyter notebook also uses underscores for modules (also valid a la PEP-8), and so finds spherical. This original proposal was to do just __all__, which is still an improvement over our current situation. But in discussion we found, as this screenshot shows, that __all__ is only a partial measure. There should be one "truth" (__all__ in both this APE and your and @astrofrog's suggestions), but fully separating public from private API requires __all__, editing the docs, and module names. I'm happy that we __all__ agree 😄 on the first two. The module names are clearly more contentious.

[Screenshot: CleanShot 2023-08-24 at 16 48 55@2x]

note this is happening for numpy 2.0, where they are allowing API breaks!).

What about Astropy 7.0? Do we need an APE to switch to a more strict version of SemVer 😆.

now that the module is already "private", the docstring can actually be aimed at developers, as I've hoped would happen a long time ago: #8930).

Truly private modules would be excellent for hosting developer docs, I agree!

I am much happier with just formalizing the existing policy, by documenting it and adding __all__ to every subpackage's __init__.py, and adding notes to modules. This seems to get us 99% of the way, will cost substantially less review time (the real effort in any of this) and will have no cost to finger memory of existing maintainers. ... Since making __all__ in each __init__.py seems less controversial, perhaps we can just get started with that?

I'm wholly on board with doing this (this is step 2 & 3 in this APE). It seems most folks that have looked at this APE are in agreement about the importance of adding __all__, which is good! Most of this PR, then, is uncontroversial.

I, and it seems some others, would like to do more, which is the controversial step 5 of the implementation. If that step is removed / pushed to a subsequent PR, some of this APE will need to be rewritten as we will only be addressing a subset of the problems this APE tries to resolve. If the community / CoCo thinks it's best to separate steps 1-4 from 5, I would appreciate help on the rewrite.

mhvk commented 1 year ago

Indeed, we agreed on __all__ for __init__.py, happy to just get going with that (maybe units last? The rest is far more obvious).

I'm still less sure about what we do for the modules. I'm afraid it's gotten too late here to actually look at the APE, so if this is nonsense, just ignore, but I still really dislike __all__ = [] too. Instead, as we have right now, __all__ can give immediate information for someone opening the file about what in it is actually exposed (even if at a different level for the user). It also makes constructing __all__ in __init__.py easy (or a good auto-sanity check if imports become explicit), and helps enable generation of documentation for modules like units.equivalencies.
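To illustrate the "constructing __all__ in __init__.py" point, here is a rough sketch of assembling a package-level `__all__` from the `__all__` of its submodules, with a small sanity check on the way. The module names and the `collect_all` helper are hypothetical, not existing astropy code.

```python
import types


def make_module(name, **members):
    """Build an in-memory module; a stand-in for a real submodule file."""
    module = types.ModuleType(name)
    module.__dict__.update(members)
    return module


# Hypothetical submodules that each declare what they expose.
core = make_module(
    "demo_pkg.core", __all__=["Kernel"], Kernel=type("Kernel", (), {}), _cache={}
)
utils = make_module(
    "demo_pkg.utils", __all__=["discretize"], discretize=lambda x: x
)


def collect_all(*submodules):
    """Concatenate the submodules' __all__ lists, checking them as we go."""
    collected = []
    for module in submodules:
        for name in module.__all__:
            if not hasattr(module, name):
                raise AttributeError(
                    f"{module.__name__}.__all__ lists missing name {name!r}"
                )
            if name in collected:
                raise ValueError(f"{name!r} is exported by more than one submodule")
            collected.append(name)
    return collected


# What the package __init__.py could assign to its own __all__.
package_all = collect_all(core, utils)
```

If imports in `__init__.py` become explicit, the same helper could instead be run in a test to compare its result against the hand-written `__all__`.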

p.s. Will be on holidays for the next bit... At least, that will give others a chance to pipe in!

nstarman commented 9 months ago

@eerovaher @WilliamJamieson @taldcroft @eteq @pllim, I would appreciate some more eyes on this APE, if you have the time in this busy season. As we often follow the lead of NumPy I think this APE is very timely given their refactor to quite nearly follow this APE.

pllim commented 9 months ago

@nstarman , not sure if I have time to ponder this soon. Can this wait till the Coordination Meeting or is that too long to wait?

nstarman commented 9 months ago

@pllim, it can definitely wait to be approved. I'm not sure who would have the time to lead this effort if this APE were accepted sooner. But it would be great to have this be essentially finalized by the time of the Meeting.

pllim commented 7 months ago

@nstarman , apparently APE 22 is taken by #87 . You will have to rename your file... but at this point, I am not sure to what. Maybe @eteq can advise. See https://github.com/astropy/astropy-APEs/pull/87#issuecomment-1915413288

nstarman commented 3 months ago

Note: if we take cues from "upstream" packages, scipy now has underscore-prefixed names for modules.

pllim commented 2 months ago

This was discussed as part of "State of APEs" at Coordination Meeting 2024. I think reactions were mixed and I cannot see any clear action items on how to move this forward (or if we should).

astrofrog commented 2 months ago

One idea I raised was that at the very least if we cannot reach consensus on changing current code, we should see if we can agree on rules for any new code?

pllim commented 2 months ago

Re: https://github.com/astropy/astropy-APEs/pull/85#issuecomment-2183183433

For completeness, my response to that idea in the meeting was that if it is only recommendation for new code, I do not think we need an APE, but rather we can modify the dev docs.

mhvk commented 2 months ago

There was indeed no consensus on the underscore prefixes, in large part because, contrary to what I thought at least, things like from astropy.units.quantity import Quantity were widespread in other github repositories. Hence, changing to underscores is guaranteed to break quite a bit of downstream code, and it is not clear this is worth it.
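To make the breakage concrete, a small sketch under stated assumptions: a throwaway package named `demo_units` (hypothetical, written to a temporary directory) whose implementation already lives in `_quantity.py`. The import through the package keeps working, while the deep import of the old module name fails.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_units"
    pkg_dir.mkdir()
    # After a rename, the implementation lives in an underscore-prefixed module ...
    (pkg_dir / "_quantity.py").write_text("class Quantity:\n    pass\n")
    # ... and the package re-exports the public name.
    (pkg_dir / "__init__.py").write_text(
        "from ._quantity import Quantity\n__all__ = ['Quantity']\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The import through the package is unaffected by the rename.
        from demo_units import Quantity

        # The deep import that downstream code may rely on no longer resolves.
        try:
            importlib.import_module("demo_units.quantity")
            deep_import_ok = True
        except ModuleNotFoundError:
            deep_import_ok = False
    finally:
        sys.path.remove(tmp)
```

Here `deep_import_ok` ends up `False`, which is the failure any `from <package>.quantity import Quantity` line would hit after such a rename.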

There also seemed to be consensus that in the end the documentation should be the ultimate arbiter, since that is what users would normally see (and the mistake of documenting the quantity submodule is probably at least partially to blame for the wrong usages...). So, it remains a good idea to ensure we document what is public and private, but start incrementally, as suggested in https://github.com/astropy/astropy-APEs/pull/85#pullrequestreview-2012234665:

  1. We explicitly document current practice that everything under subpackages is private and add a corresponding comment in all their top level __init__.py files (making appropriate exceptions in io and utils).
    1. We add __all__ to all subpackage __init__.py files that include the public items, including public submodules of the subpackages.
    2. We slowly add __all__ to the rest of astropy, to indicate to ourselves which parts are meant to be used outside a given module.

Finally, I'd say there was no consensus either on new vs old code, or subpackages doing different things, with the latter having the advantage of maintainers being able to set a policy they feel is best, but the disadvantage that then there is no package-wide logic anymore at all, while currently there is (with cosmology the only exception).

nstarman commented 2 months ago

Hence, changing to underscores is guaranteed to break quite a bit of downstream code, and it is not clear this is worth it.

from astropy.units.quantity is private. IMO we might as well break usage of private code in one fell swoop and then not (ever?) again rather than do it piecemeal as publicly-visible-private-code is changed over the years. I would find that less disruptive.

mhvk commented 2 months ago

from astropy.units.quantity is private. IMO we might as well break usage of private code in one fell swoop and then not (ever?) again rather than do it piecemeal as publicly-visible-private-code is changed over the years. I would find that less disruptive.

It is private, indeed. But the feeling at the coordination meeting was that breaking people's code for code style purity is too big a price to pay. And it is not likely we would ever define Quantity in another place than astropy.units.quantity. I also think the issue may be moot sooner or later, since I do think there will eventually be a general units/quantity package that we are going to be based on (hopefully by combining our units machinery with Quantity 2.0!).

But for astropy as a whole, continuity and consistency are important too. But nothing stopping us from making it clearer what is public and not, by defining appropriate __all__ and ensuring that, unlike for units, "private" modules do not appear in the documentation unless strictly necessary, and then with a clear docstring that states why they are included.

nstarman commented 2 months ago

since I do think there will eventually be a general units/quantity package that we are going to be based on (hopefully by combining our units machinery with Quantity 2.0!)

🎉. That would be excellent.

But nothing stopping us from making it clearer what is public and not, by defining appropriate __all__

A great thing to do, no matter the outcome of this APE.


When first proposed, one of the counter-arguments was that "upstream" libraries haven't done this. But now both numpy and scipy have basically done this (slightly different implementations). And they did it in one fell swoop (numpy 2, recent scipy), so that users didn't suffer multiple falls from repeated swoops. I'm just wondering why we're different. Same problem, similar solution?

When I first wrote this APE it was to make an argument "why we should do this". Now that our upstreams have done the same thing, IMO the argument shifts to "why aren't we doing this?". Prima facie we should.

Documentation is important, but it is most certainly not how any of our upstream libraries define their public API. The point of public-facing documentation is to document what is public, not to make it public. Just as we generate documentation from docstrings (prioritizing that the code contains its own documentation), so too do Python, our upstream libraries, and most everyone else make public/private a product of the code, not something imposed on it. And this understanding is intrinsic to how we've built tooling for Astropy, like sphinx-automodapi: it looks at __all__.
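A simplified sketch of the convention such tooling can follow (this is not sphinx-automodapi's actual implementation, and `public_names` is a hypothetical helper): prefer `__all__` when a module defines it, otherwise fall back to names without a leading underscore.

```python
import types


def public_names(module):
    """Names a documentation tool might treat as public, by convention."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        # The module states its public API explicitly.
        return sorted(declared)
    # Fallback: everything that does not start with an underscore.
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Hypothetical modules, one with and one without __all__.
with_all = types.ModuleType("demo_with_all")
with_all.__dict__.update(__all__=["Quantity"], Quantity=object, helper=len)

without_all = types.ModuleType("demo_without_all")
without_all.__dict__.update(Quantity=object, helper=len, _cache={})

documented_with_all = public_names(with_all)
documented_without_all = public_names(without_all)
```

With `__all__` only `Quantity` is picked up. Without it, `helper` leaks into the listing as well, even though it was never meant to be public.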

neutrinoceros commented 2 months ago

And they did it in one fell swoop (numpy 2, recent scipy), so that users didn't suffer multiple falls from repeated swoops. I'm just wondering why we're different. Same problem, similar solution?

I see 3 options here: a) moving private code to private modules in one swoop, b) moving private code to private modules piecewise, c) doing nothing[^1].

I agree with @nstarman that a>b. I also agree with Marten that c>b. The remaining question is how to compare a vs. c.

Since, as you guys pointed out, we may eventually have to move part of the private code from astropy.units in response to Quantity 2.0 becoming a dependency, why not use that event as a pivot to switch our strategy to a, and keep the status quo (c) in the meantime?

[^1]: I'm only speaking about moving modules/members around here. Defining __all__ is a separate discussion and one that seems more consensual anyway.

mhvk commented 2 months ago

Numpy was in a different state, with, e.g., some parts of np.lib being public, while other parts were not, so there was more urgency than we have. Even so, they held off to numpy 2.0, where a lot of other stuff was broken too.

Overall, @neutrinoceros made the right list, and I guess the conclusion in the coordination meeting was that c>a at the present time. At a time when there is a larger API change (as Quantity 2.0 would be), the conclusion may well be different.

In the meantime, there's nothing stopping us from incrementally ensuring that docstrings and __all__ are all consistent and clear.

pllim commented 2 months ago

Also keep in mind that NumPy has the backing of private industry (e.g., NVidia). Astronomy does not. I have started seeing pipelines pinning numpy<2 privately just because they have larger fish to fry and no time to deal with breaking API here and there. To them, calibration accuracy and stability is way more important than whether astropy.units.quantity is private or not. We have to keep our main "customers" in mind and they are not "big money".