bskinn / sphobjinv

Toolkit for manipulation and inspection of Sphinx objects.inv files
https://sphobjinv.readthedocs.io
MIT License
78 stars 9 forks source link

Suggested cross-reference syntax for attributes should be `:py:attr:` #234

Open eirrgang opened 2 years ago

eirrgang commented 2 years ago

Brief description

Suggested syntax for Python attributes is :py:attribute:, but should be :py:attr:

To reproduce

Example: https://github.com/bskinn/sphobjinv/blob/main/doc/source/levenshtein.rst#L192

Reference

https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#role-py-attr

eirrgang commented 2 years ago

I realize now that "module", "function", and others use the directive string rather than the cross-reference role string, so I guess I'm observing a missing feature rather than an isolated bug.

By far, the easiest solution I can think of is a simple look-up table when composing the suggestion, but this would have to be updated as new index types are added to Sphinx. Still, it should be an easy enough patch, if you would like me to submit one.

eirrgang commented 2 years ago

Actually, a static table would be unwieldy, but a table could be derived more easily than I thought by inspecting the sphinx.domains.Domain subclasses. At the least, the submodules could be imported, inspected for Domain subclasses, and the object_types class attributes can be processed for a reverse mapping of ObjType.roles to directive names.

This implies there might already be an API for this.

Please let me know if this is a contribution you would be interested in, and whether you would object to a dependency on sphinx and/or sphinx.ext.intersphinx

bskinn commented 2 years ago

Yep, the objects.inv stores the directive form, rather than the role form. Not awesome, but at least it's consistent/uniform.

Please let me know if this is a contribution you would be interested in ...

Definitely, as long as we can figure out a solid way to do it.

... and whether you would object to a dependency on sphinx and/or sphinx.ext.intersphinx

Unfortunately, this is a non-starter, at least as a core dependency -- Sphinx is way too big, and AFAIK you can't get sphinx.ext.intersphinx without installing all of Sphinx. I might consider it as an extra, which users could opt into in situations where they really want/need correct cross-references out of suggest and don't mind the heavy sphinx dependency.

I don't know if this would really provide a lot of benefit, though, because I think you'd have to know which version of Sphinx was used to create a given objects.inv, not just the version of the project, and AFAIK that Sphinx version is not stored in the objects.inv anywhere.

By far, the easiest solution I can think of is a simple look-up table when composing the suggestion, but this would have to be updated as new index types are added to Sphinx.

Maybe, maybe not. Sphinx narrows down the roles it puts into the objects.inv pretty dramatically. For example, here is a set of the unique domain:role combinations in an objects.inv for Python 3.6 (the one hanging out in tests/resource/objects_python.inv):

>>> import sphobjinv as soi
>>> inv = soi.Inventory(r"tests\resource\objects_python.inv")
>>> {f"{o.domain}:{o.role}" for o in inv.objects}
{'c:function',
 'c:macro',
 'c:member',
 'c:type',
 'c:var',
 'py:attribute',
 'py:class',
 'py:classmethod',
 'py:data',
 'py:exception',
 'py:function',
 'py:method',
 'py:module',
 'py:staticmethod',
 'std:2to3fixer',
 'std:cmdoption',
 'std:doc',
 'std:envvar',
 'std:label',
 'std:opcode',
 'std:pdbcommand',
 'std:term',
 'std:token'}

I would think that, at least for the c and py domains, these roles/directives are not likely to change, basically ever. So, a static lookup table would probably be robust enough (at least until proven otherwise by a bug report 😅).

I could picture something more complex possibly working to get the Domain subclass information into sphobjinv without requiring a direct sphinx dependency, such as a separate package that installs sphinx and inspects the Domain subclasses, like you suggest, and then publishes to PyPI as a package providing an up to date substitution function. But, this still has the problem of not knowing which version of Sphinx created a given objects.inv.

Ultimately, it seems like a lot of work to fix a relatively minor inconvenience, which is mostly addressed with the caveat expressed to the user of 'these are the directive forms, not the role forms'.

One thing that would be easy to do is update the language on the suggest CLI to include a notice about that directive/role difference.

eirrgang commented 2 years ago

I don't think you need to know which version of sphinx produced the objects.inv file. If a translation is unavailable, the translation could just be skipped and/or an error/warning could be issued. This would be appropriate, anyway, because there is no way to back-trace domain:role for arbitrary sphinx extensions that may have been used.

There is also an issue in that both roles and directives can be aliased or otherwise not map one-to-one. Both staticmethod and classmethod directives create an entry that is referenced with :py:meth:, and :py:obj: can refer to almost anything.

import importlib.resources
from pathlib import Path

from sphinx.domains import Domain

for module_name in [str(resource).split('.')[0] for resource in importlib.resources.contents('sphinx.domains') if str(resource).endswith('.py')]:
    module = importlib.import_module(f'.{module_name}', package='sphinx.domains')
    for item in [getattr(module, attr) for attr in dir(module)]:
        if isinstance(item, type) and issubclass(item, Domain):
            for directive, objType in item.object_types.items():
                domain = item.name
                for role in objType.roles:
                    print(f'{domain}:{directive} -> {domain}:{role}')

However, remarkably, the first named role seems to be fairly consistently the best choice. In the few cases where it isn't, the best choice is usually has a role that is a substring of the directive.

import importlib.resources
from pathlib import Path

from sphinx.domains import Domain

def map_directives_to_roles():
    for module_name in [str(resource).split('.')[0] for resource in importlib.resources.contents('sphinx.domains') if str(resource).endswith('.py')]:
        module = importlib.import_module(f'.{module_name}', package='sphinx.domains')
        for item in [getattr(module, attr) for attr in dir(module)]:
            if isinstance(item, type) and issubclass(item, Domain):
                for directive, objType in item.object_types.items():
                    domain = item.name
                    shortnames = [role for role in objType.roles if directive.startswith(role)]
                    if len(shortnames) == 1:
                        yield (f'{domain}:{directive}', f':{domain}:{shortnames[0]}:')
                    else:
                        yield (f'{domain}:{directive}', f':{domain}:{objType.roles[0]}:')

mapping = dict(map_directives_to_roles())

produces the following translation table.

{'std:term': ':std:term:',
 'std:token': ':std:token:',
 'std:label': ':std:ref:',
 'std:envvar': ':std:envvar:',
 'std:cmdoption': ':std:option:',
 'std:doc': ':std:doc:',
 'cpp:class': ':cpp:class:',
 'cpp:union': ':cpp:union:',
 'cpp:function': ':cpp:func:',
 'cpp:member': ':cpp:member:',
 'cpp:type': ':cpp:type:',
 'cpp:concept': ':cpp:concept:',
 'cpp:enum': ':cpp:enum:',
 'cpp:enumerator': ':cpp:enumerator:',
 'cpp:functionParam': ':cpp:identifier:',
 'cpp:templateParam': ':cpp:identifier:',
 'c:member': ':c:member:',
 'c:var': ':c:var:',
 'c:function': ':c:func:',
 'c:macro': ':c:macro:',
 'c:struct': ':c:struct:',
 'c:union': ':c:union:',
 'c:enum': ':c:enum:',
 'c:enumerator': ':c:enumerator:',
 'c:type': ':c:type:',
 'c:functionParam': ':c:identifier:',
 'py:function': ':py:func:',
 'py:data': ':py:data:',
 'py:class': ':py:class:',
 'py:exception': ':py:exc:',
 'py:method': ':py:meth:',
 'py:classmethod': ':py:meth:',
 'py:staticmethod': ':py:meth:',
 'py:attribute': ':py:attr:',
 'py:property': ':py:attr:',
 'py:module': ':py:mod:',
 'js:function': ':js:func:',
 'js:method': ':js:meth:',
 'js:class': ':js:class:',
 'js:data': ':js:data:',
 'js:attribute': ':js:attr:',
 'js:module': ':js:mod:',
 'rst:directive': ':rst:dir:',
 'rst:directive:option': ':rst:dir:',
 'rst:role': ':rst:role:'}

This table could be embedded, used for translation if the objects.inv section appears in the keys (otherwise leaving untranslated), and could be periodically regenerated, as necessary.

bskinn commented 2 years ago

Hmmm, nod... and, the tox test matrix is already set up to test across a range of Sphinx versions, so a test could be added that cross-checks this mapping against what is present in the Domain definitions of the currently installed Sphinx. That should give at least some measure of early warning, if the Sphinx-internal domains should be changed in a breaking fashion.

I like the idea of a result = mapping_table.get(key, key) idiom, defaulting to the unmodified domain:role from the objects.inv if it's not found in the mapping table.

So, yeah -- if you're interested in assembling a PR based on this mapping table approach, I'd say go for it, I would be inclined to merge. It would be good to sand off the rough edge of the suggest feature providing reST that can't be dropped directly into doc source.

bskinn commented 2 years ago

I had a few thoughts this morning on some preferences for this implementation:

  1. I'd like to release this on a minor version bump, which means no breaking changes to the API (CLI output changes are not under SemVar for the project, so no problem there). So, the current implementation of Inventory.suggest() should remain (substantially, if not entirely) unchanged, and the directive -> role mapping should only be applied in the CLI.
  2. But, it would be good for API users to be able to easily use the mapping, so it should be housed in some natural place in the API hierarchy. Perhaps a new src/sphobjinv/roles.py; and, with the mapping imported into src/sphobjinv/__init__.py. Discussion on this, please, if you have other thoughts.
  3. I think it makes the most sense for the CLI behavior to default to using this new mapping; however, I think it also makes sense to allow users to disable the transform if they want. So, there should also be a new flag added to the CLI argument parser (e.g., --no-directive-transform) that yields the old behavior.