Action to report unlinked glossary terms in documentation

matteopilz commented 1 year ago

It would be great to automatically check whether a word has an entry in the glossary but has not yet been linked with :term:...

jpfeuffer commented 1 year ago

But we do not have a glossary in the pyopenms docs, do we??

matteopilz commented 1 year ago

We do; it just doesn't work correctly for me. It links properly but doesn't show the definition on hover. The glossary is here: https://github.com/OpenMS/pyopenms-docs/blob/master/docs/source/glossary.rst

jpfeuffer commented 1 year ago

Cannot confirm

matteopilz commented 1 year ago

Then maybe I'm just missing a plugin.

jpfeuffer commented 1 year ago

What you want is not possible: https://github.com/sphinx-doc/sphinx/issues/3559

matteopilz commented 1 year ago

Actually, I meant it the other way around. If a term in the text is already in the glossary, it would be nice to have a report showing that the word should be linked to that glossary entry. Otherwise, people writing and rewriting docs might not check whether terms appear in the glossary and should be linked.
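
To make the request concrete, here is a minimal sketch of such a check on a single piece of reStructuredText. The function name find_unlinked_terms and the sample terms are hypothetical, and the matching rule (whole words, case-insensitive, anything already inside a :term: role is ignored) is an assumption about what the report should flag.

```python
import re

# Matches an existing :term:`...` role so linked occurrences can be ignored.
TERM_ROLE = re.compile(r":term:`[^`]*`")


def find_unlinked_terms(text, terms):
    """Return (line_number, term) pairs for terms that appear outside a :term: role."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Blank out linked occurrences, keeping the line length unchanged.
        unlinked = TERM_ROLE.sub(lambda m: " " * len(m.group(0)), line)
        for term in terms:
            # Whole-word, case-insensitive match of the glossary term.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, unlinked, flags=re.IGNORECASE):
                hits.append((lineno, term))
    return hits


# Hypothetical sample: the first line is linked, the second is not.
sample = "A :term:`peptide` is measured.\nEach peptide yields a spectrum.\n"
print(find_unlinked_terms(sample, ["peptide", "spectrum"]))
# prints [(2, 'peptide'), (2, 'spectrum')]
```

A real check would probably also need to skip code blocks and the glossary file itself, which this sketch does not attempt.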

jpfeuffer commented 1 year ago

I'm 99% sure this is also not easily possible. You would need to write your own script for this.

matteopilz commented 1 year ago

Yes, we would probably have to.
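
One piece such a script would need is the list of glossary terms. Below is a rough sketch that pulls them out of the text of a glossary file, assuming the usual Sphinx layout: terms indented once under the `.. glossary::` directive, definitions indented further, and an optional " : key" classifier after a term. The function name glossary_terms and the sample entries are made up for illustration; this is not a full reStructuredText parser.

```python
def glossary_terms(rst_text):
    """Collect the term lines found under `.. glossary::` directives."""
    terms = []
    in_glossary = False
    directive_indent = 0
    term_indent = None
    for line in rst_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped.startswith(".. glossary::"):
            in_glossary, directive_indent, term_indent = True, indent, None
            continue
        if not in_glossary or not stripped:
            continue
        if indent <= directive_indent:
            in_glossary = False  # a dedented line ends the directive
            continue
        if term_indent is None and stripped.startswith(":"):
            continue  # directive option such as :sorted:
        if term_indent is None:
            term_indent = indent  # the first content line sets the term indent
        if indent == term_indent:
            # Drop an optional " : key" classifier after the term.
            terms.append(stripped.split(" : ")[0])
    return terms


# Hypothetical glossary content, not taken from the pyopenms docs.
sample_glossary = """\
.. glossary::
   :sorted:

   peptide
      A short chain of amino acids.

   MS1
   survey scan
      A spectrum of intact ions.

Other text
"""
print(glossary_terms(sample_glossary))
# prints ['peptide', 'MS1', 'survey scan']
```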

jpfeuffer commented 1 year ago

Ok sure, if someone volunteers

jpfeuffer commented 1 year ago

This is what ChatGPT gave me. Give it a try.

Here is an updated version of the extension that reads the terms defined by the .. glossary:: directive and reports them wherever they appear unlinked:


import re

from docutils import nodes
from sphinx.util import logging

logger = logging.getLogger(__name__)

# Text under these nodes is already a link, is code, or is a glossary term itself.
SKIPPED_PARENTS = (nodes.reference, nodes.literal, nodes.literal_block, nodes.term)

def collect_glossary_entries(env):
    # Assumption: the standard domain records glossary terms under ('term', name) keys.
    objects = env.get_domain('std').data.get('objects', {})
    return sorted(name for (objtype, name) in objects if objtype == 'term')

def is_skipped(node):
    # Walk up the tree and look for a parent that rules this text out.
    parent = node.parent
    while parent is not None:
        if isinstance(parent, SKIPPED_PARENTS):
            return True
        parent = parent.parent
    return False

def check_forbidden_words(app, doctree, docname):
    forbidden_words = collect_glossary_entries(app.env)
    for node in doctree.findall(nodes.Text):  # findall needs docutils >= 0.18
        if is_skipped(node):
            continue
        for word in forbidden_words:
            # Match whole words only, ignoring case.
            if re.search(r'(?<!\w)' + re.escape(word) + r'(?!\w)', node.astext(), re.IGNORECASE):
                logger.warning('Glossary term "%s" in %s is not linked with :term:',
                               word, docname, location=node.parent)

def setup(app):
    app.connect('doctree-resolved', check_forbidden_words)
    return {'version': '0.1'}

In this version, the collect_glossary_entries function reads the glossary terms from Sphinx's standard domain, where they are recorded when a .. glossary:: directive is processed. This assumes they are stored under ('term', name) keys, which is worth verifying against the installed Sphinx version.

The check_forbidden_words function is connected to the doctree-resolved event, which is emitted for each document once its references have been resolved. It checks the plain text parts of the document for the glossary terms, skips text that is already a link, inline or block code, or a glossary term itself, and emits a warning for every term it finds.

You can use this extension by saving it as forbidden_words.py in a directory that is on sys.path and adding 'forbidden_words' to the extensions list in your conf.py file:

extensions = ['forbidden_words']
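
For the reporting side of an Action, one option is to print findings as GitHub Actions warning commands so that they show up as annotations on the pull request. The sketch below only formats the messages. The function name format_annotations and the (path, line, term) tuple shape are assumptions for illustration, not part of any existing script in the repository.

```python
def format_annotations(hits):
    """Turn (path, line_number, term) tuples into GitHub Actions warning commands."""
    return [
        f"::warning file={path},line={lineno}::"
        f"Glossary term '{term}' is not linked to the glossary"
        for path, lineno, term in hits
    ]


# Hypothetical scan result; a real run would collect these from the docs sources.
hits = [("docs/source/index.rst", 12, "peptide")]
for annotation in format_annotations(hits):
    print(annotation)
```

Printing warnings instead of failing the job keeps the check advisory, which seems closer to the "report" asked for here than a hard error would be.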

jpfeuffer commented 1 year ago

@matteopilz See the linked PR.

OpenMS / pyopenms-docs

Action to report unlinked glossary terms in documentation #329