NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Plugin architecture #12

Open osma opened 7 years ago

osma commented 7 years ago

We could support plugins for pre- and/or post-processing the document analysis functionality.

A plugin could be a subclass of a class like this:

class AnnifPlugin:
    """A plugin that tweaks Annif queries before and/or after they are executed"""

    def process_analyze_query(self, query):
        """Preprocess an analyze query, tweaking the parameters before the query is executed"""
        # default implementation is a no-op
        return query

    def postprocess_analyze_query(self, query, result):
        """Postprocess an analyze query and result, tweaking the result before responding to the client"""
        # default implementation is a no-op: pass the result through unchanged
        return result

For registering plugins, we could perhaps make use of PluginBase. Each plugin would be a separate Python project that registers itself with the Annif plugin system. Each Annif project could define the set of plugins it uses. The plugins could be stacked/chained, so that the output of one plugin is fed to the next one in the chain.
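A minimal sketch of how such chaining could work, assuming queries are plain dicts and results are lists of (subject, score) pairs. The `run_chain` helper, the two example plugins and `fake_analyze` are all hypothetical names made up for illustration; none of them exist in Annif.

```python
class AnnifPlugin:
    """Hypothetical base class mirroring the proposal above."""

    def process_analyze_query(self, query):
        return query

    def postprocess_analyze_query(self, query, result):
        return result


class LowercaseText(AnnifPlugin):
    """Hypothetical plugin: normalizes the query text before analysis."""

    def process_analyze_query(self, query):
        return {**query, "text": query["text"].lower()}


class DropSubject(AnnifPlugin):
    """Hypothetical plugin: removes one unwanted subject from the result."""

    def __init__(self, unwanted):
        self.unwanted = unwanted

    def postprocess_analyze_query(self, query, result):
        return [(subj, score) for subj, score in result if subj != self.unwanted]


def run_chain(plugins, query, analyze):
    """Pass the query through every plugin, analyze it, then pass the result through every plugin."""
    for plugin in plugins:
        query = plugin.process_analyze_query(query)
    result = analyze(query)
    for plugin in plugins:
        result = plugin.postprocess_analyze_query(query, result)
    return result


def fake_analyze(query):
    # Stand-in for the real analysis step (assumption, not Annif code).
    return [("cats", 0.9), ("dogs", 0.4)] if "cat" in query["text"] else []


chained = run_chain(
    [LowercaseText(), DropSubject("dogs")],
    {"text": "CAT pictures"},
    fake_analyze,
)
```

Each plugin only sees what the previous one returned, so the order of the list in the project configuration would determine the outcome.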

Plugins would be fed the raw result of Annif queries (with lots of candidate subjects), before the result is cut down to the requested size and/or score thresholds are applied. This way the plugins have more candidate subjects to work with.
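To illustrate why the ordering matters, here is a rough sketch in which postprocessing runs on the full candidate list and the threshold and limit are applied only afterwards. The `finalize` and `boost_cats` functions and the sample scores are invented for this example.

```python
def finalize(raw_hits, postprocessors, limit, threshold):
    """Run postprocessors on all raw candidates, then apply threshold and limit."""
    hits = list(raw_hits)
    for postprocess in postprocessors:
        hits = postprocess(hits)
    # Only now drop low-scoring candidates and trim to the requested size.
    hits = [(subj, score) for subj, score in hits if score >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


def boost_cats(hits):
    # Hypothetical plugin step: double the score of one subject, capped at 1.0.
    return [
        (subj, min(1.0, score * 2) if subj == "cats" else score)
        for subj, score in hits
    ]


raw = [("dogs", 0.5), ("birds", 0.3), ("cats", 0.2), ("fish", 0.1)]
top = finalize(raw, [boost_cats], limit=2, threshold=0.25)
top_subjects = [subj for subj, _ in top]
```

Here "cats" starts below the threshold and would have been discarded if the list had been trimmed first; because the plugin sees the raw candidates, it can still promote it into the final result.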

Ideas for plugins:

osma commented 5 years ago

In the current architecture these could simply be separate backends. Backends can register themselves using annif.backend.register_backend so I'm not sure whether the plugin infrastructure is really needed, but of course it would make Annif more extensible.

kinow commented 5 years ago

> but of course it would make Annif more extensible.

I have two systems where plugins are needed. One of them already has a few mechanisms for plugins. That project is going through an update, starting with adding a setup.py file and making more use of vanilla features.

We are thinking about using simple entry points (see also the other good doc from setuptools).

I believe pytest and Pylint use this approach. Scrapy is a different beast, so they used a different approach that lets scrapers easily extend Scrapy without the need to package a Python utility with setuptools.
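For reference, entry-point based discovery could look roughly like the sketch below. The group name `annif.plugins` and the `load_plugins` helper are assumptions made up for illustration; Annif does not define them. A plugin package would advertise itself in its own setup.py, and the host application would look up everything registered under the group.

```python
from importlib.metadata import entry_points

# In the plugin's setup.py (hypothetical package and class names):
#
#   setup(
#       name="annif-myplugin",
#       entry_points={"annif.plugins": ["myplugin = annif_myplugin:MyPlugin"]},
#   )


def load_plugins(group="annif.plugins"):
    """Return a dict mapping entry point names to the objects they point at."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a select() method for filtering by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


plugins = load_plugins()
```

With no plugin packages installed, `load_plugins()` simply returns an empty dict, so the host application keeps working without any plugins present.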