SoVisu+ publications harvester as a microservice
SoVisu+ Harvester is distributed under the terms of the CeCILL v2.1 license (GPL compatible).
:warning: This project is still in development and is not yet ready for production.
The SoVisu+ Harvester project is intended to provide a unified interface to the various scholarly publication databases and repositories used by an institutional research information system in general and by the SoVisu+ project in particular.
SoVisu+ Harvester is implemented as a microservice that can be deployed in a containerized environment.
It is intended for institutions that have already created and actively maintain a repository of matched identifiers for authors, structures and research projects.
The SoVisu+ Harvester is designed to receive requests containing a list of identifiers for a so-called "research entity" (which can be an author, a research structure, a research institution or a research project) and to return a list of references to publications associated with that research entity.
The list of accepted identifiers is not exhaustive as the service is extensible by design.
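As an illustration of the multi-identifier design, a request for an author-type research entity could carry several identifier types at once. The field names below are hypothetical, invented only to sketch the idea of a multi-identifier payload; the actual API schema is defined by the project itself.

```python
# Hypothetical sketch of a request payload for a "research entity".
# Field names are illustrative, NOT the real SoVisu+ Harvester schema.
author_entity = {
    "type": "person",
    "identifiers": [
        {"type": "idref", "value": "123456789"},
        {"type": "orcid", "value": "0000-0002-1825-0097"},
    ],
}

# Each identifier type would be routed to the harvesters that understand
# it; here we only show that the payload carries several types at once.
identifier_types = [i["type"] for i in author_entity["identifiers"]]
print(identifier_types)
```

Because the service is extensible by design, a new identifier type is just another entry in that list, paired with a harvester that knows how to query it.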
The references are obtained through a set of modular harvesters designed to query various scholarly publication databases and repositories:
Requests and results are registered in the database for monitoring purposes. The harvesting history may be consulted through the web interface.
The research entities may be submitted to the harvester in four ways:
The structure of the JSON output complies with the SciencePlus publications model.
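For orientation only, a single harvested reference serialized to JSON might resemble the sketch below. The field names here are invented for illustration; the authoritative structure is the SciencePlus publications model itself.

```python
import json

# Illustrative only: invented field names, not the actual SciencePlus model.
reference = {
    "source_identifier": "hal-01234567",
    "harvester": "hal",
    "titles": [{"value": "An example title", "language": "en"}],
    "contributors": [{"name": "Doe, Jane", "role": "author"}],
}

# Serialize as the service would emit it in its JSON output.
print(json.dumps(reference, indent=2))
```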
Please note that reference deduplication is out of scope for SoVisu+ Harvester and should be handled by another SoVisu+ component.
Server side:
Client side (admin interface):
Install Postgresql, RabbitMQ and the web server you want to use as a front-end.
Note that poetry is not required as requirements are exported to requirements.txt.
Clone the project, copy .env.example to .env and .test.env and update them. All the values defined in the app/settings classes (AppSettings, TestSettings, DevSettings...) can be overridden either through .env files or through environment variables (the latter take precedence over the former).
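The precedence rule (environment variables over .env values) can be pictured with a minimal stdlib sketch. The real project presumably implements this through its settings classes; the loader below is only an illustration of the lookup order, with made-up values.

```python
import os

# Values as they might have been parsed from a .env file.
dotenv_values = {"DB_NAME": "harvester", "DB_USER": "harvester"}

def setting(key):
    # Environment variables take precedence over .env file values.
    return os.environ.get(key, dotenv_values.get(key))

# An environment variable shadows the .env value for the same key...
os.environ["DB_NAME"] = "harvester_override"
print(setting("DB_NAME"))

# ...while keys absent from the environment fall back to the .env value.
print(setting("DB_USER"))
```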
As the postgres user:
CREATE DATABASE your_db_name;
CREATE USER your_user_name WITH PASSWORD 'your_secret';
GRANT ALL PRIVILEGES ON DATABASE your_db_name TO your_user_name;
Repeat for test database.
Update .env and .test.env with the credentials:
DB_USER="your_user_name"
DB_NAME="your_db_name"
DB_PASSWORD="your_secret"
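These three variables are typically combined into a PostgreSQL connection string. The sketch below assumes the common `postgresql://user:password@host/dbname` form and a `localhost` host, which may differ in your deployment.

```python
import os
from urllib.parse import quote

# Stand-ins for the values set in .env / .test.env.
os.environ.update(
    DB_USER="your_user_name",
    DB_NAME="your_db_name",
    DB_PASSWORD="your_secret",
)

# Assemble a standard PostgreSQL DSN; quote the password in case it
# contains characters that are special in URLs.
dsn = "postgresql://{user}:{pwd}@localhost/{db}".format(
    user=os.environ["DB_USER"],
    pwd=quote(os.environ["DB_PASSWORD"], safe=""),
    db=os.environ["DB_NAME"],
)
print(dsn)  # postgresql://your_user_name:your_secret@localhost/your_db_name
```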
The project uses alembic for schema versioning and migration.
At development time, migrations are generated automatically from the models (see app/models and alembic/versions).
APP_ENV=DEV alembic revision --autogenerate -m "Explain what you did to the model"
At deployment time, migrations are applied to the database.
APP_ENV=DEV alembic upgrade head
The project uses poetry for dependency management.
poetry install
The project uses pytest for testing.
From the project root:
APP_ENV=TEST pytest
or with coverage:
APP_ENV=TEST coverage run --source=app -m pytest
coverage report --show-missing
The project uses webpack for asset compilation.
Copy app/templates/src/js/env.js.example to app/templates/src/js/env.js and update it with your values before compiling the assets.
From app/templates:
npm install
npm run build
From the project root:
APP_ENV=DEV uvicorn app.main:app --reload
or
APP_ENV=DEV python3 app/main.py
To update the translation files, run the following command from the project root:
pybabel extract --mapping babel.cfg --output-file=locales/admin.pot .
To initialize a .po file for a new language:
pybabel init --domain=admin --input-file=locales/admin.pot --output-dir=locales --locale=NEW_one
To update the .po files with the new strings:
pybabel update --domain=admin --input-file=locales/admin.pot --output-dir=locales
To compile the .po files to .mo files:
pybabel compile --domain=admin --directory=locales --use-fuzzy
See the Babel command line documentation for more information.
The documentation is written in reStructuredText and compiled with Sphinx.
To export the documentation to HTML, run the following command from the docs directory:
python -m sphinx -b html source build/html
Then, copy the content of the docs/build/html directory to the server of your choice.
The documentation is automatically published on ReadTheDocs at each push on the dev-main branch.
The configuration settings are defined in the docs/source/conf.py file, which is specified in the .readthedocs.yml file as the entry point.
To export the documentation to Confluence through the Confluence Publisher plugin, copy docs/source/confluence/conf.py.example to docs/source/confluence/conf.py and update it with your values. Then run the following command from the docs directory:
python -m sphinx -b confluence -c source/confluence source build/confluence -E -a