This tool creates a corpus for accessible repositories in a GitLab instance. The corpus will primarily contain information about software projects.
Relevant information could be:
The output corpus is in JSON format, because JSON is widely used and is compatible with neo4j.
We assume that you have installed Python >= 3.8 and a recent Git client.
Please follow these steps to install the required dependencies and to make the corpus command line tool available:
git clone <URL of this Git repository> corpus
cd corpus
pip install --editable .
NOTE: The required dependencies are listed in the install_requires section of the setup.cfg file.
To use this tool, you first need to write a config file in which you provide information about the GitLab instance you want to run the tool on.
Here is an example:
[global]
default = gitlab-1
ssl_verify = true
timeout = 15
[gitlab-1]
url = https://gitlab.example.com
private_token = 123abc
api_version = 4
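Before running the tool, it can help to confirm that the config file parses and that the instance named by default actually has its own section. The sketch below does this with Python's standard configparser module only. It is an illustration, not the way the tool itself reads the file, and the function name load_default_instance is hypothetical.

```python
import configparser

# Contents copied from the example config above; the token is a placeholder.
EXAMPLE_CONFIG = """\
[global]
default = gitlab-1
ssl_verify = true
timeout = 15

[gitlab-1]
url = https://gitlab.example.com
private_token = 123abc
api_version = 4
"""


def load_default_instance(text):
    """Return the settings of the instance named by `default` in [global].

    Hypothetical helper for illustration; not part of the corpus tool.
    """
    # Disable interpolation so that a '%' in a token or URL is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    name = parser.get("global", "default")
    if not parser.has_section(name):
        raise ValueError(f"default instance {name!r} has no section")

    section = parser[name]
    return {
        "name": name,
        "url": section["url"],
        "private_token": section["private_token"],
        "api_version": section.get("api_version", fallback="4"),
        "ssl_verify": parser.getboolean("global", "ssl_verify", fallback=True),
        "timeout": parser.getint("global", "timeout", fallback=None),
    }


if __name__ == "__main__":
    settings = load_default_instance(EXAMPLE_CONFIG)
    print(settings["name"], settings["url"])
```

To check a real file, read its contents and pass the text to the function in place of EXAMPLE_CONFIG. A missing url or private_token key raises a KeyError, which points directly at the incomplete section.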
The tool can be run using the command corpus.
Running the command with the --help parameter, or without any parameter, prints the help page.
The documentation is available in the docs directory.