OCR-D / zenhub

Repo for developing zenhub integration
Apache License 2.0

Projects Tab: provide dependency information (backend) #102

Closed krvoigt closed 2 years ago

krvoigt commented 2 years ago

Current situation

We already have an idea of how (and which) dependencies should be displayed in the kwalitee dashboard (see mock-ups). We need to clarify the concept and work out where the back end can get the information it has to provide.

List of information about dependencies that should be provided:

Use case: an example of a common error that prevents a release (for better understanding). Different projects have conflicting dependencies, and this only becomes apparent at runtime. With a visualization of the differing versions in each requirements.txt as well as the version actually installed in the ocrd_all venv, such problems can be identified and fixed. Example: project A needs tensorflow 1.0.*, project B needs tensorflow 2, tensorflow 2 is installed, and project A fails at runtime. I would like to identify this error beforehand.

How it should be

We need a concept for the dependency part of the projects tab and a back end that provides the information we want to display.

A possible data structure for this could be:

"dependencies": [
    { "dependency name": "version" },
    { "dependency name": "version" }
],
"dependency-conflicts": [
    { "dependency name": "project this conflicts with" }
]

In case we're only interested in the conflicts, the full dependency information can be tossed.
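As a rough illustration of the structure above, here is a minimal Python sketch that assembles such a record for one project. The function name `build_dependency_record` and the `conflicts_only` flag are hypothetical and not part of the existing back end; the flag models the case where the full dependency list is dropped.

```python
def build_dependency_record(dependencies, conflicts, conflicts_only=False):
    """Assemble the proposed record for a single project.

    dependencies: dict mapping dependency name -> version string
    conflicts: list of (dependency name, conflicting project) tuples
    conflicts_only: if True, leave out the full dependency list
    """
    record = {
        "dependency-conflicts": [{name: project} for name, project in conflicts]
    }
    if not conflicts_only:
        record["dependencies"] = [
            {name: version} for name, version in dependencies.items()
        ]
    return record
```

Calling it with `conflicts_only=True` yields a record that contains only the "dependency-conflicts" key.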

Steps

Prior Art What has already been done in this regard.

mweidling commented 2 years ago

Concept

Pt. 1: Preparation

To find out which dependencies conflict with each other, some preparatory steps come in handy. For this we can create an auxiliary file, deps.json, that holds information about the dependencies of each ocrd_all submodule. This file follows a simple structure:

[
  { "submodule_name":
    [
      { "dependency1_name": "version"},
      { "dependency2_name": "version"}
    ]
  }
]

To obtain this information, we iterate once over all submodules, create a venv and then retrieve the installed dependencies via pip freeze -l. Dependencies that occur in core are omitted, because core is the basis for all Python-based projects and is, hopefully, commonly shared. I expect no dependency conflicts here.

The remaining dependencies are then stored in the JSON structure above and can be used for the actual conflict detection.
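The preparation step could be sketched roughly as follows. The sketch assumes the output of pip freeze -l has already been captured as a string for each submodule (creating the venv and running pip are left out), and the helper names `parse_freeze` and `submodule_entry` are hypothetical. Lines without a pinned `name==version` form, such as editable installs, are simply skipped here, which is a simplification.

```python
import json


def parse_freeze(freeze_output):
    """Parse `pip freeze -l` output into a {name: version} dict.

    Only lines of the form `name==version` are considered; comments,
    editable installs and direct references are skipped (simplification).
    """
    deps = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        deps[name.strip().lower()] = version.strip()
    return deps


def submodule_entry(submodule_name, freeze_output, core_deps):
    """Build one deps.json entry, omitting dependencies that occur in core.

    core_deps: set of lower-cased dependency names provided by core
    """
    deps = parse_freeze(freeze_output)
    remaining = [
        {name: version}
        for name, version in sorted(deps.items())
        if name not in core_deps
    ]
    return {submodule_name: remaining}


def to_deps_json(entries):
    """Serialize the list of per-submodule entries in the deps.json layout."""
    return json.dumps(entries, indent=2)
```

The list of entries returned by `submodule_entry` for every submodule can then be written to deps.json with `to_deps_json`.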

Open questions

Answers (2022-07-06)

Pt. 2: Detecting the conflicts

This is the tricky part of the whole process. Detecting the conflicts could be handled by the Repo class where the information is provided as an attribute (see above) and stored along with the other info in the repos.json.

Since we have no base project to compare each project against, each project has to be compared with all the other projects. As dependency conflicts are mutual, each pair of projects only needs to be compared once, which saves some time. Furthermore, this step should only be performed if dependencies have changed (see below) to save time and resources.

Most software projects use SemVer nowadays, so we check whether two projects list the same package in their dependencies but in different major versions. If e.g. project A uses a package B in version 1.0.0 and project C uses package B in version 2.0.0, there will probably be a conflict, as the change in the major version hints at a breaking change.

However, this approach might be error prone because it depends on third-party software following SemVer correctly.
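A minimal sketch of the major-version comparison described above, not the actual Repo class implementation. It assumes the dependencies are available as a plain dict per project; the names `major` and `find_conflicts` are hypothetical. Each pair of projects is visited once and the conflict is recorded for both sides, since conflicts are mutual. Versions whose first component is not a plain integer are skipped rather than guessed at.

```python
from itertools import combinations


def major(version):
    """Return the leading numeric component of a version, or None."""
    head = version.split(".", 1)[0]
    return int(head) if head.isdigit() else None


def find_conflicts(deps_by_project):
    """Compare every pair of projects once and record mutual conflicts.

    deps_by_project: dict mapping project name -> {dependency: version}
    Returns a dict mapping project name -> list of
    {dependency name: project this conflicts with}.
    """
    conflicts = {project: [] for project in deps_by_project}
    for (proj_a, deps_a), (proj_b, deps_b) in combinations(deps_by_project.items(), 2):
        # Only packages that both projects depend on can conflict.
        for name in sorted(deps_a.keys() & deps_b.keys()):
            major_a, major_b = major(deps_a[name]), major(deps_b[name])
            if major_a is None or major_b is None:
                continue  # version does not start with an integer; skip
            if major_a != major_b:
                conflicts[proj_a].append({name: proj_b})
                conflicts[proj_b].append({name: proj_a})
    return conflicts
```

With n projects this performs n(n-1)/2 pairwise comparisons instead of n(n-1), and the per-project lists match the "dependency-conflicts" structure proposed earlier in the thread.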

Open questions

Answers (2022-07-06)

Pt. 3: Check for conflicts regularly

The deps.json file has to be updated only if dependencies in one of the submodules change. This can be detected during the update of the submodules which takes place during the course of an ocrd_all update: Whenever a requirements.txt or setup.py has been altered, the dependency check should be performed again. Only the parts of the deps.json that give information about the changed project have to be updated. After having performed this step, the repos.json has to be updated as well for the affected project.
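The partial update could look roughly like this. The sketch assumes that the list of changed file paths is already known and that each path has the form `<submodule>/<file>` relative to the ocrd_all root; how that list is obtained during the submodule update is not covered. The names `affected_submodules` and `update_deps` are hypothetical.

```python
from pathlib import PurePosixPath

# Files whose modification should trigger a new dependency check.
DEPENDENCY_FILES = {"requirements.txt", "setup.py"}


def affected_submodules(changed_paths):
    """Return the submodules whose requirements.txt or setup.py changed.

    Assumes paths like "<submodule>/requirements.txt"; the first path
    component is taken as the submodule name.
    """
    affected = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[-1] in DEPENDENCY_FILES:
            affected.add(parts[0])
    return affected


def update_deps(deps, fresh_entries):
    """Replace only the entries of changed submodules in the deps.json data.

    deps: list of {submodule name: [dependency entries]} as in deps.json
    fresh_entries: dict mapping submodule name -> new dependency entries
    """
    updated = []
    seen = set()
    for entry in deps:
        name = next(iter(entry))
        if name in fresh_entries:
            updated.append({name: fresh_entries[name]})
            seen.add(name)
        else:
            updated.append(entry)  # unchanged submodule, keep as is
    # Submodules that were not in deps.json before are appended.
    for name in sorted(fresh_entries.keys() - seen):
        updated.append({name: fresh_entries[name]})
    return updated
```

The set returned by `affected_submodules` also tells which projects need their repos.json entry refreshed afterwards.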

Open questions

Answers (2022-07-06)

mweidling commented 2 years ago

I already started implementing along with my thinking about the concept here.

mweidling commented 2 years ago

@krvoigt As soon as https://github.com/OCR-D/kwalitee-dashboard-back-end/issues/14 is merged, this is done from my point of view. The only thing left to implement is the updating mechanism.