fair-software / howfairis

Command line tool to analyze a GitHub or GitLab repository's compliance with the fair-software.eu recommendations
https://pypi.org/project/howfairis/
Apache License 2.0

Support for self-managed GitLab server #69

Open sverhoeven opened 3 years ago

sverhoeven commented 3 years ago

GitLab can be self-hosted and managed by anyone, for example by universities (see https://gitlab.tue.nl/). If I try one of the repos there, I get:

howfairis https://gitlab.tue.nl/bp-tue/pyrano
Checking compliance with fair-software.eu...
url: https://gitlab.tue.nl/bp-tue/pyrano
AssertionError: Repository should be on GitHub or on GitLab.

It would be nice to support GitLab servers other than gitlab.com.

jspaaks commented 3 years ago

I think the assertion error was there to ensure the tool can correctly construct the raw file urls.
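To illustrate why the host matters when constructing raw file URLs, here is a minimal sketch. It is not the howfairis implementation: the function name `raw_file_url` and the `gitlab_hosts` parameter are hypothetical, and it assumes a self-managed instance uses the same `/-/raw/` path layout as gitlab.com, which may not hold for older GitLab versions.

```python
from urllib.parse import urlparse


def raw_file_url(repo_url, branch, path, gitlab_hosts=("gitlab.com",)):
    """Build a raw file URL for a repository (hypothetical helper, not howfairis code)."""
    parsed = urlparse(repo_url)
    host = parsed.netloc.lower()
    repo_path = parsed.path.strip("/")
    if host == "github.com":
        return f"https://raw.githubusercontent.com/{repo_path}/{branch}/{path}"
    if host in gitlab_hosts:
        # Assumption: the instance follows gitlab.com's /-/raw/ layout.
        return f"https://{host}/{repo_path}/-/raw/{branch}/{path}"
    # Without knowing what kind of server this is, no raw URL can be built.
    raise ValueError(f"Unknown host {host!r}: cannot construct raw file URLs.")
```

With this shape, supporting another instance would amount to passing its host name, e.g. `gitlab_hosts=("gitlab.com", "gitlab.tue.nl")`, instead of hard-coding a single GitLab host.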

Does anyone have an example of a self-hosted GitLab instance that is open? The gitlab.tue.nl one asks me to sign in.

sverhoeven commented 3 years ago

If you go to https://gitlab.tue.nl/explore you can see open repos.

sverhoeven commented 3 years ago

~Another GitLab server example is https://foss.heptapod.net/mercurial/tortoisehg/thg which uses Mercurial instead of git.~

Heptapod is a fork of GitLab with Mercurial support, not a straight GitLab instance.

sverhoeven commented 3 years ago

Another GitLab instance is https://gitext.gfz-potsdam.de/id2/software/services/fair/software-quality-assurance

unode commented 3 years ago

Following from the SORSE event, it wasn't quite clear if there were reasons against supporting self-hosted GitLab instances. Can someone from the team elaborate a little on this? Thanks!

knarrff commented 3 years ago

I understood the assumption to be that a private GitLab instance would also not be publicly accessible, and that the tool wouldn't be of much use for repositories that aren't public. I don't think the first assumption is true, though: I know quite a few "privately" operated GitLab instances that can and do host public repos.

knarrff commented 3 years ago

Also from today's talk: it seems that part of the code assumes there is only one GitLab instance, i.e., the "gitlab" API keys were shown to be set via an environment variable with a generic "GITLAB" in its name. I guess what we would also need for this ticket is a way to set those per host name, so that you can configure different keys for different instances, one of which can then happen to be "gitlab.com". Given that you can have multiple hosts, a configuration file might be a better option than environment variables, but that's another discussion.
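A per-host configuration could look roughly like the sketch below. The INI layout, the `token` key, and the helper names `load_tokens` and `token_for` are all assumptions for illustration; howfairis does not define this format.

```python
import configparser

# Hypothetical config file contents: one section per host name.
EXAMPLE_CONFIG = """
[gitlab.com]
token = token-for-gitlab-com

[git.embl.de]
token = token-for-embl
"""


def load_tokens(config_text):
    """Return a {hostname: token} mapping parsed from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {host: parser[host]["token"] for host in parser.sections()}


def token_for(host, tokens):
    """Look up the token for a host; None means no key is configured for it."""
    return tokens.get(host.lower())
```

Keying the lookup on the host name makes gitlab.com just one entry among several, rather than a special case tied to a single environment variable.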

jspaaks commented 3 years ago

Hello and welcome @knarrff and @unode

Sorry if I wasn't clear in my presentation yesterday (link to the SORSE event, permalink).

I meant to say that howfairis currently supports gitlab.com, but not its self-hosted instances. That could change in the future though. From what I've heard, in particular some German research institutes seem to prefer self-hosted gitlab instances over gitlab.com or github.com, I assume due to privacy law restrictions.

From a technological point of view, it may be more difficult, since these additional instances introduce more degrees of freedom, which our code then has to take into account somehow. For example:

  1. what is the version of the API that self-hosted instances are offering
  2. is that even the same API as what gitlab.com offers
  3. where to find any documentation about that
  4. what they set as rate limits, etc.

These additional degrees of freedom also relate to the point made in https://github.com/fair-software/howfairis/issues/69#issuecomment-794308894, and I think having a config file there would then make sense. Thanks for that suggestion!
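As a rough sketch of how the API version could become an explicit parameter instead of an implicit assumption: the helper below builds a project endpoint for any host. The name `project_api_url` and the `api_version` parameter are hypothetical, and the sketch assumes the instance serves a gitlab.com-style REST API under `/api/<version>/`, which is exactly the kind of thing that would need checking per instance.

```python
from urllib.parse import quote, urlparse


def project_api_url(repo_url, api_version="v4"):
    """Build the project endpoint for a GitLab-style REST API (illustrative only)."""
    parsed = urlparse(repo_url)
    # GitLab accepts the URL-encoded "namespace/project" path as the project id.
    project_id = quote(parsed.path.strip("/"), safe="")
    return f"{parsed.scheme}://{parsed.netloc}/api/{api_version}/projects/{project_id}"
```

Nested groups, as in the gfz-potsdam example earlier in this thread, are handled by encoding every slash in the path.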

The other question about "the tool not being of much use [for repos that aren't public]" had to do with whether to support evaluation of local files. For example, I showed how you can do

howfairis https://github.com/fair-software/howfairis-livetest

but then somebody (more or less) asked: what if I do something like

git clone https://github.com/fair-software/howfairis-livetest .

can I then subsequently do something like

howfairis .

or just

howfairis

and get the same evaluation?

The answer is, we chose not to support this, because we couldn't see much benefit in evaluating FAIRness of a clone of some repository that's sitting on somebody's local storage.

Hope this helps, and please let us know if you have further questions, we're happy to help!

-Jurriaan

unode commented 3 years ago

> I meant to say that howfairis currently supports gitlab.com, but not its self-hosted instances. That could change in the future though. From what I've heard, in particular some German research institutes seem to prefer self-hosted gitlab instances over gitlab.com or github.com, I assume due to privacy law restrictions.

This is very much our case.

  • what is the version of the API that self-hosted instances are offering

Would there be a problem with a message like "This repository doesn't meet the minimum requirements and couldn't be verified..."? For the FAIR ribbon in the report, would an "Unknown" or "Failed" (gray) status be outside the scope?

  • is that even the same API as what gitlab.com offers

I would assume so, unless howfairis is using somewhat obscure API endpoints.

  • where to find any documentation about that

Would GitLab Community-Edition API docs help?

  • what they set as rate limits, etc.

I assume this would be something covered by the same solution to another question asked during the same event: "What if an error happens while verifying a few hundred repositories?"
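One way such a shared solution could look, as a sketch only: retry with exponential backoff when rate-limited, and record an "unknown" status instead of aborting the batch. The `RateLimited` exception, the `check` callable, and the status strings are hypothetical and not part of howfairis.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a checker raises when the server answers HTTP 429."""


def check_all(urls, check, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run check(url) for every URL; one failure never aborts the whole batch."""
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            try:
                results[url] = check(url)
                break
            except RateLimited:
                if attempt == max_retries:
                    # Out of retries: report "unknown" rather than failing.
                    results[url] = "unknown"
                else:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** attempt)
            except Exception:
                # Any other error: record it and move on to the next repo.
                results[url] = "unknown"
                break
    return results
```

The `sleep` parameter is injectable so the backoff can be tested without actually waiting.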

jspaaks commented 3 years ago

For future reference, here's @unode's GitLab instance: https://git.embl.de/explore (for some reason the link is not visible in the post above).