matentzn opened this issue 5 years ago
I guess I have a lot of thoughts about this. The executive summary is that I don't need most of what ODK offers for my projects, so I don't want to pay the complexity cost. I'd prefer to standardize on outputs, not implementation.
I think OBI is a good example to discuss. OBI uses GNU Make, with ROBOT doing the heavy lifting. The OBI Makefile is about 250 lines with lots of comments and space. It's custom but I don't consider it complex.
OBI releases have always been "full" (merged and reasoned with HermiT). We might want to tweak the full release a little to line up with the emerging standard. I'd like to add a "base" release. According to the current release artifact doc, it looks like that would add about four more lines in the Makefile. I'm not interested in the other four optional release artifacts.
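To make that concrete, the base target would look roughly like this (a sketch following the base-file pattern in the `robot remove` documentation; the IRIs and file names are placeholders, not OBI's actual recipe):

```Makefile
# Sketch of a "base" artefact: remove axioms about terms outside the ontology's
# own namespace, then re-annotate the ontology IRI. Placeholders throughout.
obi-base.owl: obi-merged.owl
	robot remove --input $< \
	  --base-iri "http://purl.obolibrary.org/obo/OBI_" \
	  --axioms external \
	  --preserve-structure false --trim false \
	  annotate --ontology-iri "http://purl.obolibrary.org/obo/obi/obi-base.owl" \
	  --output $@
```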
A big chunk of the OBI Makefile is QC, running various SPARQL queries. It looks like they could use some cleanup, but I'm not sure that ODK covers all of OBI's QC. I'm happy enough with the way OBI handles imports, templates, and modules, which seems simpler to me than the ODK way.
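For reference, the shape of those QC rules in either system is roughly a `robot verify` call over a set of SPARQL files (the query file names below are placeholders, not OBI's or the ODK's actual checks):

```Makefile
# Illustrative QC target: run SPARQL checks against the merged ontology and
# write any violations to the build directory. Query names are placeholders.
verify: obi-merged.owl
	robot verify --input $< \
	  --queries src/sparql/missing-label.rq src/sparql/duplicate-label.rq \
	  --output-dir build/
```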
The ODK Makefile template is 652 lines. There are a lot more supporting files in the ODK template directory than OBI has. ODK has a bunch of stuff that OBI doesn't need.
I see how ODK helps Nico manage a whole bunch of ontology projects that have a shared history of tools. OBI doesn't share that history. Looking at what ODK has today, I don't see any benefit for OBI switching, but lots of costs. That calculation may change in the future.
Docker is its own thing. Every time I've tried to use Docker in a project, I've regretted it. I'll take a stab at articulating why, knowing full well that I won't convince anyone. Containers are a fine option for lightweight virtualization, but on macOS and Windows we run Docker inside a VM anyway, providing little benefit (please correct me if I'm wrong). I prefer to just use a VM without Docker. The primary benefit people want from Docker is dependency management, but Ansible scripts are much more flexible than a Dockerfile, NixOS is even better, and a humble JAR file is perfect if you can stick to the JVM.
And as far as "ease of use" goes, at OBO tutorials we've had much more success installing ROBOT than installing ODK.
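For comparison, the whole ROBOT setup on a machine that already has Java amounts to roughly the following (a sketch; the download URL pattern and bare-JAR usage are illustrative rather than the official install instructions, which also ship a small `robot` wrapper script):

```sh
# Fetch the ROBOT JAR and check it runs; a machine with a recent Java is enough.
curl -L -o robot.jar \
  https://github.com/ontodev/robot/releases/latest/download/robot.jar
java -jar robot.jar --version
```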
I'm fine with ODK setting the standards for directory layout and recipes like the release artifacts. I can see some benefits to building the release artifacts into ROBOT, but Nico's reasons are good ones while we're figuring out all the details, so I'm not in a rush.
Rather than standardizing on implementation, I'd prefer to standardize on outputs. Let's build testing tools to make sure that OBI's "full" and "base" release artifacts match perfectly with ODK release artifacts. Let's look at harmonizing OBI and ODK's QC queries.
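One concrete way to do that testing (a sketch; the file and target names below are made up): run `robot diff` between the artefact produced by OBI's Makefile and the same artefact produced by the ODK, and inspect the report.

```Makefile
# Sketch of output-level testing: diff the artefact built by OBI's own Makefile
# against the one built by the ODK. The report lists any axioms present in one
# file but not the other. File and target names are made up.
check-base-parity: obi-base.owl odk-build/obi-base.owl
	robot diff --left obi-base.owl --right odk-build/obi-base.owl \
	  --output base-parity.txt
```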
I'll admit that at least part of my disagreement with @matentzn on this topic is probably based on Conway's Law:
> organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.
The differences between ODK and ROBOT have so far reflected differences between @cmungall's projects and my projects. We're friends and we have shared goals for OBO and even some shared funding, but we still have distinct communication structures.
A lot of things to discuss here, but it seems we are broadly aligned on end goals. Let's standardize project.yaml, and standardize the expectations for the different module types (https://github.com/information-artifact-ontology/ontology-metadata/pull/36). In theory we can have any number of tools that implement this. I appreciate this creates ambiguity in the short term, but is there anything riding on it? Is someone working on `robot release` right now? If not, then the current situation of a subset of ontologies using the ODK and a subset using hand-crafted Makefiles can carry on.
Can't resist an observation: a 250-line Makefile is nice, but if you have 20 ontologies like OBI, that's 5k lines taking up valuable headspace. Unfortunately, for those of us working with model organisms with different funding streams, we will have multiple similar ontologies to worry about for the foreseeable future.
I think it would help to define the matrix of system objective x target user type and their level of implementation skill. How much know-how is required to implement either the ODK or the ROBOT approach in the build process? I like that both systems could work to harmonize on build-process outputs.
The first user type x skill combination I want a solution for is the fledgling ontology builder who may have some agency IT support (people who know Linux servers but nothing about ontologies). (The user may also be representing a group of curators producing one ontology.) These are the kinds of people I am engaged in training; one can see that people have to get past this point to engage with more complex functionality. They would rather understand the Makefile themselves than have to work with an IT person for whom the ontology part is unknown, but frankly they have never used a Makefile before and cannot, for example, spot the lines that trigger on changes detected in config files.
This entry-level ontology builder needs:
And importantly: nothing more! Provide a different config file, or one whose sections are obviously ignorable, so that users don't have to ponder what they don't need to turn off or know, which is so much more time-consuming. I.e., even if all the code of a complex system is provided, offer a simple tier-1 path through it. The entry points to both the ODK and ROBOT right now don't make clear the minimal information and settings needed by the entry-level ontology builder.
Then define tier 2 / 3 capabilities & requisite config files.
My 2 cents or pence!
I have a new use case for a shared project.yml. Should we discuss it here, in a new ODK issue, or maybe in an issue on another repo?
I would love to hear it! I don't mind where the issue is discussed. IMO it belongs on the repo where the ontology .md files that drive the website live, but I don't mind either way.
We previously discussed it in a chain of mails with the subject "Contemplating centralised management of ODK configs", but if you now also see a use case: let's go for it! I will support you strongly!
There are some questions now and then on where the ODK starts and ROBOT ends; to roll out both widely, it would be good to understand and agree on the core job each one does.
I am a heavy user of both, so I have some opinions. My main concern is that while the ODK has now widely embraced ROBOT, there are still a lot of ad-hoc (non-standard) Makefiles being built with fairly complex ROBOT (or owltools) pipelines (like OBI, GO, DO, and many more). I am not saying: force everyone into line! (<---!!!!) But I do see the risk of certain aspects of the configuration drifting apart, like release artefact definitions (what is a base, what is a simple release?), QC coverage (which QC checks should be run, on which artefacts, and during which stage of the CI?), import management, etc. So before going into detail, my general suggestion is this:
ROBOT does all operations that transform an ontology (`o-->[]-->o'`) and implements the QC methods themselves (not law, just a rule of thumb). Apart from the occasional `sed` (for dropping some ugly OBO-format stuff, like the owl-axioms section), the ODK does no ontology transformation of its own (apart from calling ROBOT).
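As a sketch of what that rule of thumb looks like in practice (file names, the reasoner, and the exact flags below are illustrative, not the verbatim ODK template):

```Makefile
# The Makefile only orchestrates: every ontology transformation is a ROBOT call,
# with at most a small sed touch-up on the OBO serialisation afterwards.
foo.obo: foo-edit.owl
	robot merge --input $< \
	  reason --reasoner ELK \
	  convert --check false --format obo --output $@.tmp.obo
	sed '/^owl-axioms:/d' $@.tmp.obo > $@ && rm $@.tmp.obo
```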
In my experience there are two areas of contention:
**Why should you use the ODK rather than a custom Makefile + ROBOT?**
The Makefile mainly fulfils the job of defining release file pipelines, report files, and QC standards. Most of it should be fairly standard, and the tooling it needs comes with a single `docker pull`.
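For context, getting the whole toolchain this way looks roughly like the following (the image name is the ODK's published Docker image as far as I know; the exact `docker run` invocation is illustrative, since the ODK normally wraps it in a helper script):

```sh
# Pull the ODK image once, then run a release goal inside it, mounting the
# repository as the working directory.
docker pull obolibrary/odkfull
docker run --rm -v "$PWD":/work -w /work obolibrary/odkfull make prepare_release
```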
In my experience, there is only one good argument against the ODK.
But since setting up a new repo is trivial, and for many ontologies the amount of custom code is relatively low, I don't see much else. The second action item of this ticket is:
**Where should the OBO release artefact definitions live?**
**Option 1: ODK.** The ODK Makefile contains all release artefact definitions as ROBOT chains. There are IMHO three main advantages:
**Option 2: ROBOT.** The user can run a simple `robot release --simple` (or similar) to generate a simple release of any ontology (independent of the ODK).

I am not 100% sure about either side; I just tend towards Option 1 at the moment because I love the transparency argument and I am a heavy user of the "debugging" argument, but I am not sure how general that is here.
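To make the contrast concrete: under Option 1, a "simple" artefact is spelled out as a ROBOT chain in the Makefile, roughly like the sketch below (not the actual ODK recipe); under Option 2, that whole chain would collapse into a single built-in command like the hypothetical `robot release --simple` above.

```Makefile
# Option 1 shape: the "simple" artefact as an explicit ROBOT chain.
# File names, IRIs, and the exact command order are illustrative.
obi-simple.owl: obi-merged.owl
	robot reason --input $< --reasoner ELK \
	  relax \
	  remove --axioms equivalent \
	  reduce --reasoner ELK \
	  annotate --ontology-iri "http://purl.obolibrary.org/obo/obi/obi-simple.owl" \
	  --output $@
```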
Action item: