The-Sequence-Ontology / MSO

Molecular Sequence Ontology

Set up a standard release process and coordinate with SO #10

Open cmungall opened 5 years ago

cmungall commented 5 years ago

The first part of this is using ODK to standardize the workflow, layouts and release process for MSO, see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/462 - MSO has some custom pieces but these should be easily accommodated.

I suggest:

The other part that needs to be sorted out is synchrony with SO.

msinclair2 commented 5 years ago

Synchrony with SO is built into the custom .jar, which makes both SO and MSO files. What we are really dealing with here is a transition where the SO in its current incarnation needs to be maintained for some period of time (I don't know how long) for those that depend on it.

Thus we are stuck with the same problem the .jar program was meant to solve: maintaining the two ontologies manually and independently. When a new entity is added to MSO and thus (new) SO automatically, it needs to be added manually to the old SO. Should we continue maintaining old SO in this way? How do we encourage people to switch to the new SO? At some point maintaining old SO will become too cumbersome.

cmungall commented 5 years ago

OK, wow, so I had missed this point. So what you are saying is that there will be one source file, and SO and MSO will both be compiled from it? That is a big change, and will definitely impact things like SOPs for PRs from the community. I assume at this point you would just want to have one repo?

msinclair2 commented 5 years ago

Yes, that was the goal all along. One file to compile both ontologies from, eliminating the need to maintain both separately. That makes SO largely the same as MSO, except for those entities that can only be GDCs (e.g. contigs), and minus those entities that can only belong to molecules (e.g. SDCs such as enzymatic function).
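
As a rough illustration of the "one file, two ontologies" idea, here is a toy sketch in Python. It is not how the custom .jar works (that code operates on OWL and is not shown in this thread); the `Entity` class, the `scope` tag, and the "gene" entry are hypothetical and exist only to show entities shared by both outputs versus entities restricted to one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    # Hypothetical tag, not taken from master.owl:
    # "both", "so_only" (GDC-only, e.g. contigs) or "mso_only" (e.g. SDCs).
    scope: str


def compile_ontologies(master):
    """Derive the MSO and SO entity lists from a single master list."""
    mso = [e.name for e in master if e.scope in ("both", "mso_only")]
    so = [e.name for e in master if e.scope in ("both", "so_only")]
    return mso, so


# Toy master list; only "contig" and "enzymatic function" come from the discussion.
master = [
    Entity("gene", "both"),
    Entity("contig", "so_only"),
    Entity("enzymatic function", "mso_only"),
]

mso, so = compile_ontologies(master)
print("MSO:", mso)
print("SO: ", so)
```

The point of the sketch is only that a new entity is entered once and appears in every output it belongs to, so the two ontologies cannot drift apart through manual double entry.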

Though we haven't discussed it, practically at that point it would only require one repo.

The impact on the downstream pipelines of users (what does PR stand for?) is a big concern, noted in @mikebada's announcement, with some trepidation.

mikebada commented 5 years ago

@cmungall In its current implementation, we (manually) edit the master file (master.owl in the refactored MSO/SO repo), and code written by @msinclair2 generates the refactored MSO and merged-SO-and-MSO files from that. Do you envision problems with this methodology?

cmungall commented 5 years ago

PR = pull request.

> Do you envision problems with this methodology?

I think the methodology is potentially fine, but it seems this all still needs to be documented and coordinated? Maybe this has been done already. It looks like the latest edit on the SO was 5 months ago (https://github.com/The-Sequence-Ontology/SO-Ontologies/commit/d6d59cdd775676dbb7e636797f0faf140c7a2d9b), and it seems this change made its way into MSO too, although the mechanism is a bit mysterious to me. I would be happier to see more explicit plans laid out for how things will be managed. At the moment, if someone wants to contribute a PR on the SO site, it's not very obvious that they actually have to come over to this repo and make the edit here, after which the release files will be copied across to the SO repo...?

Have you checked the output of the new pipeline with the current release of SO? It looks like there is a fairly big diff.
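
One quick way to size up such a diff is to compare the term IDs in the two releases. The following is a minimal sketch, assuming both versions have been exported to OBO flat-file format, where each class is a `[Term]` stanza with an `id:` line. The function names are made up for illustration. It only reports added and removed terms, so it would not show changes in the hierarchy itself.

```python
def term_ids(obo_text):
    """Return the set of ids declared in [Term] stanzas of OBO-format text."""
    ids = set()
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
        elif in_term and line.startswith("id:"):
            ids.add(line[len("id:"):].strip())
    return ids


def compare_releases(old_text, new_text):
    """Summarise which term ids were dropped, added, or kept between releases."""
    old_ids = term_ids(old_text)
    new_ids = term_ids(new_text)
    return {
        "only_in_old": sorted(old_ids - new_ids),
        "only_in_new": sorted(new_ids - old_ids),
        "shared": len(old_ids & new_ids),
    }
```

To use it, read the two files into strings (for example with `pathlib.Path(...).read_text()`) and pass them to `compare_releases`; the file names depend on how the releases are exported.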

I recommend laying out all the next steps as github tickets

mikebada commented 5 years ago

Yes, we do need to write up documentation. I think there are at least a few more issues we still have to take care of, including updating. I'll consult with @msinclair2 about moving forward with these soon, and laying out the next steps as GitHub tickets sounds like a good idea, so we'll do that once we've compiled these issues.

mikebada commented 5 years ago

@cmungall Just in case it's not clear: The SO in its current public version is not what is automatically generated from the MSO, so that's why the current public SO and the MSO look so different. The classes of the refactored SO are largely formally defined as being generically dependent on their corresponding MSO classes; therefore, the hierarchical structuring of the refactored SO is nearly identical to that of the MSO, since the reasoner infers all of the hierarchical relationships from the MSO. (The major differences are that the refactored SO additionally has some classes that don't make much sense as MSO entities, and the MSO has SDCs and occurrents that aren't in the refactored SO.) If you examine MSO-SO_merged.owl, you can see these parallel hierarchies of the MSO and refactored SO. (No need to reason over this, as it should be pre-reasoned.) This is why I've been nervous about the reaction particularly from the sequence annotation community, as the structuring of the SO (mostly its upper-level structuring) is quite different from the current public SO.

cmungall commented 5 years ago

Thanks for all the info. As a first step I'd recommend documenting the current edit protocol in the README-editors.md file (e.g. MSO authors periodically examine commit history in SO tracker, and manually replay into master.owl?)