HajoRijgersberg / OM

Ontology of units of Measure

Basic Release Pipeline Proposal #92

Open jmkeil opened 11 months ago

jmkeil commented 11 months ago

This PR is a proposal of a basic release pipeline to solve #90 and is based on #91. It removes dc:date and owl:versionInfo from om-2.0.ttl and om-2.0.rdf and adds a pipeline to automatically add dc:date and owl:versionInfo on releases and to generate an RDF/XML serialization. In the pipeline, ROBOT (based on the OWL API) is used to perform these actions.

To trigger the pipeline and add release information, one needs to create a release with a new tag of the style v*.*.*. The pipeline picks the version number from the tag name. The RDF/XML and TTL files with date and version number are then automatically added to the release.
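As a rough illustration of how a version number can be picked from a tag name, the following Python sketch strips the leading v from a tag such as v0.0.0. It is not the code used in the PR's workflow, which is not shown in this thread. The function name `version_from_ref` and the assumption that each `*` in v*.*.* stands for a run of digits are hypothetical. On GitHub Actions the ref of a tag arrives in the form refs/tags/<tag>, so the sketch accepts that form as well.

```python
import re
from typing import Optional

# Assumption: each "*" in the tag style v*.*.* is a run of digits.
_TAG_PATTERN = re.compile(r"^(?:refs/tags/)?v(\d+\.\d+\.\d+)$")


def version_from_ref(ref: str) -> Optional[str]:
    """Return the version without the leading "v", or None if ref is not a release tag."""
    match = _TAG_PATTERN.match(ref)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Tag refs yield a version; branch refs do not, so no release info would be added.
    for ref in ("v0.0.0", "refs/tags/v1.2.3", "refs/heads/master"):
        print(ref, "->", version_from_ref(ref))
```

A pipeline built this way would add release information only when the function returns a version, and would run without it for ordinary pushes and pull requests.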

In addition, the pipeline will be triggered by pushes and pull requests, to automatically generate the serialization variants (and potentially run generation scripts and checks later on), but without adding release information.

Example from the fork repository:

HajoRijgersberg commented 11 months ago

Hey Jan Martin, thanx this looks great! The only thing is, we need to keep om-2.0.rdf as the source file, in its present form. Could you adapt your pipeline such that it runs every time a new version of om-2.0.rdf is published, and use that file as input for your pipeline? Please see also my other comments in related issues and PRs.

HajoRijgersberg commented 11 months ago

Also, at this stage, I think we should not change the format of the version numbers. So, could you keep the format *.*.*? I have to dive into what kind of change (major, minor, patch) the change of a version number format in itself is. And I would not want different version formats for the different versions of OM files that could be generated using the pipelines. Hope this (and the above and earlier comments in several issues and PRs) is no problem for you. Really appreciate all your effort!

jmkeil commented 11 months ago

> Could you adapt your pipeline such that it runs every time a new version of om-2.0.rdf is published, and use that file as input for your pipeline?

It is easy to adapt the pipeline to use om-2.0.rdf. But it is not trivial to automatically generate a release each time om-2.0.rdf is changed, as one would need to automatically determine the version number.

> I think we should not change the format of the version numbers.

The version number format in the RDF files was not changed. It is only the git tag that has the v at the beginning, as this format is common practice on GitHub. However, I removed the language tag from the version info literal and I changed the formatting of the date, as I switched the datatype from xsd:string to xsd:date.
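To illustrate the date change: xsd:date uses the lexical form YYYY-MM-DD, so a slash-separated date string has to be rewritten before it can be typed as xsd:date. Below is a minimal Python sketch, assuming the old value has the form YYYY/MM/DD. The function names are hypothetical and this is not the PR's actual code.

```python
from datetime import datetime


def to_xsd_date(slash_date: str) -> str:
    """Convert 'YYYY/MM/DD' to the xsd:date lexical form 'YYYY-MM-DD'."""
    # strptime raises ValueError for malformed or impossible dates.
    return datetime.strptime(slash_date, "%Y/%m/%d").date().isoformat()


def turtle_date_literal(slash_date: str) -> str:
    """Render the converted date as a typed Turtle literal."""
    return f'"{to_xsd_date(slash_date)}"^^xsd:date'


if __name__ == "__main__":
    print(turtle_date_literal("2023/09/28"))
```

With a typed date, tools can compare and validate the value as a date rather than as arbitrary text.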

HajoRijgersberg commented 11 months ago

Thanx again so much for your response, Jan Martin. You know how much I appreciate all your effort.

> It is easy to adapt the pipeline to use om-2.0.rdf.

That is great! :)

> But it is not trivial to automatically generate a release each time om-2.0.rdf is changed, as one would need to automatically determine the version number.

I understand, but I'll manage the dates and version numbers. It's not ideal I know, but it is less important than the transparency of the quality of the contents of OM. To put it simply.

> The version number format in the RDF files was not changed. It is only the git tag that has the v at the beginning, as this format is common practice on GitHub.

Clear, thanx!

> However, I removed the language tag from the version info literal

Ah, shall I do so accordingly in om-2.0.rdf? For my understanding: why should it be removed?

> and I changed the formatting of the date, as I switched the datatype from xsd:string to xsd:date.

Sounds good, but can you perhaps explain why in <dc:date>2023/09/28</dc:date> the date is a string? Maybe a stupid question but I don't know.

jmkeil commented 10 months ago

I updated the pull request to use om-2.0.rdf.

HajoRijgersberg commented 10 months ago

> I updated the pull request to use om-2.0.rdf.

That is so fantastic, Jan Martin, many thanx! Please allow me to ask some questions, just for my understanding:

  1. So you use om-2.0.rdf as the basis for generating other files like om-2.0.ttl? You do not change om-2.0.rdf? I ask this because in the yml file I see: --output om-2.0.rdf.
  2. And I see you have removed the owl:versionInfo and dc:date from om-2.0.rdf, or am I wrong?
  3. Or are you working in a copy of om-2.0.rdf? The original file should keep its owl:versionInfo and dc:date of course.

Looking forward to your response. Maybe my questions are silly. Hope you can help me and answer these questions. Many thanx in advance! :)

jmkeil commented 10 months ago
> 1. So you use om-2.0.rdf as the basis for generating other files like om-2.0.ttl? You do not change om-2.0.rdf? I ask this because in the yml file I see: --output om-2.0.rdf.
> 2. And I see you have removed the owl:versionInfo and dc:date from om-2.0.rdf, or am I wrong?

You are right. But it gets added by the pipeline when running for a release tag. This has the advantage that intermediate (non-release) versions do not have a version number. This is an advantage because:

The automatic adding of the version number would become even more important, as soon as some statements get automatically added/removed by an extended pipeline. Then the file in the repository is only the (incomplete) "source" (which should not be used in production), but the pipeline output is the (complete) "build" (which is intended for use in production).

> 3. Or are you working in a copy of om-2.0.rdf? The original file should keep its owl:versionInfo and dc:date of course.

The file in the repository does not get changed by the pipeline. But there will be a modified om-2.0.rdf file in the pipeline output (called pipeline artifacts).
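The separation between the unchanged repository file and the modified pipeline output can be sketched in Python as follows. This is only an illustration: the PR uses ROBOT for the annotation step, whereas the sketch appends a single triple to a Turtle file, which works for Turtle input only. The function name, the `build` output directory, and the IRI and version in the usage example are hypothetical.

```python
import tempfile
from pathlib import Path

OWL_VERSION_INFO = "http://www.w3.org/2002/07/owl#versionInfo"


def build_stamped_copy(source: Path, out_dir: Path, ontology_iri: str, version: str) -> Path:
    """Write a version-stamped copy of a Turtle file into out_dir; leave source untouched."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / source.name
    if target.resolve() == source.resolve():
        raise ValueError("out_dir must differ from the source directory")
    text = source.read_text(encoding="utf-8")
    # Appending one triple with full IRIs is valid Turtle and needs no prefixes.
    stamp = f'<{ontology_iri}> <{OWL_VERSION_INFO}> "{version}" .\n'
    target.write_text(text.rstrip("\n") + "\n" + stamp, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "om-2.0.ttl"
        src.write_text(
            "<http://example.org/onto> a <http://www.w3.org/2002/07/owl#Ontology> .\n",
            encoding="utf-8",
        )
        out = build_stamped_copy(src, Path(tmp) / "build", "http://example.org/onto", "0.0.0")
        print(src.read_text(encoding="utf-8"))  # source: still without version info
        print(out.read_text(encoding="utf-8"))  # build output: with owl:versionInfo
```

The source file plays the role of the repository file, and the copy in `build` plays the role of the pipeline artifact.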

HajoRijgersberg commented 10 months ago

Hey Jan Martin, Just saw your message come in, and coincidentally also had the opportunity to respond. Clear answers, thanx. However (unfortunately there is a 'but' here), the original om-2.0.rdf must not be altered... It should really remain as it is, with date and version number, in its present order/structure, for reasons given earlier. Would you perhaps see a chance to adapt the pipeline one more time such that om-2.0.rdf can remain as it is, with its date and version number? The great benefit would really be in the generation of derived versions of OM, in DL, etc. Hope you don't mind my words! If so, my sincere apologies (you're doing such great jobs for OM!). And many thanx of course in advance for your attention and - hopefully - the adaptation of the pipeline. All the best and good luck, Hajo

jmkeil commented 9 months ago

I updated the PR to not change om-2.0.rdf anymore. The pipeline now generates RDF/XML and TTL serializations (using OWL API) and, in case of a release tag, adds them to the release.

HajoRijgersberg commented 8 months ago

Hey Jan Martin, thanx but I meant that also the pipeline should not alter om-2.0.rdf. Could you perhaps adapt that accordingly, i.e., that the pipeline will not affect om-2.0.rdf in any way? Hope it's no problem for you. But many thanx in advance! My apologies for any inconvenience.

jmkeil commented 8 months ago

Hi Hajo. Just to exclude a misunderstanding: The pipeline does not make any change in the repository. It only takes files from it, uses and maybe changes them, and finally stores the resulting files as artifacts (see e.g. https://github.com/jmkeil/OM/actions/runs/6549193162) of the pipeline execution (job).

HajoRijgersberg commented 6 months ago

Hi Jan Martin, and again my apologies for my late response and thanx again for yours! :) Indeed I thought files from this repository were changed. I'll study it soon again, intendedly within a few weeks - maybe longer since I expect that I need some time. I'll study it with this new knowledge that will probably help a lot. Hope to get back to you soon! :)

HajoRijgersberg commented 3 months ago

Hi Jan Martin,

Sorry that it took so long. There are so many things for me to dive into (I mean other than only this issue of course). I'm sure/convinced you understand. I took a look at the yml file, and I always want to understand everything. How can I see that the files are stored elsewhere, not in this repository? Is that perhaps indicated here:

runs-on: ubuntu-latest
container: obolibrary/robot:v1.9.5

Could you explain to me, help me?

A more general question that I have, since I update the version number and the date of OM manually, e.g. by changing someone's committed file (as you have helped me by pointing out that that is possible), of course in your repository you are fully free what to do, but why would you then want to generate version number and date automatically? And would these then not deviate from the version number and date that I update?

Last question I have: many if not all of the things that we discussed above were from the perspective that I thought it was about this GitHub. So I have probably given you (a lot of?) wrong advice. How do you see that?

Apologies for all my questions, and thanx so much for your answers in advance!

Best, Hajo

jmkeil commented 3 months ago

> Sorry that it took so long. There are so many things for me to dive into (I mean other than only this issue of course). I'm sure/convinced you understand. I took a look at the yml file, and I always want to understand everything. How can I see that the files are stored elsewhere, not in this repository?

It is not directly visible in the yml file itself, but obvious from the general way how GitHub handles files. In our case these four types of files are relevant:

A workflow/pipeline typically pulls the repository files (e.g. the source code of a program), processes them (e.g. compiling, testing), maybe stores some artifacts (e.g. unit test results), and under some conditions (e.g. release branch, no failed tests) generates a release and attaches release assets (e.g. executable binaries of a program) to it.

These types of files exist in parallel without affecting each other, if not explicitly specified in the workflow. Of course, a workflow could also push to the repository, but that would need to be scripted explicitly in the workflow.

> since I update the version number and the date of OM manually, e.g. by changing someone's committed file (as you have helped me by pointing out that that is possible), of course in your repository you are fully free what to do, but why would you then want to generate version number and date automatically? And would these then not deviate from the version number and date that I update?

There is a bunch of reasons to automatize this:

So, the idea is to not have a version number in the git repository at all, but only in the artifacts and assets.

> many if not all of the things that we discussed above were from the perspective that I thought it was about this GitHub. So I have probably given you (a lot of?) wrong advice. How do you see that?

Yes, I think the above comments were based on a misconception about the separation between repository files, artifacts and assets.

HajoRijgersberg commented 3 months ago

> It is not directly visible in the yml file itself, but obvious from the general way how GitHub handles files. (...) Of course, a workflow could also push to the repository, but that would need to be scripted explicitly in the workflow.

But what would that look like then? It already looks as if the repository file is presently affected.

> There is a bunch of reasons to automatize this: (...) So, the idea is to not have a version number in the git repository at all, but only in the artifacts and assets.

Certainly, I know all that, but for other reasons - as discussed before - we don't do that (yet?) in this OM Github, at least not for the original om-2.0.rdf. But my question was more like: would a version number and date that you create automatically in your OM Github not deviate from the version number and date that I create here (on this OM Github)? Of course, you can do that, but I thought: is that then - under the described circumstances (that I create those manually) - a sensible thing to do? This question just for my understanding.

Hope you can answer my questions again, Jan Martin. Very much appreciated, all your help, patience, answers, etc.! :)

jmkeil commented 1 month ago

> > It is not directly visible in the yml file itself, but obvious from the general way how GitHub handles files. (...) Of course, a workflow could also push to the repository, but that would need to be scripted explicitly in the workflow.
>
> But what would that look like then? It already looks as if the repository file is presently affected.

You would see git commit and git push commands in the workflow code. During the workflow, the repository is cloned into the Docker container, just as if you cloned the repository to your local machine: changes to the files in the cloned repository will not affect the remote repository as long as you do not push them to the remote.

> > There is a bunch of reasons to automatize this: (...) So, the idea is to not have a version number in the git repository at all, but only in the artifacts and assets.
>
> Certainly, I know all that, but for other reasons - as discussed before - we don't do that (yet?) in this OM Github, at least not for the original om-2.0.rdf. But my question was more like: would a version number and date that you create automatically in your OM Github not deviate from the version number and date that I create here (on this OM Github)? Of course, you can do that, but I thought: is that then - under the described circumstances (that I create those manually) - a sensible thing to do? This question just for my understanding.

Of course, they could conflict. That is why I proposed removing them from the repository and only adding them (automatically) during the release process. That way, they are guaranteed not to conflict.

But the version number would not get out of your control: The script would determine it based on a version tag on the commit, which must be added manually. The difference is that a tag does not change the content of the commit, nor does it require an additional commit. The commits stay as they are, but from time to time a version tag is added, causing a release. See for example this demo release triggered by the version tag v0.0.0: The file in the tagged commit does not contain version information, but the files attached to the release do.

jmkeil commented 3 weeks ago

I just restored the initial workflow (but with a few updates of pipeline dependencies and a rebase on the latest commit in om/master to avoid merge conflicts):

Now the PR (again):

With that, doing a release would only require you to create a tag on a commit. Neither you nor the pipeline would need to change the commit, the files in the repository, or the commit history.

Merging this would make it possible to work on further PRs such as: automatic OWL profile checking, OWL profile variants generation, numeric datatype variants generation, automatic deployment on the OM website, ...

Would you consider merging this?