LiUSemWeb / HeFQUIN

HeFQUIN is a query federation engine for heterogeneous federations of graph data sources, including federations of knowledge graphs.
https://liusemweb.github.io/HeFQUIN/
Apache License 2.0

Towards v.0.1.0 #348

Open hartig opened 4 months ago

hartig commented 4 months ago

/cc @keski anything else?

keski commented 2 months ago

There is now a basic static website located under the website/ directory in the gh-pages branch. To publish this page automatically on build we need to set this up under the project settings: https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site (I don't have sufficient permissions)

I'd suggest triggering every time we push to gh-pages. Later we can set up a separate workflow on gh-pages that in turn triggers on pushes to main, e.g., for rebuilding javadocs automatically.

@hartig

hartig commented 2 months ago

There is now a basic static website located under the website/ directory in the gh-pages branch. To publish this page automatically on build we need to set this up under the project settings: https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site (I don't have sufficient permissions)

Done. See: https://liusemweb.github.io/HeFQUIN/ :-)

Note that, to make this work, I had to rename the directory website/ to docs/ because GitHub didn't recognize website/.

I'd suggest triggering every time we push to gh-pages.

That should work now.

Later we can set up a separate workflow on gh-pages that in turn triggers on pushes to main, e.g., for rebuilding javadocs automatically.

Perhaps. But maybe we don't need Javadocs for the "bleeding-edge" code. (Or, maybe we do?)

keski commented 2 months ago

Done. See: https://liusemweb.github.io/HeFQUIN/ :-)

Fantastic!

But maybe we don't need Javadocs for the "bleeding-edge" code. (Or, maybe we do?)

You're probably right. Publishing Javadocs for the bleeding edge seems like overkill. But such a workflow might be useful for updates to the ontologies or similar.

keski commented 2 months ago

I've been working on the CLI scripts and realized that placing the CLI wrapper scripts in hefquin-cli feels a bit strange, since they are not bound to that module alone; for example, they also reference hefquin-service.

Should we instead place the CLI wrapper scripts in their own directory in the project root (e.g. scripts/, tools/, hefquin-scripts/ or hefquin-tools/)? Or even have the bin/ and bat/ directories directly in the root?

hartig commented 2 months ago

I think that having bin/ and bat/ directly in the root makes sense.

keski commented 2 months ago

The basic setup for the w3id namespace is now in place: https://w3id.org/hefquin/

hartig commented 2 months ago

The basic setup for the w3id namespace is now in place: https://w3id.org/hefquin/

Perfect!

keski commented 2 months ago

I've now added scripts to publish the vocabularies (formats: ttl, rdf/xml, and jsonld) and to generate a simple HTML documentation page from each vocabulary. (Since the vocabularies are not full OWL ontologies, we could not reuse pyLODE.)

In the gh-pages branch, I've added copies of the vocabularies under hefquin-vocabs/, renamed them, and updated their namespaces. The files named engineconf.ttl and lpg2rdfconf.ttl are picked up automatically and used for publishing and documentation generation.

@hartig If you want more details on the documentation page, or would rather handle this manually, just let me know.

I've set up our w3id to support content negotiation: if we visit, e.g., http://w3id.org/hefquin/engineconf/ in a browser, the request is redirected to https://liusemweb.github.io/HeFQUIN/vocab/engineconf/latest/index.html, whereas a request with an Accept header such as text/turtle is redirected to https://liusemweb.github.io/HeFQUIN/vocab/engineconf/latest/engineconf.ttl instead. This also means the ontologies can be imported, e.g., into Protege.
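To make the redirect behaviour described above concrete, here is a minimal Python sketch of the mapping. It is not the actual .htaccess configuration under w3id.org, only a model of it. The function name `redirect_target` is hypothetical, and only the text/turtle to .ttl pairing is confirmed in this thread; the media types and file extensions for RDF/XML and JSON-LD are assumptions.

```python
BASE = "https://liusemweb.github.io/HeFQUIN/vocab"

# Media type -> file extension. Only text/turtle -> ttl is confirmed in
# the thread; the other two entries are assumptions for illustration.
EXTENSIONS = {
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",      # assumed
    "application/ld+json": "jsonld",   # assumed
}


def redirect_target(vocab: str, accept: str = "text/html") -> str:
    """Return the URL a request for w3id.org/hefquin/<vocab>/ is sent to."""
    # Check each media type in the Accept header in order, ignoring q-values.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in EXTENSIONS:
            ext = EXTENSIONS[media_type]
            return f"{BASE}/{vocab}/latest/{vocab}.{ext}"
    # Browsers and unrecognised media types get the HTML documentation page.
    return f"{BASE}/{vocab}/latest/index.html"


print(redirect_target("engineconf"))
print(redirect_target("engineconf", "text/turtle"))
```

The sketch takes the first recognised media type in header order and ignores q-values, which is a simplification compared to full HTTP content negotiation.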

hartig commented 2 months ago

I've now added scripts to publish the vocabularies (formats: ttl, rdf/xml, and jsonld) and to generate a simple HTML documentation page from each vocabulary.

Great! However, where exactly are these scripts? Are these the files in the ./docs-scripts/ directory within the gh-pages branch? Assuming yes, can you please also add a README.md into this directory, with a brief sentence about the purpose of the directory and a sentence about each of the files.

Since the vocabularies are not full OWL ontologies, we could not reuse pyLODE.

What exactly is it that is missing in our vocabularies? And, does it make sense to add these things in order to be able to use pyLODE?

In the gh-pages branch, I've added copies of the vocabularies under hefquin-vocabs/, renamed them, and updated their namespaces.

Why copies? Is it not possible to have just one Turtle file per vocabulary? I am worried that having multiple copies means we always have to remember to keep everything in sync.

@hartig If you want more details on the documentation page, or would rather handle this manually, just let me know.

The less we have to do manually, the better ;-)

I've set up our w3id to support content negotiation such that [...]

Sounds good!!

keski commented 2 months ago

I've now added scripts to publish the vocabularies (formats: ttl, rdf/xml, and jsonld) and to generate a simple HTML documentation page from each vocabulary.

Great! However, where exactly are these scripts? Are these the files in the ./docs-scripts/ directory within the gh-pages branch? Assuming yes, can you please also add a README.md into this directory, with a brief sentence about the purpose of the directory and a sentence about each of the files.

Sounds good, I'll create the README.

Since the vocabularies are not full OWL ontologies, we could not reuse pyLODE.

What exactly is it that is missing in our vocabularies? And, does it make sense to add these things in order to be able to use pyLODE?

I think adding the necessary things makes sense, since the documentation produced by pyLODE is much more refined. What we need is, e.g., to state that the vocabulary is an OWL ontology, as well as to add some annotation properties that are required by pyLODE. I'll create an issue for it.

In the gh-pages branch, I've added copies of the vocabularies under hefquin-vocabs/, renamed them, and updated their namespaces.

Why copies? Is it not possible to have just one Turtle file per vocabulary? I am worried that having multiple copies means we always have to remember to keep everything in sync.

We want to be able to redirect to the "latest" version of the vocabularies, as well as to the versioned ones, without updating our .htaccess file under http://w3id.org/hefquin. We could redirect the user to the latest version dynamically, but that would only work in the browser and not for other clients, e.g., curl or Protege. If GitHub Pages supported PHP we could also redirect using HTTP headers, but only static pages are supported.

As part of the publication process, the vocabularies are automatically copied to both locations, so we should not have to worry about keeping them in sync.
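As a rough illustration of the copy step described above, the following Python sketch copies one vocabulary file into both a `latest/` directory and a versioned directory. It is not the project's build.sh. The function name `publish_vocab` and the version string are hypothetical, the directory layout is inferred from the published URLs (docs/vocab/<name>/latest/), and naming the versioned directory after the version string is an assumption.

```python
import shutil
from pathlib import Path


def publish_vocab(source: Path, docs_root: Path, version: str) -> list[Path]:
    """Copy a vocabulary file to its 'latest' and versioned locations."""
    name = source.stem  # e.g. "engineconf" for engineconf.ttl
    targets = []
    for label in ("latest", version):
        # Assumed layout: <docs_root>/vocab/<name>/<latest or version>/
        target_dir = docs_root / "vocab" / name / label
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / source.name
        shutil.copyfile(source, target)
        targets.append(target)
    return targets
```

Because both copies are written from the same source file in one step, they cannot drift apart as long as files are only ever published through this function.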

@hartig If you want more details on the documentation page, or would rather handle this manually, just let me know.

The less we have to do manually, the better ;-)

I've set up our w3id to support content negotiation such that [...]

Sounds good!!

hartig commented 2 months ago

(I'm starting to catch up with all these things)

I've now added scripts to publish the vocabularies (formats: ttl, rdf/xml, and jsonld) and to generate a simple HTML documentation page from each vocabulary.

[...] can you please also add a README.md into this directory, with a brief sentence about the purpose of the directory and a sentence about each of the files.

Sounds good, I'll create the README.

That one is still an open TODO, right?

[...] What we need is, e.g., to state that the vocabulary is an OWL ontology, as well as add some annotations properties that are required by pyLODE. I'll create an issue for it.

I see that the issue is #361 and you have created a branch for it. I will take a look at it and continue the discussion of this in that other issue.

Why copies? Is it not possible to have just one Turtle file per vocabulary? I am worried that having multiple copies means we always have to remember to keep everything in sync.

We want to be able to redirect to the "latest" version of the vocabularies, as well as to the versioned ones [...] As part of the publication process, the vocabularies are automatically copied to both locations, so we should not have to worry about keeping them in sync.

Sure, the files can be copied into the corresponding sub-directories of ./doc/ as part of the publication process. No problem with that.

My question was about the copies that you created under ./hefquin-vocabs/ (within the gh-pages branch). Why do we need two copies of the files there? Was that only because you want the files to be named in this shorter form (e.g., engineconf.ttl instead of EngineConfiguration.ttl), or because the filename needs to match the name in the URI prefix?

Already now the two copies are out of sync in the branch that you created for #361.

In any case, I am fine with renaming these two files under ./hefquin-vocabs/ (also in main), instead of having two copies of them there. And when I say "renaming", I mean doing it using git mv to retain the commit history.
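For reference, the rename via git mv could be scripted roughly as follows. This is only a sketch: the helper `rename_tracked_file` is hypothetical, and it defaults to a dry run that merely returns the command. The example file names are the ones mentioned above.

```python
import subprocess


def rename_tracked_file(old: str, new: str, dry_run: bool = True) -> list[str]:
    """Build (and optionally run) the `git mv` command for a rename."""
    command = ["git", "mv", old, new]
    if not dry_run:
        # Must be run inside the repository's working tree.
        subprocess.run(command, check=True)
    return command


print(rename_tracked_file(
    "hefquin-vocabs/EngineConfiguration.ttl",
    "hefquin-vocabs/engineconf.ttl",
))
```

After such a rename is committed, `git log --follow -- hefquin-vocabs/engineconf.ttl` shows the history across the rename.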

keski commented 2 months ago

(I'm starting to catch up with all these things)

I've now added scripts to publish the vocabularies (formats: ttl, rdf/xml, and jsonld) and to generate a simple HTML documentation page from each vocabulary.

[...] can you please also add a README.md into this directory, with a brief sentence about the purpose of the directory and a sentence about each of the files.

Sounds good, I'll create the README.

That one is still an open TODO, right?

Yes, still a TODO.

My question was about the copies that you created under ./hefquin-vocabs/ (within the gh-pages branch). Why do we need two copies of the files there? Was that only because you want the files to be named in this shorter form (e.g., engineconf.ttl instead of EngineConfiguration.ttl), or because the filename needs to match the name in the URI prefix?

Already now the two copies are out of sync in the branch that you created for #361.

In any case, I am fine with renaming these two files under ./hefquin-vocabs/ (also in main), instead of having two copies of them there. And when I say "renaming", I mean doing it using git mv to retain the commit history.

Indeed, the copies you are referring to are not needed.

hartig commented 2 months ago

Yes, still a TODO.

I created #363 for this TODO.

Indeed, the copies you are referring to are not needed.

Okay, then I propose to do the following in PR #362.

Once this is done, I can take care of adapting the build.sh script in ./hefquin-vocabs/.

keski commented 2 months ago

I modified build.sh in ./hefquin-vocabs/ and updated the tests, but please verify that everything looks okay.

hartig commented 2 months ago

but please verify that everything looks okay.

Looks okay. But I will double-check by also running it locally when I am back at my computer.