globalwordnet / cili

The Global WordNet Association Collaborative Inter-Lingual Index

GitHub pages version of repo #11

Closed jmccrae closed 1 year ago

jmccrae commented 3 years ago

I was discussing this with Piek, and it seems that the best way to host the ILI would be to create a GitHub Pages site from this repository. As such, we should implement a redirect so that we link from:

http://globalwordnet.org/ili/i123 => https://globalwordnet.github.io/cili/i123

This would involve creating a repo with lots of small files, so I might try to test this on a private repo first.
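A minimal sketch of the mapping above, assuming the only URLs to redirect are of the form http://globalwordnet.org/ili/i123 and that an ILI identifier is the letter "i" followed by digits. The function name `redirect_target` is hypothetical; the real redirect would be configured on the globalwordnet.org side rather than in Python.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed values, taken from the example mapping in this thread.
OLD_HOST = "globalwordnet.org"
OLD_PREFIX = "/ili/"
NEW_BASE = "https://globalwordnet.github.io/cili/"


def redirect_target(url: str) -> Optional[str]:
    """Return the GitHub Pages URL for an old-style ILI URL, or None."""
    parts = urlsplit(url)
    if parts.netloc != OLD_HOST or not parts.path.startswith(OLD_PREFIX):
        return None
    ili_id = parts.path[len(OLD_PREFIX):]
    # Assumption: identifiers look like "i123" (an "i" followed by digits).
    if not (ili_id.startswith("i") and ili_id[1:].isdigit()):
        return None
    return NEW_BASE + ili_id
```

Anything that does not look like an ILI identifier returns None, so a caller can fall back to a landing page instead of redirecting blindly.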

goodmami commented 3 years ago

Sounds good in general.

This would involve creating a repo with lots of small files

Yes, over 100K files. It looks like it's technically possible to have that many files in Git, Ext4 directories, and NTFS folders, but it might not be the best approach, and each time we regenerate those files it's a huge change to the repo history. We also can't use dynamic routing on GitHub Pages (without hacking the 404 page), so we can't really tie those URLs to a simple database backend. If we need actual paths to static files, I suggest putting it all in the gh-pages branch so we keep the main branch history clean.

Another alternative is to put it all in one big file (a quick test shows a JSON dump of the IDs and definitions is currently about 8.3M) and use fragments/anchors in URLs like https://globalwordnet.github.io/cili#i123. This would allow us to easily search by definition as well, but it also means the whole thing is downloaded when someone looks up a single ID. Also, I'm not sure whether linked data and RDF allow fragments as distinguishing identifiers.
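To make the single-file idea concrete, here is a rough sketch. It assumes the JSON dump is a flat object mapping each ID to its definition; the actual layout of that quick test is not shown in this thread, and the function names are hypothetical.

```python
import json
from typing import Dict, List, Optional


def load_index(json_text: str) -> Dict[str, str]:
    """Parse the dump; assumed shape: {"i123": "definition text", ...}."""
    return json.loads(json_text)


def lookup(index: Dict[str, str], fragment: str) -> Optional[str]:
    """Resolve a URL fragment such as "#i123" to its definition."""
    return index.get(fragment.lstrip("#"))


def search(index: Dict[str, str], term: str) -> List[str]:
    """Return the IDs whose definition contains the term (case-insensitive)."""
    needle = term.lower()
    return [ili_id for ili_id, text in index.items() if needle in text.lower()]
```

On a Pages site the same logic would run in the browser, which is why the whole file has to be downloaded before a single ID can be shown.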

jmccrae commented 3 years ago

I think, given how the redirects work, we would have to go with having very many files. In particular, we have a single URL for each of the IDs, so each should return only the relevant information for that ID. We can automatically generate this out of the single Turtle file that currently exists. Of course, this should probably be on a different branch.
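The one-file-per-ID generation could be sketched as below. This is only an illustration under stated assumptions: parsing the Turtle file is left out, the records are taken to be (ID, page body) pairs that have already been extracted, and `write_entry_files` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_entry_files(
    records: Iterable[Tuple[str, str]],
    out_dir: str,
    suffix: str = ".html",
) -> int:
    """Write one file per (ili_id, body) pair and return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for ili_id, body in records:
        # Each ID gets its own file, so each URL returns only that entry.
        (out / f"{ili_id}{suffix}").write_text(body, encoding="utf-8")
        count += 1
    return count
```

Pointing `out_dir` at a checkout of a separate branch would keep the generated files out of the main branch history.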

fcbond commented 3 years ago

At least lots of little static files will be fast.

What should we serve? ttl? xml? html?

-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University

jmccrae commented 3 years ago

I think the previous version served N-Triples versions of the ILI data. It's a bit more verbose but easier to work with.
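To illustrate the verbosity trade-off, here is a small sketch that writes one entry as N-Triples, where every line is a complete triple with full IRIs. The choice of predicates (rdf:type and skos:definition) and the Concept class IRI are assumptions for illustration, not confirmed contents of the ILI data.

```python
from typing import List

# Assumed IRIs for illustration only.
ILI = "http://globalwordnet.org/ili/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples string literals cannot hold raw."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def entry_to_ntriples(ili_id: str, definition: str, lang: str = "en") -> List[str]:
    """Return the N-Triples lines for one entry, one full triple per line."""
    subject = f"<{ILI}{ili_id}>"
    return [
        f"{subject} <{RDF_TYPE}> <{ILI}Concept> .",
        f'{subject} <{SKOS_DEFINITION}> "{escape_literal(definition)}"@{lang} .',
    ]
```

Because each line stands alone, the output can be split per ID or filtered with line-oriented tools, at the cost of repeating the full IRIs on every line.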

goodmami commented 3 years ago

I don't think people will fetch and process this data programmatically, so why not serve a more human-friendly format like HTML? E.g., something like:

i16181

Concept

between or among galaxies

Status: active Source: [Princeton WordNet 3.0]() — [wn30-02849367-a]() — intergalactic

Represented in:

  • [Open English WordNet]() — [oewn-02860360-a]() — intergalactic
  • [WOLF (Wordnet Libre du Français)]() — [frawn-02849367-a]() — intergalactique
  • [OpenWordnet-PT]() — [own-pt-synset-02849367-a]() — intergaláctico, intergalático

The "Represented in" part is not in the ili.ttl file but it could be generated automatically. But we could leave it off if we want to keep things minimal.
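A minimal sketch of rendering the mock-up above as a static page, leaving out the optional "Represented in" section. This is not the script from this repository; `render_entry` and its parameters are hypothetical, and the markup is only one possible layout.

```python
from html import escape


def render_entry(
    ili_id: str,
    definition: str,
    status: str = "active",
    kind: str = "Concept",
) -> str:
    """Return a small standalone HTML page for one ILI entry."""
    # Escape every value so definitions containing <, > or & stay valid HTML.
    return "\n".join(
        [
            "<!DOCTYPE html>",
            '<html lang="en">',
            f'<head><meta charset="utf-8"><title>{escape(ili_id)}</title></head>',
            "<body>",
            f"<h1>{escape(ili_id)}</h1>",
            f"<p>{escape(kind)}</p>",
            f"<blockquote>{escape(definition)}</blockquote>",
            f"<p>Status: {escape(status)}</p>",
            "</body>",
            "</html>",
        ]
    )
```

For example, `render_entry("i16181", "between or among galaxies")` produces a page whose heading is the ID and whose body carries the definition and status.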

jmccrae commented 3 years ago

Yeah, that could be better. I am starting to think that a small server, such as a free Heroku instance, might do the trick better.

goodmami commented 3 years ago

Something like that was my initial thought, but if the tons-of-small-files thing works fine, it might be easier to manage. Shall we try it out, first?

fcbond commented 3 years ago

I hope to have my server situation sorted out in the fairly near future (weeks, not months), and then I can host it, but I think it makes sense to go with static HTML for the moment.

goodmami commented 2 years ago

I threw together #12 for generating the files. Let me know what you think.

goodmami commented 2 years ago

As an aside, if we want to put the HTML files up in a repository that is not this one, we could use https://github.com/globalwordnet/ili so the resulting URL still seems appropriate. One problem is that this repository was renamed from that one, so if we repurpose it, the redirects will break. This point is important if we have published documents with the old URL, but we could at least have a landing page pointing people to the proper destination.

goodmami commented 1 year ago

Can this issue be closed? I think it was resolved in #12.

I notice a couple of things that could be improved, but maybe via separate issues:

  1. The globalwordnet.org links (e.g., http://globalwordnet.org/ili/i123) point to HTTP pages on GitHub (http://globalwordnet.github.io/cili/i123) instead of HTTPS (https://globalwordnet.github.io/cili/i123). It works, but HTTPS is available, so that would be preferable.
  2. Neither the make-html.py script nor the process for publishing the results to the gh-pages branch is really documented. Even better might be a CI action to publish the pages on a release.

jmccrae commented 1 year ago

Yes, we can probably close. For (1), I think this is not really a problem: the HTTP URLs also work and we get a redirect. For (2), I have created a pull request that closes this issue.

goodmami commented 1 year ago

Ok thanks, I'll close this now.