Closed jmccrae closed 1 year ago
Sounds good in general.
This would involve creating a repo with lots of small files
Yes, over 100K files. It looks like it's technically possible to have that many files in Git, Ext4 directories, and NTFS folders, but it might not be the best, and each time we regenerate those files it's a huge change to the repo history. We also can't use dynamic routing on GitHub Pages (without hacking the 404 page), so we can't even really tie those to a simple database backend, either. If we need actual paths to static files, I suggest putting it all in the gh-pages branch so we keep the main branch history clean.
Another alternative is to put it all in one big file (a quick test shows a JSON dump of the IDs and definitions is currently about 8.3M) and use fragments/anchors in URLs like https://globalwordnet.github.io/cili#i123. This would allow us to easily search by definition as well, but it also means the whole thing is downloaded when someone looks up a single ID. Also, I'm not sure if linked data and RDF allow fragments as distinguishing identifiers.
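To make the single-file option concrete, here is a minimal sketch of fragment-based lookup and definition search. The `DUMP` contents, the function names, and the shape of the JSON (ID mapped to definition) are assumptions for illustration, not the actual dump. Since clients do not send the fragment to the server, this lookup would have to run on the client after the whole file has been fetched.

```python
import json
from urllib.parse import urlparse

# Hypothetical miniature of the single-file dump: ILI ID -> definition.
# The second entry is a placeholder, not real ILI data.
DUMP = json.loads(
    '{"i16181": "between or among galaxies", "i0": "placeholder definition"}'
)


def lookup(url, dump):
    """Return the definition named by the URL fragment, or None if absent."""
    fragment = urlparse(url).fragment  # e.g. "i16181" from ".../cili#i16181"
    return dump.get(fragment)


def search(term, dump):
    """Return the sorted IDs whose definition contains the term (case-insensitive)."""
    needle = term.lower()
    return sorted(ili for ili, text in dump.items() if needle in text.lower())


print(lookup("https://globalwordnet.github.io/cili#i16181", DUMP))
print(search("galaxies", DUMP))
```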
I think given how the redirects work, we would have to go with having very many files. In particular, we have a single URL for each of the IDs, so each should only return the relevant information for that ID. We can automatically generate this out of the single Turtle file that currently exists. Of course, this should probably be on a different branch.
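As a rough sketch of the one-file-per-ID approach, the following writes a small file for each entry of an in-memory mapping. Parsing the Turtle file is deliberately left out; the `concepts` mapping, the `write_pages` name, and the `.html` naming scheme are assumptions for illustration only.

```python
import tempfile
from html import escape
from pathlib import Path


def write_pages(concepts, out_dir):
    """Write one small HTML file per ILI ID and return the paths written.

    `concepts` is a hypothetical mapping of ILI ID -> definition that would
    be extracted from the single Turtle file in a separate step.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for ili_id, definition in sorted(concepts.items()):
        path = out / f"{ili_id}.html"
        body = f"<h1>{escape(ili_id)}</h1>\n<p>{escape(definition)}</p>\n"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


# Usage: generate into a throwaway directory to inspect the result.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_pages({"i16181": "between or among galaxies"}, tmp)
    print([p.name for p in paths])
```

Regenerating every file on each run is what makes the repository diff large, which is one reason to keep the output on a separate branch.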
At least lots of little static files will be fast.
What should we serve? ttl? xml? html?
-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University
I think the previous version served N-Triples versions of the ILI data. It's a bit more verbose but easier to work with
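For comparison, a single N-Triples statement for one ID could be produced along these lines. This is only a sketch: the use of `skos:definition` as the predicate and of `http://globalwordnet.org/ili/` as the subject base are assumptions, and the function names are hypothetical.

```python
# Assumed vocabulary and base URI; check these against the actual ILI data.
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"
ILI_BASE = "http://globalwordnet.org/ili/"


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def definition_triple(ili_id, definition, lang="en"):
    """Return one N-Triples line giving the definition of an ILI ID."""
    literal = escape_literal(definition)
    return f'<{ILI_BASE}{ili_id}> <{SKOS_DEFINITION}> "{literal}"@{lang} .'


print(definition_triple("i16181", "between or among galaxies"))
```

Each statement repeats the full subject and predicate URIs, which is why the format is more verbose than Turtle but simple to process line by line.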
I don't think people will fetch and process this data programmatically, so why not serve a more human-friendly format like HTML? E.g., something like:
i16181
Concept
between or among galaxies
Status: active Source: [Princeton WordNet 3.0]() — [wn30-02849367-a]() — intergalactic
Represented in:
- [Open English WordNet]() — [oewn-02860360-a]() — intergalactic
- [WOLF (Wordnet Libre du Français)]() — [frawn-02849367-a]() — intergalactique
- [OpenWordnet-PT]() — [own-pt-synset-02849367-a]() — intergaláctico, intergalático
The "Represented in" part is not in the ili.ttl file but it could be generated automatically. But we could leave it off if we want to keep things minimal.
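A page like the mock-up above could be rendered with something like the following sketch. The function name, its parameters, and the exact markup are assumptions; the optional "Represented in" list is omitted when no entries are passed, matching the minimal variant.

```python
from html import escape


def render_concept(ili_id, definition, status, source, represented_in=()):
    """Render a small HTML fragment for one concept.

    `represented_in` is an optional sequence of plain-text entries; when it
    is empty, the "Represented in" section is left off entirely.
    """
    lines = [
        f"<h1>{escape(ili_id)}</h1>",
        "<p>Concept</p>",
        f"<p>{escape(definition)}</p>",
        f"<p>Status: {escape(status)} Source: {escape(source)}</p>",
    ]
    if represented_in:
        lines.append("<p>Represented in:</p>")
        lines.append("<ul>")
        lines.extend(f"<li>{escape(entry)}</li>" for entry in represented_in)
        lines.append("</ul>")
    return "\n".join(lines)


print(
    render_concept(
        "i16181",
        "between or among galaxies",
        "active",
        "Princeton WordNet 3.0",
        ["Open English WordNet: oewn-02860360-a"],
    )
)
```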
Yeah, that could be better. I am starting to think that maybe a small server could do the trick better, like a Heroku free instance?
Something like that was my initial thought, but if the tons-of-small-files thing works fine, it might be easier to manage. Shall we try it out, first?
I hope in the fairly near future (weeks, not months) to have my server situation sorted out, and then I can host it, but I think it would make sense to go for static HTML for the moment.
I threw together #12 for generating the files. Let me know what you think.
As an aside, if we want to put the HTML files up in a repository that is not this one, we could use https://github.com/globalwordnet/ili so the resulting URL still seems appropriate. One problem is that this repository was renamed from that one, so if we repurpose it, the redirects will break. This point is important if we have published documents with the old URL, but we could at least have a landing page pointing people to the proper destination.
Can this issue be closed? I think it was resolved in #12.
I notice a couple things that could be improved, but maybe via separate issues:
The make-html.py script is not really documented, nor is how to publish the results to the gh-pages branch. Even better might be a CI action to publish the pages on a release.
Yes, we can probably close. For (1) I think this is not really a problem: the HTTP URLs also work and we get a redirect. For (2) I have created a pull request that closes this issue.
Ok thanks, I'll close this now.
I was discussing with Piek and it seems that the best way to host the ILI would be to create a GitHub Pages site from this repository. As such, we should implement a redirect, such that we link from:
http://globalwordnet.org/ili/i123 => https://globalwordnet.github.io/cili/i123
This would involve creating a repo with lots of small files, so I might try to test this on a private repo first.
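The redirect rule itself is a simple path rewrite. As a sketch of the mapping (not of how the web server on globalwordnet.org would actually be configured), assuming IDs always have the form `i` followed by digits and using a hypothetical `redirect_target` function:

```python
from urllib.parse import urlparse

OLD_HOST = "globalwordnet.org"
OLD_PREFIX = "/ili/"
NEW_BASE = "https://globalwordnet.github.io/cili/"


def redirect_target(url):
    """Map an old ILI URL to its GitHub Pages URL, or return None if it does not match."""
    parts = urlparse(url)
    if parts.netloc != OLD_HOST or not parts.path.startswith(OLD_PREFIX):
        return None
    ili_id = parts.path[len(OLD_PREFIX):]
    # Assumed ID shape: the letter "i" followed by one or more ASCII digits.
    if not (ili_id.startswith("i") and ili_id.isascii() and ili_id[1:].isdigit()):
        return None
    return NEW_BASE + ili_id


print(redirect_target("http://globalwordnet.org/ili/i123"))
```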