Future of geolexica-server

From @skalee

I am already convinced that we should migrate away from Jekyll at some point. It is good for blogs or landing pages, but not necessarily for sites like ours. Extracting common parts into building blocks is difficult; writing tests is difficult (in fact, we have almost none, and instead of relying on them I run diffs to spot differences between old and new output); the documentation is confusing and incomplete; and more.

For quite some time I have been thinking about what feature set we need, and I already have a more or less clear vision. I am considering either migrating to another tool or implementing a new one.

We don't use many Jekyll features, so writing a new tool is an option worth considering. Nevertheless, there are a number of problems that need to be addressed, including processing Sass, rendering templates, composing pages from layouts and partials, and shadowing generic templates with ones dedicated to particular sites. I hope much of this can be taken from Rails, especially the Asset Pipeline, maybe Action View, which is responsible for rendering, and some of the view helpers. I hope that some testing helpers can be imported too. I haven't prepared any proof of concept, however, so it's difficult to tell how much work is really needed.
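To make the "shadowing" requirement concrete, here is a minimal sketch of a template lookup that prefers a site-specific template and falls back to the generic one. It is written in TypeScript purely for illustration; `TemplateSet`, `resolveTemplate`, and the template names are hypothetical and do not come from Jekyll or geolexica-server.

```typescript
// Hypothetical sketch: none of these names exist in Jekyll or geolexica-server.

/** Map from template path to template source, standing in for a directory on disk. */
type TemplateSet = Record<string, string>;

/** True only for the set's own keys, so inherited keys such as "toString" never match. */
function hasTemplate(set: TemplateSet, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(set, path);
}

/**
 * Look up a template by path, preferring the site-specific set and
 * falling back to the generic one. Returns undefined if neither has it.
 */
function resolveTemplate(
  path: string,
  siteTemplates: TemplateSet,
  genericTemplates: TemplateSet,
): string | undefined {
  if (hasTemplate(siteTemplates, path)) return siteTemplates[path];
  if (hasTemplate(genericTemplates, path)) return genericTemplates[path];
  return undefined;
}

// Example data (assumed): one generic set and one site that overrides the footer only.
const generic: TemplateSet = {
  "concept.html": "<h1>{{ term }}</h1>",
  "footer.html": "<footer>Generic footer</footer>",
};
const siteSpecific: TemplateSet = {
  "footer.html": "<footer>Site-specific footer</footer>",
};
```

With this lookup, `resolveTemplate("footer.html", siteSpecific, generic)` picks the site's footer, while `concept.html` still comes from the generic set.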

Perhaps we should migrate to another existing tool. Nanoc is another generator written in Ruby, designed around a different philosophy than Jekyll. It is likely to match our needs, but on the other hand it seems overcomplicated, at least at first glance. I need to dig into the documentation before saying more. Anyway, it should be far more flexible than Jekyll.

Integrating site generation into Glossarist, which you mentioned, is another promising idea. I suppose it may enrich the Glossarist application with some nice features like live previews. Maybe it will also make creating a Geolexica mobile app easier, if such an app is desired and if Cordova-like tools are acceptable. On the other hand, site generation will require 3rd-party tools like LaTeXML or AsciiDoctor, which in that case would have to be bundled with Glossarist, which may cause difficulties on some platforms. Furthermore, site generation requires things like RDF or TBX-ISO-TML generation, which would have to be implemented in Glossarist. But again, this can be a feature in the context of live previews.

However, with all of the above said, the Jekyll issues are manageable, and I've got used to them. Sometimes a bit more configuration is needed, sometimes we have to be flexible, but so far it has never been a big issue and I doubt it ever will be. I am more worried about breaking something accidentally due to insufficient testing. Also, diff-based testing can be quite time-consuming, depending on the situation. Jekyll is indeed an obstacle here, but this isn't something that can be improved easily just by switching to another tool.
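For readers unfamiliar with the diff-based check mentioned above, the idea can be sketched roughly as follows: build the site before and after a change, then compare the two outputs file by file. This is only an illustration under assumptions; `Build`, `BuildDiff`, and `diffBuilds` are hypothetical names, and the builds are modelled as in-memory maps rather than real output directories.

```typescript
// Hypothetical sketch: a build is modelled as a map from output path to file content.
type Build = Record<string, string>;

interface BuildDiff {
  added: string[];   // paths present only in the new build
  removed: string[]; // paths present only in the old build
  changed: string[]; // paths present in both builds with different content
}

function hasPath(build: Build, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(build, path);
}

/** Compare two builds and report which output files differ. */
function diffBuilds(oldBuild: Build, newBuild: Build): BuildDiff {
  const added: string[] = [];
  const removed: string[] = [];
  const changed: string[] = [];

  for (const path of Object.keys(newBuild)) {
    if (!hasPath(oldBuild, path)) {
      added.push(path);
    } else if (oldBuild[path] !== newBuild[path]) {
      changed.push(path);
    }
  }
  for (const path of Object.keys(oldBuild)) {
    if (!hasPath(newBuild, path)) removed.push(path);
  }

  // Sort so the report is stable regardless of key order.
  return { added: added.sort(), removed: removed.sort(), changed: changed.sort() };
}
```

An empty report means the change did not alter the generated site. A non-empty one still has to be reviewed by hand, which is where the time goes.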

Perhaps this should be discussed in a separate thread.

Originally posted by @skalee in https://github.com/geolexica/geolexica-server/issues/125#issuecomment-658153184

@strogonoff and I have been discussing for some time a data-centric approach where the website is built on top of a web server that serves data. This way we can build a static site using web server data -- similar to the approach of Gatsby, so that we can have dynamic pages that work like static ones.

A related approach (which you and I may have discussed?) is the "packaging of a website" so that it is portable, a package that includes all parameters and config (e.g. redirects) that can be packaged and transmitted: opened in a browser, or served as a web service. Like a "webarchive" or the ARC format, but allows opening and serving.

What do you think?

Yes, generally the idea currently in progress is to develop the website using React components that are rendered into complete static HTML, which can then become interactive depending on the browser’s capabilities. I think Geolexica.org will be the next site using this approach (concept search functionality, which I think already involves React, will definitely make great use of this feature), though it would have to remain built with Jekyll for a little longer until the React build is sorted out properly.

The website build stage gathers data from various sources and APIs and passes it to React pages. After JS is initialized, that data is sometimes augmented with newer real-time data fetched at runtime, where useful.
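As a rough illustration of the "build-time data, augmented at runtime" idea, the sketch below merges a build-time snapshot with fresher data. It is framework-agnostic and deliberately leaves out the actual fetching and the React wiring. The `ConceptSummary` and `PageData` shapes and the `augment` function are assumptions made for this example, not the project's actual data model.

```typescript
// Hypothetical shapes; the field names are assumptions for illustration only.
interface ConceptSummary {
  id: string;
  term: string;
  updatedAt: string; // ISO 8601 UTC timestamp, e.g. "2020-07-14T00:00:00Z"
}

interface PageData {
  concepts: ConceptSummary[];
}

/**
 * Merge build-time data with data fetched at runtime.
 * A runtime entry replaces a build-time entry with the same id only when it
 * is newer; runtime-only entries are appended. If nothing was fetched, the
 * build-time data is returned unchanged, so the static page stays valid.
 */
function augment(buildTime: PageData, runtime?: PageData): PageData {
  if (!runtime) return buildTime;

  // Object.create(null) avoids accidental hits on inherited keys.
  const byId: Record<string, ConceptSummary> = Object.create(null);
  const order: string[] = [];

  for (const concept of buildTime.concepts) {
    byId[concept.id] = concept;
    order.push(concept.id);
  }
  for (const concept of runtime.concepts) {
    const existing = byId[concept.id];
    if (!existing) {
      byId[concept.id] = concept;
      order.push(concept.id);
    } else if (concept.updatedAt > existing.updatedAt) {
      // Plain string comparison is enough for same-format UTC timestamps.
      byId[concept.id] = concept;
    }
  }
  return { concepts: order.map((id) => byId[id]) };
}
```

In a React page, the build-time value would typically arrive as props and the runtime value from a fetch after initialization, but that wiring depends on the build setup and is not shown here.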

As a side effect, since we are using React, we will be able to share fundamental components between the software used for managing data (e.g., Glossarist, ITU OB editor, possibly future Metanorma GUI, etc.) and final deliverables (geolexica.org, ituob.org, PDFs and standalone HTMLs, etc.), ensuring consistency and reliable preview. E.g., we would use the same components for showing terminological entry properties on a website and in a desktop app consistently, while at the same time being able to extend those components in varying ways, pass different configuration options to them, and embed them in different contexts.

A related approach (which you and I may have discussed?) is the "packaging of a website" so that it is portable, a package that includes all parameters and config (e.g. redirects) that can be packaged and transmitted: opened in a browser, or served as a web service. Like a "webarchive" or the ARC format, but allows opening and serving.

I don’t mind the idea of a “webarchive” format in general, but I don’t think there is a point in trying to come up with one format that fits both standalone offline use and online deployment. Unless we ship a Chromium with it, it will be subject to cross-browser compatibility issues, or require the user to install a browser separately, etc.

Speaking of shipping Chromium, what I think could be interesting is having our software work in such a way that a user can clone a register repository, double-click a certain file, and have it open in our app for browsing and/or editing register data.

The goal is to be able to distribute a "rendered data package", e.g. a document repository with many standards. Shipping with an executable is "okay" to some extent, but government computers will certainly prefer data to be unbundled from the executable, i.e. we should always have the option to separate the renderer package from the data package.

On the other hand, site generation will require 3rd-party tools like LaTeXML or AsciiDoctor, which in that case would have to be bundled with Glossarist, which may cause difficulties on some platforms.

@skalee Metanorma already bundles them in packed-mn which is a single binary version that supports macOS/Linux/Windows.

As a side effect, since we are using React, we will be able to share fundamental components between the software used for managing data (e.g., Glossarist, ITU OB editor, possibly future Metanorma GUI, etc.) and final deliverables (geolexica.org, ituob.org, PDFs and standalone HTMLs, etc.), ensuring consistency and reliable preview. E.g., we would use the same components for showing terminological entry properties on a website and in a desktop app consistently, while at the same time being able to extend those components in varying ways, pass different configuration options to them, and embed them in different contexts.

Although I've mentioned sharing components as a possible benefit, I'm still not convinced that it really is.

How many of these fundamental components are really worth sharing between the desktop app and the site? If it's only about some stylesheets and a localized concept template, then we don't need to merge both applications; there are other ways to share those. So what else? The site generator accesses concepts in a read-only fashion, so it doesn't require complex logic to manage the data. Nor do we need to validate concepts during site generation if they're validated on release. Hence, I don't see much of a win here.

Sites have their own configuration (term languages, supported API formats, etc.). I presume these things don't really apply to Glossarist, so we have another difference here. Furthermore, some sites have overridden templates, which is another kind of site fine-tuning, and which I presume is unknown to Glossarist.

Perhaps termlinks processing could be shared, but this is a tiny feature. Perhaps math conversion could be shared, but only if we bundle LaTeXML and possibly some LaTeX distribution, which are the only proper tools for that job, at least in production site generation.

Speaking of "ensuring consistency and reliable preview", it will be only as reliable as the math rendering. So again: a reliable preview requires LaTeXML. I don't know if we need that in Glossarist; perhaps the almost-reliable MathJax will suffice.

For these reasons, I have plenty of doubts about the benefits of sharing things between Glossarist and the site generator. To me, Glossarist and the site are very different. And we have plenty of things implemented in Ruby already…

Finally, none of this really addresses my concerns about insufficient testing. All the other Jekyll drawbacks are manageable, and I'm used to them already. The new tool, whatever it turns out to be, should be more about site testing. I'd really love a framework that supports template tests the way Rails does, which is really strong in this area. In fact, site testing capability is the main reason I've even considered creating a new tool.

@skalee Metanorma already bundles them in packed-mn which is a single binary version that supports macOS/Linux/Windows.

That's good news, though Metanorma is a very different technology stack from Glossarist.

Future of geolexica-server #126