OpenRefine / openrefine.org

Source website for openrefine.org
https://openrefine.org

Allow PDF file download for offline documentation #275

Open: thadguidry opened this issue 8 months ago

thadguidry commented 8 months ago

There is a Docusaurus community plugin to produce PDF files for documentation: https://github.com/signcl/docusaurus-prince-pdf

which is referenced here: https://docusaurus.io/community/resources#community-plugins

This might allow us to generate a PDF version of our docs that users (regular users, training sessions, etc.) could keep for offline reference, for example by copying it onto USB drives. We could then add a submenu, perhaps directly under the Documentation nav item, called "Offline PDF Docs download" or something similar. I think 1 or 2 folks asked for this before? Dunno.

monstajoe2002 commented 6 months ago

Hi! Can you assign me this issue?

monstajoe2002 commented 6 months ago

I just finished creating the PDF. It was a lot of work, but I got it working here. I used GitHub Actions to automate the PDF creation with docusaurus-prince-pdf and upload the result as an artifact. There is a catch, however: to generate the PDF, we must spin up the local server in the cloud, and GitHub Actions steps run sequentially, i.e. each step waits for the previous one to finish (and if a step fails, the whole workflow stops unless we state otherwise). The required serve command runs fine, but it never exits, so it blocks the remaining steps. Therefore, I had to cancel the workflow to execute the remaining steps. Sorry for the long rant. Can I still make a pull request here? Or do you have a better solution? Maybe it's because I'm new to GitHub Actions, idk 😅
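The blocking `serve` step is the crux here. One common way around it, sketched below in TypeScript for Node.js, is to start the server as a background process, poll until it responds, run the PDF generator, and stop the server afterwards; the same script would work outside CI, run once in a while by hand. This is a sketch under stated assumptions, not the approach used in the PR: it assumes a POSIX system, an existing `build/` directory (from `npx docusaurus build`) served by `npx docusaurus serve`, port 3000, and a docs entry point at `/docs`. It passes only `-u` (the start URL) to docusaurus-prince-pdf, and `waitForServer`, `run`, and `generatePdf` are hypothetical helper names. In a plain GitHub Actions shell step, the same idea is usually just backgrounding the serve command with `&`.

```typescript
import { spawn } from "node:child_process";
import * as http from "node:http";

// Poll `url` until the server answers with any HTTP response, rejecting
// once the deadline has passed.
export function waitForServer(url: string, timeoutMs = 30_000, intervalMs = 500): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    let settled = false;
    const attempt = (): void => {
      // `agent: false` opens a fresh connection for every probe.
      const req = http.get(url, { agent: false }, (res) => {
        res.resume(); // discard the body; we only care that the server is up
        settled = true;
        resolve();
      });
      // Abort a probe that hangs so it is retried like a refused connection.
      req.setTimeout(2_000, () => req.destroy(new Error("probe timed out")));
      req.on("error", () => {
        if (settled) return;
        if (Date.now() >= deadline) {
          settled = true;
          reject(new Error(`no response from ${url} within ${timeoutMs} ms`));
        } else {
          setTimeout(attempt, intervalMs);
        }
      });
    };
    attempt();
  });
}

// Run a command to completion and resolve with its exit code.
function run(cmd: string, args: string[]): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "inherit" });
    child.on("error", reject);
    child.on("exit", (code) => resolve(code ?? 1));
  });
}

// Serve the existing build in the background (the step that blocks when run
// in the foreground), generate the PDF once the server responds, and always
// stop the server afterwards.
export async function generatePdf(port = 3000, startPath = "/docs"): Promise<number> {
  // `detached: true` gives npx and the server it launches their own process
  // group, so the whole group can be signalled at the end (POSIX only).
  const server = spawn("npx", ["docusaurus", "serve", "--port", String(port)], {
    stdio: "inherit",
    detached: true,
  });
  try {
    const base = `http://localhost:${port}`;
    await waitForServer(base);
    return await run("npx", ["docusaurus-prince-pdf", "-u", `${base}${startPath}`]);
  } finally {
    try {
      // A negative PID signals every process in the server's group.
      if (server.pid !== undefined) process.kill(-server.pid, "SIGTERM");
    } catch {
      // The server group has already exited.
    }
  }
}
```

Calling `generatePdf()` after a build would then leave the PDF wherever docusaurus-prince-pdf writes by default. Polling over a fixed `sleep` keeps the script from racing a slow server start.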

wetneb commented 6 months ago

That sounds like a pretty complicated solution indeed, I am sorry you spent so much time on it! Re-generating the PDF file at each build of the website looks too complicated and is probably overkill. We could probably do it once in a while, outside of the CI. It's not like the docs change so often.

So I think it would be perfectly fine to just make a PR adding the Docusaurus plugin without it being added to the continuous deployment pipeline (which is running in Netlify, not GitHub Actions by the way). We can think afterwards about how to advertise it on the website.

monstajoe2002 commented 6 months ago

Thanks bro! One thing I should mention about this plugin is that you run it with `npx docusaurus-prince-pdf`, without integrating it into the project.

monstajoe2002 commented 6 months ago

I created a PR for this issue: https://github.com/OpenRefine/openrefine.org/pull/304

magdmartin commented 6 months ago

Thanks for the PR @monstajoe2002. Sorry we didn't check the license thoroughly before starting the integration. Upon reviewing the plugin and the Prince license, I realized that the non-commercial license requires the following:

> If this is a Non-commercial license, Licensee may download, install and use the Software for Non-commercial Purposes on a computer that is accessible to any number of end users. PDF documents generated by the Software include notices that identify the Software. Licensee shall not change or remove these notices or assist or encourage third parties to remove or change such notices. When the Non-commercial license is used to routinely generate documents, a prominent link to the www.princexml.com Web site shall be displayed on the pages from where the generated documents can be fetched, and in a prominent public Web page where business partners are listed. If PDF documents are sent as email attachments by Licensee, all email messages must contain the www.princexml.com Web address in the message body.

I'm not sure we're interested in these terms, since the docs-to-pdf plugin is available under the MIT license.

I suggest making a link to the PDF available at the end of the left menu.
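A link at the end of the left menu could be a Docusaurus sidebar `link` item appended after the existing entries. The TypeScript sketch below is illustrative only: the sidebar name, doc ids, and PDF file name are hypothetical, and it declares a minimal local `SidebarItem` type instead of importing `SidebarsConfig` from `@docusaurus/plugin-content-docs`. It assumes the PDF is stored under `static/uploads/`, which Docusaurus serves from the site root as `/uploads/`. Whether a root-relative link to a static file is served directly or intercepted by the client-side router is worth checking on the actual site; an absolute `https://openrefine.org/...` URL avoids the question.

```typescript
// Minimal local stand-in for the sidebar item shapes used here; the real
// types come from @docusaurus/plugin-content-docs.
type SidebarItem =
  | string // a doc id
  | { type: "category"; label: string; items: SidebarItem[] }
  | { type: "link"; label: string; href: string };

// Hypothetical file name; files in static/ are served from the site root.
const OFFLINE_PDF_HREF = "/uploads/openrefine-docs.pdf";

// Return a copy of `items` with the PDF link appended as the last entry,
// leaving the input array untouched.
export function withOfflinePdfLink(items: SidebarItem[], href = OFFLINE_PDF_HREF): SidebarItem[] {
  return [...items, { type: "link", label: "Offline PDF docs download", href }];
}

// Example sidebar; the doc ids are placeholders, not OpenRefine's real ones.
export const sidebars = {
  tutorialSidebar: withOfflinePdfLink([
    "index",
    { type: "category", label: "User manual", items: ["manual/installing"] },
  ]),
};
```

Appending the link after the generated entries keeps it last even as docs are added, which matches the "end of the left menu" placement.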

magdmartin commented 6 months ago

The PDF is lengthy (389 pages) and could be more readable. I suggest adding a table of contents and generating separate PDFs for the user manual and the contributor guide.

monstajoe2002 commented 6 months ago

Fair enough. Where are you gonna store the PDF? Should I regenerate it with an open-source tool instead?

magdmartin commented 6 months ago

If there is no objection from other team members, I suggest storing the PDF directly in the repository in the /static/uploads/ folder (along with the other PDFs).

Before coding the integration, can you check whether any other plugins are available? I am asking because docs-to-pdf does not seem to be actively maintained. Additionally, we plan to translate our documentation. Could you check how the plugin would support this?

monstajoe2002 commented 6 months ago

> The PDF is lengthy (389 pages) and could be more readable. I suggest adding a table of contents and generating separate PDFs for the user manual and the contributor guide.

I also forgot to mention that this file already has a table of contents: you can view it as bookmarks in the sidebar of any PDF reader (e.g. Acrobat, Chrome). If you want an actual table of contents page in the PDF itself, that would be a challenge. Also, docs-to-pdf doesn't work on my machine, so I had to use Prince.

monstajoe2002 commented 6 months ago

> If there is no objection from other team members, I suggest storing the PDF directly in the repository in the /static/uploads/ folder (along with the other PDFs).

Done.

thadguidry commented 2 months ago

A much better way to handle this would be to NOT USE PDF, but instead bundle the website and maintain all the HTML content.

Folks would view the site locally just by unzipping the bundle and opening the index file: routing is handled through hash routing, so the site can be accessed locally at file://index.html. We could bundle the site (or perhaps just the /docs folder, i.e. excluding the /blog and /uploads folders from the bundle) at build time as a GitHub release artifact, and have a link in the site's footer or header that downloads that offline site zip file. The implementation would use the new Docusaurus offline browsing feature supported since version 3.4. I also have a PR up for folks to review that fixes the broken anchors.

More detail on how to do this is here: https://docusaurus.io/blog/releases/3.4#hash-router---experimental
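Per the Docusaurus 3.4 release notes linked above, the hash router is opted into through the `future.experimental_router` config option. The TypeScript sketch below is an illustration under assumptions, not OpenRefine's actual `docusaurus.config.ts`: it uses a minimal local config type, a hypothetical `DOCUSAURUS_ROUTER` environment variable so that the regular Netlify build keeps browser routing while a separate offline build switches to `'hash'`, and a hypothetical `shouldBundle` filter for leaving `/blog` and `/uploads` out of the zip. Since the option is marked experimental, its name and behavior may change in later releases.

```typescript
// Minimal local stand-in for the relevant slice of Docusaurus' Config type.
type Router = "browser" | "hash";
interface OfflineAwareConfig {
  title: string;
  url: string;
  baseUrl: string;
  future: { experimental_router: Router };
}

// Choose the router from a (hypothetical) environment variable, so only a
// dedicated offline build uses hash routing.
export function buildConfig(env: Record<string, string | undefined>): OfflineAwareConfig {
  const router: Router = env.DOCUSAURUS_ROUTER === "hash" ? "hash" : "browser";
  return {
    title: "OpenRefine",
    url: "https://openrefine.org",
    baseUrl: "/",
    future: { experimental_router: router },
  };
}

// Top-level folders of the build output to leave out of the offline zip.
const EXCLUDED_DIRS = ["blog", "uploads"];

// Decide whether a path relative to the build directory goes into the zip.
export function shouldBundle(relPath: string): boolean {
  // Normalize Windows separators and a leading "./" or "/" before
  // looking at the first path segment.
  const first = relPath.replace(/\\/g, "/").replace(/^\.?\//, "").split("/")[0];
  return !EXCLUDED_DIRS.includes(first);
}

export default buildConfig(process.env);
```

Keeping the router choice out of the default build means the deployed site's URLs stay unchanged; only the artifact built with the variable set gets `#/`-style routes.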