Option to generate offline static HTML files usable without server

tomchen commented 3 years ago

🚀 Feature

docusaurus build will build a local production version that has to be docusaurus serve'd to be usable. Can we add an option to build an offline static HTML files that are usable completely without any server, so user can just open index.html with a browser to read the whole documentation.

It's about calculating relative (instead of absolute) URLs, and appending "index.html" at the end of the URLs. Algolia search will have to be removed, any online cloud assets will have to be put in local folders.

Have you read the Contributing Guidelines on issues?

Yes

Comment, Motivation, Pitch

What about other static site generators and libraries?

Gatsby, React, etc.'s build all do the similar thing, they all need a server.

Gatsby has this feature request for option to build such offline static HTML site: gatsbyjs/gatsby#4610, which is closed without the issue being solved. Users keep asking for the feature and for reopening the issue. According to one comment, in Gatsby v1 it actually can generate such static site, it is in v2 it doesn't work.

React serves general purpose and Gatsby is made for any website. But Docusaurus, is primarily made for documentation, it may need the feature of the offline version generation more than React and Gatsby do.

PDF and ebook formats

There is already a feature request, #969, that asks for option to create an offline version in PDF format. It is obviously brilliant to be able to make PDF and maybe also EPUB, MOBI, AZW. PDF and these ebook formats may have less security concern than HTML. But the downsides are, it may be a little time-consuming to achieve the PDF feature; those interactive navs and TOCs and colorful website design and layout will have to be removed in PDF and other ebook formats. Offline static HTML is easier to make. If PDF feature is in the long-term plan, then Offline static HTML could be in a shorter-term to-do list.

Compressed web file format

The offline static web files usable without server, could be simply compressed as a zip or in other common archive formats. User will need to uncompress the file and click index.html in the root folder to use it.

They can also be compiled in CHM (Microsoft Compiled HTML Help), problem is it is a bit old and it does not have native support in non-Windows OS. It's a little surprising there's no standard or universally accepted file format similar to CHM. Perhaps it's due to security concerns.

ColinLondon commented 6 months ago

@andrigamerita I cannot get Tabs for work offline.

I am using Docusarus Static as an offline template. I have a tabs arrangement as a test in a .md file:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs>
  <TabItem value="apple" label="Apple" default>
    This is an apple 🍎
  </TabItem>
  <TabItem value="orange" label="Orange">
    This is an orange 🍊
  </TabItem>
  <TabItem value="banana" label="Banana">
    This is a banana 🍌
  </TabItem>
</Tabs>

This runs just fine on the development server (npm run start), with the tabs switching on click, as expected.

But when I build the site (npm run build+postprocess) the tabs display okay, but I cannot client to switch tabs.

Reproduce by using the code segment above and then run with the commands shown.

Any ideas? Thanks 😉

ColinLondon commented 6 months ago

@andrigamerita Further to my post above, here's some more info...

It seems that the onclick is being stripped out on build, but not when running on the dev server. So

onclick="openCity(event, 'Tokyo')"

is in the md file and runs after npm run start

but it is stripped out after npm run build+postprocess.

Does that help your efforts? Would be good to get this sorted. Cheers.

slorber commented 6 months ago

@ColinLondon if you want to get support from the author of Docusaurus Static, please go to the project repository discussions directly: https://gitlab.com/octospacc/editocttrialTools/-/tree/main/docusaurus-static

This issue is about supporting this feature in core directly. It is not the place to ask for personal support for a tool we do not maintain.

Please keep in mind many people receive notifications whenever you ask a question here.

slorber commented 6 months ago

I'm not sure how Docusaurus offline support should work

So I'd like to ask the community for help.

SSG or not?

Is it really important to emit one static file per page?

Or is it fine if we emit just one index.html and render content with React locally?

Do you expect users to navigate subfolders and click on a file such as ./local_site/docs/category/myDoc/index.html ?

Do you expect users to always start from ./local_site/index.html ?

Isn't it confusing if the root folder contains thousands of subfolders and html files, that it is even hard to locate the index.html (homepage) of the site?

Routing

How do you expect routing to work?

Do you expect to be able to bookmark a page or something?

What is the URL of the file you click on? file:///.../local_site/index.html ?

What is the URL after you click on a link to another page?

file:///.../local_site/docs/myDoc/index.html ?
file:///.../local_site/index.html#!/docs/myDoc/ ?
something else?
do you even care?

Any remark on how routing should work?

Other systems supporting offline deployments

It would greatly help if you could show me actual static site deployments that can work offline without a server.

Please provide a zip file with such a static deployment, so that others can download it and try it locally.

The more we see examples of things that work offline, the more we can understand what should be implemented.

I'll start with Material for MkDocs,:

Offline plugin: https://squidfunk.github.io/mkdocs-material/plugins/offline/
site.zip

Config:

site_name: MkLorum
site_url: https://example.com/
nav:
  - Home: index.md
  - About: about.md

theme:
  name: material

plugins:
  - offline

Unlike Docusaurus, it is not a single page application hydrating a front-end framework, and links are navigated through the regular navigation (not history.pushState()):

file:///.../local_site/index.html
file:///.../local_site/about.html

I'm interested to see other similar examples.

With regular HTML navigations, we can probably find a few ones, notably the Javadoc: https://docs.oracle.com/javase/8/docs/api/

But is there any SPA framework that supports such an offline mode, and using history.pushState() based soft navigations, without fully reloading the HTML page when navigating?

I could not find any SPA / client-side navigation example, so maybe we are the first to attempt this and are in uncharted territory.

As far as I know, none of those frameworks support it (or it's not documented): Docusaurus, VitePress, VuePress, Nextra, Astro/Starlight, Gatsby, Dokz, Docsify, Rspress, Fumadocs, Vocs...

If you know any client-side frontend framework supporting an offline mode, I'm super interested to hear about it and get a .zip file with such a deployment so that I can study it.

We need your help

I know many of you want this issue solved, and we'll give it a try.

But so far the discussions have remained quite abstract: we still don't really know what it means for Docusaurus to support such offline mode.

As you can see, this feature you all take for granted isn't so commonly supported 😅 .

Now it's your time to help us.

Please help us figure out how this should work:

provide concrete examples of other frameworks to take inspiration from (in particular based on SPA/React/Vue/Svelte...)
specify the precise behavior you think Docusaurus should adopt

ilg-ul commented 6 months ago

@slorber, before going into the depths of such a tough question, perhaps you could clarify exactly what Docusaurus intends to be.

From https://docusaurus.io/docs/advanced/ssg I retain Strictly speaking, Docusaurus is a static site generator, but the rest of the page is very cryptic, and although I did read it several times, as a former Jekyll user who knows (almost) nothing about React, I don't know exactly what Docusaurus is. :-(

The main question from an outsider is probably related to the need to generate the client side. If Docusaurus is a static site generator, and the generated site is a collection of html pages, why Docusaurus also generates the tons of jsons for the client side? What are the advantages of doing so?

After you clarify this, perhaps the rest of your questions are easier to grasp.

ColinLondon commented 6 months ago

@ilg-ul Thanks, I appreciate that. I am an author, not a developer or web designer. For reasons I won't other going in to, I am looking to deploy the documentation as a static suite that can be run without a server from any file location. My initial research narrowed it down to Hugo or Docusaurus. Hugo supports more diverse web designs, while Docusaurus seems a good choice for documentation. But whereas Hugo can support offline out the box, I only discovered that Docusaurus does not once I had put a lot of time and effort into it. HTH

ilg-ul commented 6 months ago

I only discovered that Docusaurus does not once I had put a lot of time and effort into it.

I don't know what this means, since Docusaurus does generate a folder full of .html files, one for each page, that I archive and publish on GitHub Pages automatically via some simple GitHub Actions.

BTW, the previous comment was addressed to @slorber.

ColinLondon commented 6 months ago

@ilg-ul Yes, but you can only run the site on a server, not offline. (BTW: I thought that was to me!)

ilg-ul commented 6 months ago

... you can only run the site on a server, not offline.

Ah, sorry, I missed that aspect. After reading the thread from the beginning I got your point.

slorber commented 6 months ago

Hey, I'm sorry but all these discussions are off-topic and I'm hiding them. This is not the place to ask about what Docusaurus is.

I'm still going to answer briefly.

Docusaurus is a React-based static site generator. It generates static HTML pages using React, and then hydrates React client-side. When navigating, we don't navigate to another HTML document, but we render the next page with client-side JavaScript using soft navigations and the history.pushState() API to update the URL.

This kind of navigation is what permits Docusaurus to feel fast when clicking on a link, and also preserves the state of UI elements on the current page (for example the collapsible state of sidebar categories).

This is a very different model from Jekyll, Hugo, Eleventy, MkDocs, Sphynx, and many other SSG tools that do not use client-side navigation and use a more traditional/old-school approach, but are usually less "interactive". Docusaurus v1 also worked that way, using React only during the build process and not loading React on the client side.

If you open your Chrome DevTools network tab on v1.docusaurus.io VS docusaurus.io, you will notice a big difference when navigating. v1 will request a new HTML page, while now v2+ will request JS to render locally the new page.

If you don't understand what Docusaurus, React, hydration, SPA, history API, and all these things are, then it is unlikely that you will be able to help us solve this issue.

slorber commented 6 months ago

Investigations

I've investigated 2 approaches so far:

SSG: emit one static file per route: https://github.com/facebook/docusaurus/pull/9857
Hash Router: single empty index.html file, rendering content client side with React: https://github.com/facebook/docusaurus/pull/9859

I have also investigated using external tooling such as wget to crawl the site and download it for offline usage:

mkdir wget-test
cd wget-test
cp -R ../projects/docusaurus/website/build/ .
wget -mpEknH http://localhost:3000/

This kind of works, but it is somehow the same solution as the first one (SSG) where each page has a dedicated static html file.

SSG

https://github.com/facebook/docusaurus/pull/9857

To me, the SSG approach is quite challenging. Notably, I'm not even sure dependencies such as React-Router can do routing using the file:// protocol. That remains to be investigated.

However, it could work decently if you are ok with opting out of the SPA mode of Docusaurus and are ok with not hydrating React on the client. This means that things we implement with interactive React code will not work (tabs, theme switch, category collapse button, mobile drawer...). We try to make things work without any JS (https://github.com/facebook/docusaurus/issues/3030) but there are still a few things that require JS and/or React. This also makes it impossible for you to include React interactive inside docs (through MDX) however non-interactive React elements (such as admonitions) are perfectly fine.

If you want to give this mode a try, I'd suggest to run this command on your computer. This is kind-of the equivalent of the HTML post-processing scripts that people shared earlier, the links and assets will use relative paths.

wget -mpEk https://tutorial.docusaurus.io/

(there are JS loading console errors, but that's kind of on purpose: if JS succeeds in loading, then you'll get a 404 page being rendered after React hydration because React-Router does not know what to render for file://. However this mode might be a decent fallback if you want a good-enough/almost working experience)

Hash Router

https://github.com/facebook/docusaurus/pull/9859

The Hash Router solution looks easier to implement, and I'm almost able to make it work on our website, apart from a few linking edge cases to investigate.

However, I'm not sure if it's the solution the community is looking for considering there would be a single HTML file emitted, and that file would initially be empty.

Here's the deploy preview online demo of the Hash Router based app: https://deploy-preview-9859--docusaurus-2.netlify.app/#/

The local app using file:// would behave the same, and you'll always open it through the index.html entry point. Even though there's a single entry point file, you can still have deep linking and bookmark URLs.

You will notice that we have a "loading..." screen before the content appears. This is because the initial html file is empty and all the app is rendered with JS.

lebalz commented 6 months ago

+1 for the Hash Router.

We use Docusaurus for interactive teaching-websites in a highschool and we'd like to give our students the ability to get a snapshot of the website when they completed their grade and leave the school.

Important points for our usecase:

an easy way to start - a single index.html file inside the .zip-folder would just be perfect
react on the client-side - support for all interactive parts and common state management libraries
optional: local search
optional: bookmarks

Thanks a lot - this really would be a huge thing for our school-department :)

jeacott1 commented 6 months ago

@slorber hashrouter sounds like an excellent solution to me. Definitely preferable to SSG imo, and leaves more features available over the long term.

slorber commented 6 months ago

Thanks for your feedback.

I'll focus on implementing proper support for the Hash Router then.

It doesn't mean that we won't eventually support other "modes" later, but at least this one is a good starting point.

Other alternatives to be considered:

Wrapping the static deployment inside an Electron/Tauri WebView
Package the web server as an executable (see vercel/pkg)

Of course, I'm not a fan of those approaches (using a bazooka to kill a fly), but afaik you could implement them in userland today if you really need to solve this problem right now and be able to package/distribute your docs for offline usage.

ilg-ul commented 6 months ago

I'll focus on implementing proper support for the Hash Router then.

Could you confirm that plain static html files for each page will continue to be supported for the foreseeable future?

slorber commented 6 months ago

I'll focus on implementing proper support for the Hash Router then.

Could you confirm that plain static html files for each page will continue to be supported for the foreseeable future?

This would be a new build mode you enable through a CLI option.

So yes everything else will remain retrocompatible and Docusaurus will remain a static site generator

tonyeggers commented 5 months ago

Thanks for your feedback.

I'll focus on implementing proper support for the Hash Router then.

It doesn't mean that we won't eventually support other "modes" later, but at least this one is a good starting point.

Other alternatives to be considered:

Wrapping the static deployment inside an Electron/Tauri WebView

Package the web server as an executable (see vercel/pkg)

Of course, I'm not a fan of those approaches (using a bazooka to kill a fly), but afaik you could implement them in userland today if you really need to solve this problem right now and be able to package/distribute your docs for offline usage.

@slorber ... thank you for working on this. Just to confirm another use case, I'm hosting documentation in this fashion within an ERP platform (which I won't mention) due to abysmal support for doing anything remotely useful or flexible for this purpose. It's basically a static resource defined within the ERP environment, which gives me authentication by default for users already authenticated into the ERP. So now I can have a flexible git-controlled documentation process and keep my docs private and secure. I could even build a CI/CD process to load updates into the ERP if I wanted. Right now, I'm using the post-process solution created by @andrigamerita. I have to wrap it using a supported technique, but as long it works completely offline it works. Thanks!

lebalz commented 3 months ago

Wow, looking forward to try it! Thanks for implementing this option 😍🥳

slorber commented 3 months ago

Hey 👋

The Hash Router PR has been merged: https://github.com/facebook/docusaurus/pull/9859

The hash router is useful in rare cases, and will:

use browser URLs starting with a /#/ prefix
opt-out of static site generation
only emit a single index.html file
only do client-side routing and rendering
can be browsed offline without a web server with the file:// protocol

You can try this new experimental site option:

export default {
  future: {
    experimental_router: 'hash', // default to "browser"
  }
}

If you need to switch conditionally between normal/browser router and hash router, you can use a Node env variable. We don't provide any --router CLI option but you can easily do ROUTER=hash docusaurus build instead, and read process.env.ROUTER in your config file.

To dogfood this feature, make it easier to review and ensure it keeps working over time, we build our own website with the hash router and:

Deploy it here: https://facebook.github.io/docusaurus/#/
Publish GitHub Actions website artifact: https://github.com/facebook/docusaurus/actions/workflows/build-hash-router.yml?query=branch%3Amain

An example artifact you can download is available here: https://github.com/facebook/docusaurus/actions/runs/9159577535

This will download a website-hash-router-archive.zip file.

Unzipping it gives you a static deployment. You can open it and browse locally without a web server by simply clicking the index.html file.

EXPERIMENTAL FEATURE:

The hash router is experimental.

It will be released in Docusaurus v3.4, but can already be tried in canary releases.

We strongly discourage you from using a baseUrl with it. If you have a use-case for a hash baseUrl, please share it, because we might forbid that in the future. It is likely that the useBaseUrl and useBaseUrlUtils have some edge cases with hash routing because these abstractions were not meant to handle a hash router in the first place.

Otherwise, there may be unhandled edge cases that we missed, so please report here any issue you have by providing a repro. Remember that third-party plugin authors may also need to adjust their code to support this new router. Although it should work out-of-the-box for most plugins, we can't guarantee that it will.

Thanks and please let us know if this feature works well for you.

pfdgithub commented 3 months ago

In my case, I use the browser route, and docs.routeBasePath: "/" configuration. The docs is deployed behind traefik and nginx multiple gateways, and the customer network is is a private network.

Traefik uses the stripprefix middleware to strip the xxx prefix. Nginx uses the try_files $uri $uri/ /index.html directive to find files. When the url path triggers the gateway fallback, it will respond with the incorrect redirect path.

# url with filename
browser (/xxx/category/index.html) -> traefik (/category/index.html) -> nginx(/category/index.html) -> nginx status 200 -> traefik -> browser

# url with trailing slash
browser (/xxx/category/) -> traefik(/category/) -> nginx(/category/) -> nginx status 200 (rewrite to `/category/index.html`) -> traefik -> browser

# url without trailing slash
browser (/xxx/category) -> traefik(/category) -> nginx(/category) -> nginx status 301 (redirect to `/category/`) -> traefik -> browser (/category/) -> traefik status 404 -> browser

Due to the /category directory exists, nginx attempts to redirect to that directory. However, nginx is unaware of the xxx prefix, so it responds with the incorrect redirect to /category/. If the /category directory doesn't exist, nginx can respond with the correct rewrite to /index.html.

# url without trailing slash (`/category` directory doesn't exist)
browser (/xxx/category) -> traefik(/category) -> nginx(/category) -> nginx status 200 (rewrite to `/index.html`) -> traefik -> browser

I can't modify the configurations of traefik and nginx, and trailingSlash: true is not a perfect solution, so I can only clean up redundant directories after building. Is it possible to allow disabling the Static Site Generation (SSG) functionality, as well as useless SEO metadata under a private network, such as allowing url: "/"?

slorber commented 3 months ago

@pfdgithub your comment is quite hard for me to understand. So far I am not even sure if it is even related to the current issue because none of the URLs you share have a hash, and the hash part of the URL shouldn't affect routing and redirects in any way. If you want help, make sure it's relevant to the current issue,create a smaller repro, and try to explain better including fully qualified urls because the way you share urls right now does not even make it clear which router config you use.

pfdgithub commented 3 months ago

@pfdgithub your comment is quite hard for me to understand. So far I am not even sure if it is even related to the current issue because none of the URLs you share have a hash, and the hash part of the URL shouldn't affect routing and redirects in any way. If you want help, make sure it's relevant to the current issue,create a smaller repro, and try to explain better including fully qualified urls because the way you share urls right now does not even make it clear which router config you use.

Sorry, this comment is not about hash route. It is a further discussion of the following comment.

https://github.com/facebook/docusaurus/issues/448 https://github.com/facebook/docusaurus/issues/3825#issuecomment-909030682 https://github.com/facebook/docusaurus/issues/3825#issuecomment-1165572115 https://github.com/facebook/docusaurus/issues/3825#issuecomment-1948240586 https://github.com/facebook/docusaurus/issues/3825#issuecomment-1949512896

dingbo8128 commented 1 month ago

Awesome work! but it's a pity that some local search plugins are not compatible with this feature now. for example https://github.com/easyops-cn/docusaurus-search-local

slorber commented 1 month ago

@dingbo8128 unfortunately all search plugins I know crawl the static HTML files. Since we now emit a single empty HTML file and use client-side JS to display the actual content, it's not possible to crawl the HTML files anymore for search engines to index your content.

The community will have to provide a different implementation for this new hash router mode. Since we can't read the HTML files directly, it will likely require using a headless browser to run the HTML pages and extract the rendered content out of it.

Maybe external search engines like Algolia would keep working, considering they run an external crawler. I don't know, if someone gives it a try I'm curious. Although, it's not ideal since it would require network access to get the search results.

Note that our sitemap does not emit a sitemap.xml when using the hash router. This is a limitation that we could probably handle if it is helpful to implement a local search plugin for the hash router. However I'm not sure it's a good practice to include URLs with # in a sitemap file. In this case sitemap.xml is probably useless for search engines 🤷‍♂️

dspatoulas commented 1 month ago

I'm running into an issue with the generated links returning a 404. Here's an example from the deployed Docusaurus site after navigating to Blog > Docusaurus 3.4 > Hash Router - Experimental (from right side menu).

https://facebook.github.io/docusaurus/#/blog/releases/3.4%23hash-router---experimental

The link should direct users to the #hash-router---experimental section of the 3.4 release page, but instead returns Page Not Found if copy and pasted into another browser tab.

slorber commented 1 month ago

@dspatoulas how did you obtain that link?

The GitHub UI doesn't show it this way, but your link is:

https://facebook.github.io/docusaurus/#/blog/releases/3.4%23hash-router---experimental

/3.4%23hash-router---experimental => it's using %23 instead of # (the encoded version)

The same link using # works:

https://facebook.github.io/docusaurus/#/blog/releases/3.4#hash-router---experimental

And afaik nowhere in our UI we use %23 so I wonder how did you get that link in the first place.

To be honest I'm surprised in doesn't work in %23, it should IMHO, but it's likely a bug in React Router v5 hash router impl and I suspect it won't be fixed considering they are working toward v7 stable now.

facebook / docusaurus