facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Option to generate offline static HTML files usable without server #3825

Closed tomchen closed 2 months ago

tomchen commented 3 years ago

🚀 Feature

docusaurus build produces a local production version that has to be served with docusaurus serve to be usable. Can we add an option to build offline static HTML files that are usable without any server, so users can just open index.html in a browser to read the whole documentation?

It's about calculating relative (instead of absolute) URLs, and appending "index.html" at the end of the URLs. Algolia search will have to be removed, any online cloud assets will have to be put in local folders.
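As a rough illustration of the URL calculation described above, here is a minimal Python sketch. The function names are hypothetical and are not part of Docusaurus; it assumes every route is emitted as `<route>/index.html` and treats any route whose last segment has a file extension as an asset.

```python
import posixpath


def route_to_file(route: str) -> str:
    """Map a site route such as '/docs/intro' to '/docs/intro/index.html'."""
    route = route.rstrip("/")
    # Assumption: a last segment with an extension is an asset, e.g. '/img/logo.png'.
    if posixpath.splitext(route)[1]:
        return route
    return route + "/index.html"


def relative_href(from_route: str, to_route: str) -> str:
    """Return an href from one page's file to another that works over file://."""
    source_dir = posixpath.dirname(route_to_file(from_route))
    target_file = route_to_file(to_route)
    return posixpath.relpath(target_file, start=source_dir)
```

For example, a link from `/docs/intro` to `/docs/guide/setup` becomes `../guide/setup/index.html`. A route such as `/docs/v1.2` would be misread as a file by this heuristic, so a real tool would need the actual route list.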

Have you read the Contributing Guidelines on issues?

Yes

Comment, Motivation, Pitch

What about other static site generators and libraries?

Gatsby, React, etc. all build in a similar way: their output needs a server.

Gatsby has a feature request for an option to build such an offline static HTML site: gatsbyjs/gatsby#4610, which was closed without being solved. Users keep asking for the feature and for the issue to be reopened. According to one comment, Gatsby v1 could actually generate such a static site; it stopped working in v2.

React is general purpose and Gatsby is made for any website. But Docusaurus is primarily made for documentation, so it may need offline generation more than React and Gatsby do.

PDF and ebook formats

There is already a feature request, #969, that asks for an option to create an offline version in PDF format. It would obviously be brilliant to be able to make PDF and maybe also EPUB, MOBI, AZW. PDF and these ebook formats may raise fewer security concerns than HTML. But the downsides are that the PDF feature may be a little time-consuming to achieve, and that the interactive navs, TOCs, and colorful website design and layout would have to be removed in PDF and other ebook formats. Offline static HTML is easier to make. If the PDF feature is in the long-term plan, then offline static HTML could be on a shorter-term to-do list.

Compressed web file format

The offline static web files, usable without a server, could simply be compressed as a zip or another common archive format. Users would need to uncompress the file and open index.html in the root folder to use it.

They could also be compiled to CHM (Microsoft Compiled HTML Help); the problem is that it is a bit old and has no native support on non-Windows OSes. It's a little surprising there's no standard or universally accepted file format similar to CHM. Perhaps it's due to security concerns.

RDIL commented 3 years ago

You can use electron to freeze it I believe

tomchen commented 3 years ago

You can use electron to freeze it I believe

That'd be super overkill. Even if you have just one webpage, Electron will make it 80-100 MB, since it bundles the whole browser rendering and scripting engines.

kinzlp commented 3 years ago

+1

vrabota commented 3 years ago

+1

slorber commented 2 years ago

@ohkimur mentioned he built a postprocessing step to enable support of local browsing, using the file:// protocol.

https://github.com/facebook/docusaurus/issues/448#issuecomment-908777029

That doesn't look like a bad idea to build this as a postprocessing step.

A Docusaurus plugin could be built in userland to solve this problem. Plugins have a postBuild lifecycle to use for that.

Note: such a plugin should take the config.trailingSlash option into account, because output files are no longer always /path/index.html and can also be /path.html
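A post-processing step therefore needs to know which file a route was written to. The sketch below is only an illustration in Python: `output_file` is a hypothetical helper, and the mapping (only `trailingSlash: false` produces `/path.html`) is an assumption that should be checked against the real build output.

```python
from typing import Optional


def output_file(route: str, trailing_slash: Optional[bool]) -> str:
    """Guess the HTML file (relative to the build dir) emitted for a route."""
    clean = route.strip("/")
    if not clean:
        return "index.html"  # the site root is always index.html
    if trailing_slash is False:
        return clean + ".html"  # assumed: /path -> path.html
    # assumed: True or unset -> path/index.html
    return clean + "/index.html"
```

With this mapping, `output_file("/docs/intro", False)` gives `docs/intro.html`, while the same route with the option unset gives `docs/intro/index.html`.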

slorber commented 2 years ago

Note: for some Docusaurus features (particularly SEO metas such as social image, i18n hreflang...), URLs in the HTML files MUST be fully qualified absolute URLs (domain + absolute path).

Building a site for local offline usage does not prevent you from setting site URL and baseUrl in the config file, otherwise, the build output would not be suitable for online hosting.

For these reasons, it's very unlikely we'll add support for using "relative baseUrl" in Docusaurus, such as baseUrl: '': it would lead to an output that would only be correct for local usage, it's likely users will deploy sites with broken metadata online without noticing the SEO problems.

roguexz commented 2 years ago

Moving my conversation from #448 to this thread

@ohkimur - your suggestion works for the most part, but the Webpack configuration is still difficult to resolve

@slorber - my use case isn't for offline usage. I am trying to put together a simplistic developer workflow which involves publishing documentation to GitHub pages. At my workplace, we are using GitHub enterprise.

The use case is as follows,

Given that baseUrl needs to be defined as a fixed path (e.g., /pages/_GH_ORG_/_GH_REPO_NAME_/), it becomes difficult to view the incoming changes till it has been merged.

I understand that this is not how most people work. This is part of an exercise where I am trying to encourage my team to get into the habit of documenting their software. The pandemic has made things worse because reviewing the UI / UX now requires a meeting instead of being able to just view the documentation against their repositories.

Any ideas that you might have to improve this workflow / process are most welcome.

I'm more of a Java/JVM guy .. which isn't helping and making the hacking process that much more challenging. Any help is greatly appreciated.

ohkimur commented 2 years ago

@slorber I created the docusaurus-plugin-relative-paths to solve the issue. I used the same post-processing approach using Docusaurus postBuild lifecycle. 🦖😎

slorber commented 2 years ago

@roguexz , if you used a modern Jamstack tool like Netlify or Vercel (both much better than GH pages), you'd get a much better experience and all PRs would have a "deploy preview" link that includes the changes from the PR, ensuring they are valid (docusaurus can build and you can check the end result a merge would lead to before doing that merge).

See this Docusaurus PR, the Netlify bot added a link so that the PR can be reviewed more easily: https://github.com/facebook/docusaurus/pull/5462#issuecomment-910481735

This is very easy to set up.


@ohkimur thanks! hope people will like this solution.

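One interesting idea could be to have 2 modes:

  • modify build files
  • keep build unchanged, create a copy of build, modify it and generate a build/site.zip archive that users could download?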

ohkimur commented 2 years ago

@roguexz , if you used a modern Jamstack tool like Netlify or Vercel (both much better than GH pages), you'd get a much better experience and all PRs would have a "deploy preview" link that includes the changes from the PR, ensuring they are valid (docusaurus can build and you can check the end result a merge would lead to before doing that merge).

See this Docusaurus PR, the Netlify bot added a link so that the PR can be reviewed more easily: #5462 (comment)

This is very easy to set up.

@ohkimur thanks! hope people will like this solution.

One interesting idea could be to have 2 modes:

  • modify build files
  • keep build unchanged, create a copy of build, modify it and generate a build/site.zip archive that users could download?
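The second mode in the list above could be sketched roughly as follows. This is a standalone Python illustration, not a Docusaurus plugin: `package_offline_copy` and the `offline` folder name are hypothetical, the path-rewriting step is left as a placeholder, and `work_dir` is assumed to live outside the build folder.

```python
import shutil
from pathlib import Path


def package_offline_copy(build_dir: str, work_dir: str) -> Path:
    """Copy build_dir to work_dir/offline and zip the copy as build_dir/site.zip.

    work_dir must be outside build_dir, otherwise the copy would include itself.
    """
    build = Path(build_dir).resolve()
    offline = Path(work_dir).resolve() / "offline"
    if offline.exists():
        shutil.rmtree(offline)  # start from a clean copy on every run
    shutil.copytree(build, offline)
    # A path-rewriting step would modify the files under `offline` here,
    # leaving the original build untouched.
    archive = shutil.make_archive(str(build / "site"), "zip", root_dir=offline)
    return Path(archive)
```

Keeping the rewrite confined to the copy means the original build stays suitable for online hosting, while the archive is what users would download.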

@slorber I think this is a great idea. If you want, you can open an issue here and I will work on it. 🐱‍👤

larissa-n commented 2 years ago

Picking up on @RDIL's comment: I added the build output files to an electron app and encountered a few issues. After specifying each of the files in package.json and with docusaurus-plugin-relative-paths (thank you, @ohkimur!), the HTML content is rendered fine with all images, but electron is still looking for scripts in file:///assets/ based on a reference in runtime~main.xxx.js. Any idea how this could be fixed?

larissa-n commented 2 years ago

A very rough way to fix script references is baseUrl: './'. However, this also messes with routes, so a somewhat more correct approach is to change only o.p= in the compiled runtime~main.xxx.js (not sure if there's a more elegant way, but unless there is, one idea might be to make this part of the docusaurus-plugin-relative-paths postprocess script). There are also references in main.xxx.js that point to an absolute directory. Now most scripts load, but they re-render all pages as the 404 page / NotFound component. Of course, getting rid of parts.push('exact: true'); in @docusaurus/core/lib/server/routes.js doesn't exactly fix the problem, since sub-routes won't load. Why does it have to check the route match? Is that just for the prefetching? It seems odd that content is switched to NotFound once scripts load, since the static content looks fine and everything is in the right place while scripts fail to load.

Also, not sure if this is documented anywhere but to dev with npm run start, I had to deactivate docusaurus-plugin-relative-paths plugin.

Docusaurus creates quite a few js files to keep track of if you work in an environment that requires you to list every single file. I'm used to react-static, and its builds consist of far fewer files.

ohkimur commented 2 years ago

@larissa-n Thank you for your observations. I know about the issue you mentioned, but I didn't fix it since I didn't find an elegant approach to do it. If you already have a potential solution (even though it's messy) I invite you to make a pull request in the plugin's repo. I can extend it later if necessary.

Also, can you describe the problem you had when you tried npm start? Isn't the plugin called only when a build is triggered? If not, then this is a bug and it might be a good idea to fix it.

sigwinch28 commented 2 years ago

Note: for some Docusaurus features (particularly SEO metas such as social image, i18n hreflang...), URLs in the HTML files MUST be fully qualified absolute URLs (domain + absolute path).

Why must they be fully qualified, @slorber?

I've just started to use docusaurus and I find the baseUrl rather limiting, because I kind of expected a copy-the-html-files-anywhere experience for deployment anywhere, without extra configuration. I don't understand why there's a tight coupling to baseUrl. Your comment hints at why it exists.

Is this necessity documented?

Josh-Cena commented 2 years ago

@sigwinch28 Yes, see https://docusaurus.io/docs/advanced/routing#routes-become-html-files

slorber commented 2 years ago

Note: for some Docusaurus features (particularly SEO metas such as social image, i18n hreflang...), URLs in the HTML files MUST be fully qualified absolute URLs (domain + absolute path).

Why must they be fully qualified, @slorber?

I've just started to use docusaurus and I find the baseUrl rather limiting, because I kind of expected a copy-the-html-files-anywhere experience for deployment anywhere, without extra configuration. I don't understand why there's a tight coupling to baseUrl. Your comment hints at why it exists.

Is this necessity documented?

It's not just coupling to a /baseUrl/, it is coupling to your domain as well.

There are multiple things in Docusaurus relying on that, in particular SEO metadata like the canonical URL

<link data-rh="true" rel="canonical" href="https://docusaurus.io/docs/myDoc">

What Google says: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls


Although relative URLs seem supported (maybe only by Google?), it's not recommended.

Similarly, meta hreflang headers for i18n sites:


https://developers.google.com/search/docs/advanced/crawling/localized-versions


(including the transport method means you also can't switch from HTTP to HTTPS without a Docusaurus config change)

Similarly for the og:image metadata responsible for providing the social card preview of your site on social networks

<meta property="og:image" content="https://docusaurus.io/img/socialcard.png"/>

Using a relative URL can lead to failures to display the card and does not respect the spec: https://ogp.me/#data_types


It's not a Docusaurus-side constraint, it's a constraint that comes from outside.

You really have to build your site for a specific protocol/domain/baseUrl.
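To make the constraint concrete, here is a small Python sketch of how a fully qualified canonical URL depends on all three pieces. `canonical_url` is a hypothetical helper written for illustration, not Docusaurus code; it simply refuses a site URL that lacks a protocol or domain, which is what a relative setting would amount to.

```python
from urllib.parse import urlsplit


def canonical_url(site_url: str, base_url: str, route: str) -> str:
    """Join the site URL, baseUrl and a route into one fully qualified URL."""
    parts = urlsplit(site_url)
    if not parts.scheme or not parts.netloc:
        # A relative or empty site URL cannot produce valid canonical metadata.
        raise ValueError("site_url must include protocol and domain")
    path = "/".join(p.strip("/") for p in (base_url, route) if p.strip("/"))
    return f"{parts.scheme}://{parts.netloc}/{path}"
```

For instance, `canonical_url("https://docusaurus.io", "/", "/docs/myDoc")` yields the same href as the canonical link shown earlier, and changing the protocol, domain, or baseUrl changes the result.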

Now I understand in some cases you don't care about the features above and prefer to have more "deployment flexibility", but for now we don't support that.

sigwinch28 commented 2 years ago

Fantastic answers. Thank you.

jeacott1 commented 1 year ago

@ohkimur it looks like you completely deleted your docusaurus-plugin-relative-paths project? what happened?

ohkimur commented 1 year ago

@jeacott1 Yeah. I did. I want to invest my time into something different.

justsml commented 1 year ago

@ohkimur I appreciate your OSS work!

Could you please put your docusaurus repos up as 'Archived'? Even temporarily? Maybe even email me? 😅

(I'm trying to help a former student make sense of some README notes left by a previous dev. It has permalinks to your docusaurus-plugin-relative-paths, and all I see is 404s. 💔 And the mystery deepens...)

If you'd rather not deal with it at all, I do understand. I hope your new focus is rewarding.

Best wishes, Dan.

jeacott1 commented 1 year ago

@justsml - I've been using the script here which for my purposes works well. https://github.com/facebook/docusaurus/issues/448#issuecomment-908777029 I wish there was a pure js search that worked with this mode though.

dtlhlbs commented 1 year ago

Adding a use-case note here. Our users may have a critical need to use our products when their internet is down. This includes access to the documentation, as the products are complex. Being able to distribute the most recent version of the docs with the applications as they're built would be a huge win and allow us to migrate all content to Docusaurus.

Josh-Cena commented 1 year ago

@dtlhlbs Have you considered https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-pwa ?

dtlhlbs commented 1 year ago

@Josh-Cena I see that and will implement it, but it's not much help if users are already offline and haven't set up the PWA.

dingbo8128 commented 1 year ago

@justsml - I've been using the script here which for my purposes works well. #448 (comment) I wish there was a pure js search that worked with this mode though.

Tabs do not work because of a JS error.

TheHaloDeveloper commented 1 year ago

So from what I hear, there's no way to do this yet?

siwa8111 commented 1 year ago

Rylly

AHgPuK commented 1 year ago

This should be a killer feature. Waiting for it so long...

jeacott1 commented 10 months ago

@prtnYuvalJ yeah makes no sense to me. It's not like online hosted sites can't use relative URLs! And the site is generated, so surely you can choose your target if you desperately want absolute URLs!

jeacott1 commented 10 months ago

@prtnYuvalJ "using relative URLs for a hosted side breaks SEO and metadata" I accept perhaps there's some merit to that but I kinda doubt google cares. regardless, it's no reason not to support generating relative versions for those who don't care or who want the facility.

fwiw - take this with as many grains of salt as you like. I asked chatgpt the question

Q: does using relative URLs for a hosted website break SEO and metadata?

ChatGPT Using relative URLs on a hosted website typically does not break SEO (Search Engine Optimization) or metadata when implemented correctly. In fact, relative URLs can offer certain advantages in website development, such as making the website more portable and easier to manage, especially when migrating the site to a different domain or when switching between development and production environments.

slorber commented 10 months ago

@prtnYuvalJ don't get me wrong, we want to support an offline mode and your use-case.

It's just not at the top of my list of priorities, not because it is not important, but we can't work on everything at once.

Absolutely crazy that this isn't a feature.

It's free open source software and that tone seems a bit patronising to me. You are free to submit a PR and implement it yourself.

If I get a nice PR that implements an offline-only mode that does not take any shortcuts and does not produce any regression for the main use-case (hosting websites online), then we can move on.

It must be absolutely clear to the user you can't use the website output for offline usage, and vice-versa.

I don't plan to work on this myself in the very short term, but you are free to submit a good PR or a POC to show the complexity of implementing it.


@jeacott1 asking the wrong question to a not-so-reliable AI doesn't really prove anything


Google definitively uses that to handle duplicate content found online.

jeacott1 commented 10 months ago

@slorber - my chatgpt answer wasn't meant to be taken seriously, hence the many grains of salt. I'm using the hack I referenced above to good effect for what amounts to the serverless distro usecase above. largely for install/config documentation that I personally find preferable to a pdf (although I do wish the pdf output options were also better with docusaurus). The only thing I'm really lacking is a working offline search. Docusaurus has proven a really useful project and I offer many thanks to all those who contribute.

TheHaloDeveloper commented 10 months ago

Wow, that's really crazy.

kmlbgn commented 6 months ago

@justsml - I've been using the script here which for my purposes works well. #448 (comment) I wish there was a pure js search that worked with this mode though.

this script doesn't work at all for me with Docusaurus V3. A lot of the js assets are crafted with absolute links, it's breaking everything, not entirely sure if it's related to Docusaurus version.

jeacott1 commented 6 months ago

@kmlbgn assuming you've successfully implemented this, my guess would be that it's a Docusaurus 3 issue. I've only used this with v2. If you manage to fix it for v3 it would be great to publish the fix somewhere.

kmlbgn commented 6 months ago

No, I haven't yet. Honestly, I'm also surprised by the lack of this feature.

I'm still working on it but it's much deeper than I thought it would be:

image

I should have looked into issues before investing this much time building on top of this stack. I'm now unsure what's the best option between giving up or trying to solve it. This simple feature is a dealbreaker in my case.

Sadness.

jeacott1 commented 6 months ago

@kmlbgn switch to v2?

kmlbgn commented 6 months ago

Do tabs work with your solution? cf https://github.com/facebook/docusaurus/issues/3825#issuecomment-1521231442 If not, then it's just not usable for me: 90% of my pages use the tabs element, which I'm not going to roll back to native HTML, so it's same same. I'm definitely not looking for some frankenstein solution where parts work and parts don't. But I can't allocate my time to explore the Docusaurus codebase to build this offline feature as of now.

jeacott1 commented 6 months ago

@kmlbgn I use pretty vanilla markdown for my docs, no mdx. I think the issue here is that the tabs mdx relies on js and not pure css, which would mean that it's unlikely to work from a static local disk without setting browser options. On top of that, it looks like the various react js file paths are dynamically generated and not fixed by the post-processor snippet above, which is a shame. Not sure if this can be fixed in a similar way without it being super brittle. When I use this, it reloads the world on each <a href>, which is why it works. I tested my v2 with an mdx and while it works fine served from a server, it does not work from static local (no click event handlers loaded, and a bunch of js files aren't found). If I add the js files where it's looking for them (c:/assets/xxx) then it breaks beyond useful (presumably a bunch of refs are screwed up at that point), but darkmode works, and search pops up, so there's a ray of hope it's not total folly.

kmlbgn commented 6 months ago

@jeacott1

Changing all existing paths to relative wasn't really an issue (unless some are really crafted with dark magic). I set baseUrl to a random string, "jhduiffhu", so that I could then match "/jhduiffhu/" and replace it with "./". I set the docs routeBasePath to / and trailingSlash to false. With this you catch all occurrences and the paths are simpler.
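The placeholder trick described above can be sketched in Python as follows. The helper names are hypothetical, and the depth-aware prefix is an added assumption: the comment replaces the placeholder with "./" everywhere, which only resolves correctly for files in the build root. This only rewrites text and does nothing about the hydration mismatches reported in this thread.

```python
import posixpath

PLACEHOLDER = "/jhduiffhu/"  # the throwaway baseUrl value from the comment above


def relative_prefix(file_path: str) -> str:
    """Return './' for files in the build root, '../' per level of nesting."""
    depth = posixpath.normpath(file_path).count("/")
    return "../" * depth or "./"


def rewrite_placeholder(html: str, file_path: str) -> str:
    """Replace every placeholder baseUrl with a prefix relative to file_path.

    file_path is the file's path relative to the build directory.
    """
    return html.replace(PLACEHOLDER, relative_prefix(file_path))
```

Applied to `docs/guide.html`, a link to `/jhduiffhu/docs/intro.html` becomes `../docs/intro.html`; applied to `index.html`, the same placeholder becomes `./`.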

image

I'm pretty sure I'm close to finding a way, but I'm blocked by hydration mismatches. What's worse is that I can briefly see the correct page displayed right before the error shows up and the page changes to Page Not Found. I removed all plugins as recommended in #9379.

image

@slorber, any clues ? :S

slorber commented 6 months ago

Docusaurus is a React SPA framework: the html is statically generated (SSG/SSR) and then hydrated to become a client-side SPA app with client-side routing.

The rule of React is that the client/server html must match. If you tweak the html files generated by React, you break that contract, produce hydration errors and possibly more issues down the line. I'm not sure it's the way to go unfortunately.

kmlbgn commented 6 months ago

@slorber the only tweak was a change in paths from absolute to relative. theoretically that shouldn't be this dramatic... give me some ropes here as i'm not a react person, the modified html are being uploaded and hosted, therefore client should receive those files with changed paths too.. no ? i'm bit confused where that mismatch happens.

this tweak was working for V2, i would assume this isn't this intrusive

Also, the other clue is that I can see the page displaying well for half a second before it disappears and the error appears, so I'm assuming this is due to something of lesser importance. I know it's not proper, but it could buy me time, and I could then see if I can help implement this feature later down the line.

slorber commented 6 months ago

@slorber the only tweak was a change in paths from absolute to relative. theoretically that shouldn't be this dramatic... give me some ropes here as i'm not a react person, the modified html are being uploaded and hosted, therefore client should receive those files with changed paths too.. no ? i'm bit confused where that mismatch happens.

It's not me that dictates the rules, it's React. You can ask them to relax the rules but it's not on the Docusaurus repo it's going to happen.

I don't have any rope to give, this is simply not the way to solve this problem. The mismatch comes from a different url being in the static html file versus the url being in the JS app.

You have to be a React person to solve this problem, because both the html side and the React side must contain the same url (or, if you only care about the html files, you can ship your Docusaurus site with JS disabled, but our no-js support is incomplete)

this tweak was working for V2, i would assume this isn't this intrusive

V2 didn't report React hydration mismatch errors, because it's a new React 18 feature. Still, the errors were there, and could produce annoying bugs down the line. It was already not working as well as you thought, but you didn't know.

jeacott1 commented 6 months ago

@kmlbgn I agree it is ridiculous that the decision was made to not use or at least allow for relative paths from the outset, and that it's still not fixed. @slorber futzing with the build output is clearly a hack, but for me the build sans js isn't great, but at least it's functional. Absolute paths are a non-starter for me, and the other hack of providing a runtime env var with the basepath is also no good.

afaict react itself, with create-react-app, homepage="./" and the router swapped out for HashRouter, works fine as a relative, statically served app from file:// or http://

what in your view is the actual problem that needs fixing?

slorber commented 6 months ago

it is ridiculous that the decision was made to not use or at least allow for relative paths from the outset, and that its still not fixed.

No decision was made, it just happens to be this way. When Docusaurus v2 was initially prototyped, former maintainers (not me) probably didn't consider this use case as part of the initial scope (either by not thinking about it, or by cutting down the scope to have a running prototype sooner)


I don't know what should be done to support this. I'm just explaining that the approach someone proposed is not the right one, and pointing toward a possible solution based on my intuition.

Knowing exactly what should be done requires investigating, and putting time in.

There's no malice or withholding of information on my side here, just facts.

what in your view is the actual problem that needs fixing?

Time and money is the problem.

There is a large list of tasks that are at least as important as this one, and I can only work on a single significant task at a time.

If this issue is really important to you, you can create a PR taking my advice into consideration. Note that you'd better have React skills and be ready to invest many hours to implement a properly designed solution. I won't merge a half-baked PR that only solves your problem in a hacky way and creates pain for all the other users that I'll later have to support.

Or you can donate enough money to onboard another maintainer to work on it.

I'm sorry to disappoint but this is how open source works. Complaining here does not make things progress.


As a sign of good will, I added this to the 4.0 milestone to keep it in my backlog.

If after investigating it doesn't look too complicated, we'll try to implement it. But I can make no promise, because again there are many other things to work on.

jeacott1 commented 6 months ago

@slorber thank you very much! that sounds like progress.

GarlicHiker commented 6 months ago

+1 for this feature! We would appreciate it a lot! The lack of this feature support is actually a reason why we turned away from Docusaurus to MkDocs alternative.

andrigamerita commented 6 months ago

Hello people, I've been experimenting with Docusaurus for some personal projects in the last few days, since it seemed to me to be overall pretty solid, more than the most famous alternatives, because of how it automatically handles all the content. Unfortunately, the lack of the kind of feature we're discussing here is a bit of an inconvenience to me, I consider it a pretty basic feature for an SSG.

I decided to try my hand at quickly implementing an unofficial proof-of-concept of this thing, and thought I'd share it, could be useful. It's just a post-processing script (derived from the one posted in #448) and some static CSS+JS.

Right now, this solution allows for both a standard collection of server-independent HTML files, and also an unified HTML file that contains all the pages and requires no external dependencies. Navigation is fully working on both desktop and mobile.

I too really hope that this idea gets picked up by Meta and officially integrated in Docusaurus, but, in the meantime, I've found my solution. If there's actual interest in this, I could probably write some documentation, try to add some needed features like content search; maybe also some nice-to-haves, like working navigation in mobile view without JavaScript. 🙂

ColinLondon commented 5 months ago

Is there a walkthrough on how to do this? I am a tech writer, not a developer. Docusaurus looks like a really good solution on the surface, but I absolutely need it to work offline, and haven't a clue how to do it.

andrigamerita commented 5 months ago

Is there a walkthrough on how to do this? I am a tech writer, not a developer. Docusaurus looks like a really good solution on the surface, but I absolutely need it to work offline, and haven't a clue how to do it.

Hello @ColinLondon, I don't know if you noticed but the demo site also hosts some documentation. If you have noticed, well, I know it's extremely poor right now, I plan to improve it a bit. I just added a new page that explains things just a little better, I hope it's good enough for a new user like you: https://editocttrialtools-octospacc-7b2d9ca9cd24bce25d929d67488fd8f33dc.gitlab.io/docusaurus-static/docs/docusaurus-static/setup-immediate/index.html. Let me know if this helps you get started. This explains how to start totally fresh with my Docusaurus-Static, assuming you don't already have a Docusaurus site. I will also finish writing ASAP the page on how to add Docusaurus-Static to an existing normal Docusaurus site, for people who would like that instead.