LuxDL / DocumenterVitepress.jl

Documentation with Documenter.jl and VitePress
https://luxdl.github.io/DocumenterVitepress.jl/
MIT License

Trailing slash gives 404 #64

Open LilithHafner opened 4 months ago

LilithHafner commented 4 months ago

https://chairmarks.lilithhafner.com/v1.0.2/explanations (which does not use DocumenterVitepress.jl) redirects to https://chairmarks.lilithhafner.com/v1.0.2/explanations/.

The link https://chairmarks.lilithhafner.com/v1.1.0/explanations/, which uses DocumenterVitepress.jl, gives a 404 error.

This caused all https://chairmarks.lilithhafner.com/stable/.../ links I and others previously posted to break when switching to DocumenterVitepress.jl. I agree with the DocumenterVitepress.jl choice to prefer not using trailing slashes, but URLs with trailing slashes should redirect, not 404.

fredrikekre commented 4 months ago

These are web server redirects and are not caused by Documenter:

$ curl -fsSI https://documenter.juliadocs.org/stable/man/guide
HTTP/2 301                                                              <--
server: GitHub.com
content-type: text/html
location: https://documenter.juliadocs.org/stable/man/guide/            <--

Edit: This should have been posted to the Documenter issue (https://github.com/JuliaDocs/Documenter.jl/issues/2473).

LilithHafner commented 4 months ago

It is caused by what files generated by Documenter are called and where they are stored.

The root cause of this is that Documenter.jl's file structure names all files index.html and places them like path/to/guide/index.html (e.g. https://github.com/LilithHafner/Chairmarks.jl/tree/gh-pages/v1.0.2), whereas with the DocumenterVitepress.jl output format the same page is stored as path/to/guide.html (e.g. https://github.com/LilithHafner/Chairmarks.jl/tree/gh-pages/v1.1.0).

I think the latter is a slightly better approach, but I don't know how to transition from the former to the latter without breaking almost all links.

goerz commented 4 months ago

I'm quite confused here.

https://chairmarks.lilithhafner.com/v1.0.2/explanations (which does not use DocumenterVitepress.jl) redirects to https://chairmarks.lilithhafner.com/v1.0.2/explanations/.

I always thought that these two URLs are equivalent, even without any "redirect" by the server, but in any case https://github.com/LuxDL/DocumenterVitepress.jl/issues/64#issuecomment-1979040966 indicates that GitHub normalizes the URL to have the slash at the end. If that's the case, it makes sense for Documenter to take that into account and always produce links with a trailing slash.

The link https://chairmarks.lilithhafner.com/v1.1.0/tutorial/explanations/, which uses DocumenterVitepress.jl gives a 404 error.

I think that's just because the "explanations" page doesn't exist, right? I don't know what happened between 1.0.2 and 1.1.0, but this doesn't seem related to whether DocumenterVitepress produces slashes at the end of URLs.

The page that does exist is https://chairmarks.lilithhafner.com/v1.1.0/tutorial.html

As far as I can tell, this is what vanilla Documenter would have produced with prettyurls=false, right?

Does DocumenterVitepress have an option equivalent to prettyurls?

I very much think prettyurls are preferable. Since DocumenterVitepress is a fresh start, I would actually recommend only supporting the prettyurls=true behavior. That is, I think DocumenterVitepress should only produce index.html files in folders, never a page.html. I understand that means you have to run a webserver to view the documentation locally, but people should really get used to that idea. Having different behavior when building locally vs when deploying will inevitably blow up in somebody's face, and I think having the prettyurls behavior is better, as it allows putting page-local assets in the same folder as the index.html.

asinghvi17 commented 4 months ago

Unfortunately this is not something that Vitepress supports: https://vitepress.dev/guide/routing#generating-clean-url

I always thought that these two URLs are equivalent

Not quite - without the trailing slash, it's up to the server. With the trailing slash, it auto resolves to $path/index.html.

So one can link to e.g. https://chairmarks.lilithhafner.com/v1.1.0/tutorial and that works fine, but https://chairmarks.lilithhafner.com/v1.1.0/tutorial/ will error because there is no tutorial folder and therefore no index.html in it.

I could manually generate a redirect page in tutorial/index.html, which goes to tutorial.html, which would probably work.

Vitepress doesn't actually support the exact structure of Documenter, but Github Pages allows /tutorial to resolve to /tutorial.html, so the links are semantically fine.
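
The resolution rules described in this comment can be sketched as a toy model. This is only an illustration, not any real server's logic: the function name resolve, the html_fallback keyword, and the two example file sets are made up for this sketch. html_fallback=true stands in for hosts that let /tutorial resolve to /tutorial.html, as GitHub Pages is said to do above.

```julia
# Toy model of static-file URL resolution; `files` holds site-relative paths.
# `html_fallback` mimics hosts that serve /page from /page.html (an assumption
# of this sketch, not a real server setting).
function resolve(files, path; html_fallback=true)
    p = String(strip(path, '/'))              # "/tutorial/" -> "tutorial"
    if endswith(path, "/")
        # A trailing slash always means "the index.html of this folder".
        candidate = isempty(p) ? "index.html" : p * "/index.html"
        return candidate in files ? candidate : nothing
    end
    p in files && return p                    # exact file match
    html_fallback && (p * ".html") in files && return p * ".html"
    # Folder match; real servers usually redirect to the slash URL first.
    (p * "/index.html") in files && return p * "/index.html"
    return nothing                            # would be a 404
end

vitepress_site  = Set(["index.html", "tutorial.html"])
documenter_site = Set(["index.html", "tutorial/index.html"])

resolve(vitepress_site, "/tutorial")    # "tutorial.html"
resolve(vitepress_site, "/tutorial/")   # nothing (the 404 from this issue)
resolve(documenter_site, "/tutorial")   # "tutorial/index.html"
resolve(documenter_site, "/tutorial/")  # "tutorial/index.html"
```

With html_fallback=false, the model returns nothing for /tutorial on the VitePress-style layout, which is the case where a server does not append .html on its own.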

A lot of the work here was for the markdown backend in any case, so it should be fairly easy to switch to another backend or upstream to DocumenterMarkdown at some point after we have this working well and understand how to better support other static site generators.

goerz commented 4 months ago

Oh, that's quite interesting! Thanks for that explanation!

I still think you should change the behavior to always write only tutorial/index.html because that allows the index.html to reference, e.g., a plot.png file in the same folder. So that gives you a lot more flexibility for processing non-trivial sources. Plus, you're guaranteed that the "pretty" URLs work, independent of the server configuration. But, you know, whatever works for you :-)

goerz commented 4 months ago

I don't know if this is a bug: In https://github.com/LilithHafner/Chairmarks.jl/tree/main, I don't see that the cleanUrls option is set anywhere. Yet, the links in the navigation bar on https://chairmarks.lilithhafner.com/v1.1.0/tutorial.html are all to https://chairmarks.lilithhafner.com/v1.1.0/why etc. (without the .html), which wouldn't work if the site weren't hosted on a server that supports such URLs.

asinghvi17 commented 4 months ago

All of the .vitepress files are copied from https://github.com/DocumenterVitepress.jl/tree/main/template, so if the user does not explicitly override a file by supplying their own, the default file is used (which does set cleanUrls).

LilithHafner commented 4 months ago

In the interim, I used this script to set up 200 redirects (pages served with status 200 that redirect via a meta refresh) from https://chairmarks.lilithhafner.com/v1.1.0/tutorial/ to https://chairmarks.lilithhafner.com/v1.1.0/tutorial.

# For every page.html (except 404.html and index.html), create page/index.html
# that redirects to the extensionless URL of that page.
function fix(root_url, root_path=".")
    for (root, dirs, files) in walkdir(root_path)
        for file in files
            name, ext = splitext(file)
            if ext == ".html" && name ∉ ("404", "index")
                dir = joinpath(root, name)
                if !isdir(dir)
                    mkdir(dir)
                    # Target URL without trailing slash or .html extension
                    url = "https://" * normpath(joinpath(root_url, root, name))
                    open(joinpath(dir, "index.html"), "w") do io
                        write(io, """
                        <!DOCTYPE html>
                        <meta charset="utf-8">
                        <title>Redirecting to $url</title>
                        <meta http-equiv="refresh" content="0; URL=$url">
                        <link rel="canonical" href="$url">""")
                    end
                end
            end
        end
    end
end
fix("chairmarks.lilithhafner.com")

LilithHafner commented 4 months ago

I still think you should change the behavior to always write only tutorial/index.html because that allows the index.html to reference, e.g., a plot.png file in the same folder. So that gives you a lot more flexibility for processing non-trivial sources.

No, that does not give any additional flexibility. Assets stored in the source directory like this:

docs
└── src
    ├── asset.png
    └── page.md

Build to this using Documenter:

_build
├── asset.png
└── page
    └── index.html

(e.g. https://github.com/JuliaDocs/Documenter.jl/blob/gh-pages/v1.3.0/man/hosting/walkthrough/index.html)

Building to a directory structure that matches the source directory structure will not introduce name conflicts.

_build_vitepress
├── asset.png
└── page.html

Plus, you're guaranteed that the "pretty" URLs work, independent of the server configuration.

Vitepress could (should?) add symlinks from /tutorial to /tutorial.html to also support servers that don't do that resolution automatically.


There's no substantive technical reason to prefer https://chairmarks.lilithhafner.com/v1.1.0/tutorial vs https://chairmarks.lilithhafner.com/v1.1.0/tutorial/ vs https://chairmarks.lilithhafner.com/v1.1.0/tutorial.html. I agree with Vitepress's style choice to use https://chairmarks.lilithhafner.com/v1.1.0/tutorial for the reasons I gave in https://github.com/JuliaDocs/Documenter.jl/issues/2473#issue-2169490365.

Github servers happen to redirect 404s at path/to/file to path/to/file/index.html and not the other way around, but I don't think that is particularly important.

goerz commented 4 months ago

No that does not give any additional flexibility

I meant that in general. I routinely use setups where generated assets get put in the same folder as the index.html, e.g., on my website (which runs on a handwritten generator) and in QuantumControlExamples.jl (via Literate.jl)

Assets stored in the source directory like this:

docs
└── src
    ├── asset.png
    └── page.md

That would be an asset shared across multiple pages. I wouldn't mind if we added support in Documenter for a structure like

docs
└── src
    ├── shared_asset.png
    ├── page.md
    └── page
        └── local_asset.png

which then maybe could be referenced with something like ![text](@__DIR__/asset.png). Or maybe it's good enough to stick to

docs
└── src
    ├── shared_asset.png
    └── page
        ├── index.md
        └── local_asset.png

which already works (and which I'm using in QuantumControlExamples).

Vitepress could (should?) add symlinks from /tutorial to /tutorial.html

Does that work? Do webservers follow symlinks in this way, ignoring the file extension?

There's no substantive technical reason to prefer https://chairmarks.lilithhafner.com/v1.1.0/tutorial vs https://chairmarks.lilithhafner.com/v1.1.0/tutorial/

If https://github.com/LuxDL/DocumenterVitepress.jl/issues/64#issuecomment-1979655394 is correct

Not quite - without the trailing slash, it's up to the server. With the trailing slash, it auto resolves to $path/index.html.

then the difference is that "the URL for a folder loads the index.html in that folder" is universal, whereas "the URL for a file without an extension loads that file with .html appended OR the equivalent folder, whichever is available" is not.

Unless the symlink solution works, it seems like DocumenterVitepress switching to folder/index.html is actually the only way (certainly the most robust way) to solve the issue "Trailing slash gives 404". This does not preclude a setting that then uses links like https://chairmarks.lilithhafner.com/v1.1.0/tutorial without a trailing slash in the sidebar etc. (if you can guarantee that the server hosting the docs supports that).

Documenter could have such an option as well, I'd be perfectly fine with that (but it shouldn't be the default, as it limits the servers that can host the docs). I have no problem with anyone preferring the non-slash URLs on an aesthetic basis.

Actually, it might be relevant to check if LiveServer.jl and python -m http.server can handle URLs without slashes. If not, that would make local preview quite difficult.

P.S.: Just tried LiveServer and python -m http.server, and they both can handle forwarding https://chairmarks.lilithhafner.com/v1.1.0/tutorial to https://chairmarks.lilithhafner.com/v1.1.0/tutorial/, but not to https://chairmarks.lilithhafner.com/v1.1.0/tutorial.html. So Documenter would actually be fine if a "no-slash" option were to be added in combination with prettyurls=true (but not with prettyurls=false). DocumenterVitepress seems like it's pretty difficult to preview locally, as none of the recommended local servers implement its default URL scheme.

LilithHafner commented 4 months ago

Let's try to keep this issue focused on the fact that adding a trailing slash gives a 404 error in Vitepress.

If DocumenterVitepress's links are broken on some webservers (including local servers), I imagine that's something the authors of this package would love to hear about but I request you open a new issue for that.


This issue only affects folks transitioning from default Documenter.jl, and only affects them in the transition period while extant links still point to the old URLs. That said, many DocumenterVitepress/Documenter.jl users will be transitioning from default Documenter.jl, and the transition period has an unbounded length.

@asinghvi17 suggested a solution here

I could manually generate a redirect page in tutorial/index.html, which goes to tutorial.html, which would probably work.

And I implemented and deployed it on Chairmarks here.

This is a hacky solution. Would you welcome a PR that adds a redirect_trailing_slash configuration option that can be true, false, or :auto (default), where :auto detects whether any previous builds using trailing-slash links exist and, if so (or if the option is true), runs the hack from here to add 200 redirects?

asinghvi17 commented 4 months ago

@goerz: I'm not sure what benefit that folder structure provides aside from avoiding namespacing issues? The asset wouldn't be loaded unless the page calls for it, in any case, and Vitepress tends to inline any included images as opposed to including them in the output. This doesn't seem to significantly impact load time (see https://beautiful.makie.org for an example).

@LilithHafner: Yes that would be great! It seems to work for you already :) but how would you do the detection? One could manually check the deployurl that's given in the settings I suppose, or check the gh-pages branch (which seems like it would be pretty slow...)
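
One possible shape for the detection, sketched under the assumption that a previous build (for example a version folder from a gh-pages checkout) is available on disk. The function name has_trailing_slash_layout is hypothetical and is not part of DocumenterVitepress or Documenter. The heuristic is crude: a VitePress build can also contain nested index.html files (for folder index pages, or after the redirect hack has been applied), so it can give false positives.

```julia
# Hypothetical check: does `build_dir` contain page/index.html files,
# i.e. the layout that default Documenter produces?
function has_trailing_slash_layout(build_dir)
    for (root, dirs, files) in walkdir(build_dir)
        # The top-level index.html exists in both layouts, so skip the root.
        root == build_dir && continue
        "index.html" in files && return true
    end
    return false
end

# Small demonstration on throwaway directories.
old_build = mktempdir()
mkpath(joinpath(old_build, "tutorial"))
touch(joinpath(old_build, "index.html"))
touch(joinpath(old_build, "tutorial", "index.html"))

new_build = mktempdir()
touch(joinpath(new_build, "index.html"))
touch(joinpath(new_build, "tutorial.html"))

has_trailing_slash_layout(old_build)  # true
has_trailing_slash_layout(new_build)  # false
```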

LilithHafner commented 4 months ago

how would you do the detection?

lol, idk. I'll think about that. I only really care about trailing slash handling on the deployed docs, and when deploying we need access to gh-pages anyway. OTOH, it's good (necessary) to build the exact same docs locally and hosted, because otherwise what is even the point of local builds?

Also, my current hack discards fragments. I'll look into that, too.
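
A meta refresh points at a URL fixed at build time, so a #fragment on the incoming link is lost. One way this could be handled, offered as an assumption rather than anything decided in this thread, is to redirect with a small script that appends location.search and location.hash, keeping the meta refresh as a fallback for browsers without JavaScript. The helper name redirect_page is hypothetical, and the sketch assumes the URL contains no quote characters, since it is not escaped.

```julia
# Hypothetical helper: build a redirect page that keeps ?query and #fragment.
function redirect_page(url)
    return """
    <!DOCTYPE html>
    <meta charset="utf-8">
    <title>Redirecting to $url</title>
    <link rel="canonical" href="$url">
    <script>location.replace("$url" + location.search + location.hash);</script>
    <noscript><meta http-equiv="refresh" content="0; URL=$url"></noscript>
    """
end

page = redirect_page("https://chairmarks.lilithhafner.com/v1.1.0/tutorial")
print(page)
```

The returned string could be written to tutorial/index.html in place of the static page that the script earlier in this thread generates.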

goerz commented 4 months ago

@goerz: I'm not sure what benefit that folder structure provides aside from avoiding namespacing issues?

It allows you to keep the exact URL scheme you have now (without the trailing slashes), but without requiring any hacks or server features. If you render tutorial.md into tutorial/index.html, then both https://chairmarks.lilithhafner.com/v1.1.0/tutorial and https://chairmarks.lilithhafner.com/v1.1.0/tutorial/ work with any server. Thus, it seems like the most elegant way of solving the "404 issue" this issue is about. Nothing would change compared to the current DocumenterVitepress experience or existing URLs: you can use the preferred URLs without the slash in the sidebar etc., but if someone follows an old URL from vanilla-Documenter with the slash, that'll also work without any kind of hack.

This issue only affects folks transitioning from default Documenter.jl,

That is a very good point! Even if someone implemented a PR for https://github.com/JuliaDocs/Documenter.jl/issues/2473 to make Documenter prefer URLs without a slash, that doesn't change existing pages, so that's going to be a problem for any project transitioning to DocumenterVitepress. The proposed workaround is to generate a structure like

⁞
├── tutorial
│   └── index.html
├── tutorial.html
⁞

where the index.html redirects to tutorial.html. That should work, but it feels quite ugly, and to the best of my understanding, having only tutorial/index.html would have the exact same effect without requiring any redirects.

I was also worried about whether the site can be previewed locally using LiveServer or python -m http.server. Strangely, that seemed to work for the most part when I tested it just now (with DocumenterVitepress's current system). I don't quite understand why; in earlier testing it seemed like the local servers couldn't translate tutorial into tutorial.html. Maybe there's some JS magic in the background? Anyway, for whatever solution you end up with, making sure that it works for preview seems like an important consideration.

Vitepress tends to inline any included images as opposed to including them in the output. This doesn't seem to significantly impact load time (see https://beautiful.makie.org for an example).

Huh. I'm surprised it doesn't affect load time. I would have expected this to cause pretty huge .html documents that most browsers would be less efficient at loading, and that also might hurt SEO. But this is a total tangent, and we should probably keep this thread more focused :-)

LilithHafner commented 4 months ago

It allows you to keep the exact URL scheme you have now

Almost, but the URLs without trailing slashes 301-redirect to their trailing-slash alternatives.

$ curl https://chairmarks.lilithhafner.com/v1.0.2/tutorial
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
$ curl https://chairmarks.lilithhafner.com/v1.1.0/tutorial
<!DOCTYPE html>
<html lang="en-US" dir="ltr">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Tutorial | Chairmarks.jl</title>
    <meta name="description" content="A VitePress Site">
...

(Chairmarks v1.0.2 and below use Documenter.jl without Vitepress, and v1.1.0 and above use Documenter.jl and DocumenterVitepress.jl)