JuliaDocs / LiveServer.jl

Simple development server with live-reload capability for Julia.
MIT License
143 stars 26 forks source link

n-th attempt at fixing 135 #150

Closed tlienart closed 2 years ago

tlienart commented 2 years ago

After #148 and its problems, I tried to fix this more systematically.

trying this out

all cases mentioned in #135 and #148 (including the CURL stuff) should work, feedback welcome.

Thanks!

closes #135

note on target correction

if a html page has href="/assets/foo.jpg or href="assets/foo.jpg" both should work properly. Now here's two scenarios with different behaviour:

  1. (servedocs) on a page man/ls+lit.md have ![](...), HTTP will set the req.target here to man/ls+lit/assets/foo.jpg with referer http://localhost:8000/man/ls+lit/
  2. (serve) on a page similar to the test page from the issue #135 in a folder lsbox have a href="assets/foo.jpg", HTTP will set the req.target to lsbox/assets/foo.jpg with referer http://localhost:8000/lsbox/

in the first case man/ls+lit/ should be removed from the target, in the second case lsbox/ should remain in the target.

it would be nice if we had access to the root thing before HTTP records a request target and tries to potentially resolve it, so for instance /assets/foo.jpg or assets/foo.jpg instead of (something from http)/assets/foo.jpg which we then have to either remove or keep, but this doesn't seem to be recorded or accessible. As a result, there is now some logic in serve_file which checks whether the requested target is an existing file, and if it isn't, tries to chop off the head of the path based on the referer to see if that is. If both fail, a 404 is returned.

mortenpi commented 2 years ago

This is awesome! I can confirm that all my complaints have been dealt with :laughing: All the redirecting seems to work, and also all the assets etc. seem to be served correctly.

I do have just a question about the whole Referer thing though. I am probably missing something, but as far as I know, the Referer header is purely informational. If LiveServer's behavior somehow were to depend on its value, that feels like a bug. In fact, I believe that if you follow hyperlinks the browser will set Referer to the page that contained the link, and so the header may well even from a different domain?

it would be nice if we had access to the root thing before HTTP records a request target and tries to potentially resolve it, so for instance /assets/foo.jpg or assets/foo.jpg instead of (something from http)/assets/foo.jpg which we then have to either remove or keep, but this doesn't seem to be recorded or accessible.

Do I understand correctly that, say, if you have an <img tag with a relative URL src="assets/foo.jpg", you would like HTTP.jl give you that relative URL, rather than the absolute one? The problem with that is the browser only sends absolute URLs to the server, and so HTTP.jl has no way of reverse engineering the relative one. Either way though, you can just resolve to the file based on the absolute URL provided by HTTP.jl, no?

mortenpi commented 2 years ago

Ahhh... I spoke too early maybe. Well, all previously reported problems are gone, but I did discover a new one: query strings are problematic.

Given this context:

$ curl -I http://localhost:8000/markdownast/reftest/build-html/search
HTTP/1.1 301 Moved Permanently
Location: /markdownast/reftest/build-html/search/
$ curl -I http://localhost:8000/markdownast/reftest/build-html/search/
HTTP/1.1 200 OK
$ curl -I http://localhost:8000/markdownast/reftest/build-html/search/index.html
HTTP/1.1 200 OK

This should also redirect (/markdownast/reftest/build-html/search?foo -> /markdownast/reftest/build-html/search/?foo), but instead it serves the HTML page (again, breaking relative URLs):

$ curl -I http://localhost:8000/markdownast/reftest/build-html/search?foo
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

And secondly, I guess due to the referer handling, you get spurious 404s:

$ curl -I http://localhost:8000/markdownast/reftest/build-html/search/?foo
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

$ curl -I http://localhost:8000/markdownast/reftest/build-html/search/?foo -H "Referer: http://localhost:8000/markdownast/reftest/build-html/eval/"
HTTP/1.1 404 Not Found
$ curl -I http://localhost:8000/markdownast/reftest/build-html/search/index.html?foo
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

$ curl -I http://localhost:8000/markdownast/reftest/build-html/search/index.html?foo -H "Referer: http://localhost:8000/markdownast/reftest/build-html/eval/"
HTTP/1.1 404 Not Found

The last example you should be able to repro in the browser (with the repro in https://github.com/tlienart/LiveServer.jl/pull/148#pullrequestreview-1099798575) by navigating to the /eval/ page, and then filling out the search form and pressing enter, which then does a GET to /search/?q=..., but also passes a Referer header.

tlienart commented 2 years ago

Thanks for the feedback and the additional examples; yes I'd much rather not have the Referer thing... the issue is that there's an ambiguity I can't seem to resolve. Sometimes HTTP gives me the right path, sometimes it prepends a path which breaks stuff. Unless we break everything and remove the ability to go from a dir listing to showing a website when the user clicks on a dir that is actually a "web-dir".

I'm not sure how to get around this. Also the logic that leads to req.target is opaque to me, and if it's completely on the browser side, I'm not sure how I can try to "fix" it on LiveServer side šŸ˜ž

tlienart commented 2 years ago

Either way though, you can just resolve to the file based on the absolute URL provided by HTTP.jl, no?

what were you thinking of here? the req object has a field req.target (which is what I sometimes "fix" in the current PR), it also has a bunch of other fields which are mostly useless like req.parent (always nothing as far as I could see), req.url (always empty as far as I could see) and a few fields like req["Host"] (always localhost:$port) and req["Referer"] (used in this PR).

mortenpi commented 2 years ago

Do you have an MWE where you get this ambiguity? I enabled debug = true and, at least from what I can see, the req in the Info: šŸÆ REQUEST (req: /markdownast/reftest/build-html/assets/themes/documenter-dark.css) šŸÆ messages is always an absolute ~URL~ path, as I'd expect.

mortenpi commented 2 years ago

what were you thinking of here?

Just that req.target, as far as I can tell, should always contain the absolute path of the request (the origin-form I think), and that should be sufficient to figure out which file the user is trying to fetch from the server (basically, just joinpath(CONTENT_DIR[], req.target) + stripping of the leading / and handing the trailing slash with redirect as needed).

tlienart commented 2 years ago

Case 1

Go to the liveserver directory then

using Pkg;Pkg.activate(".");using LiveServer;servedocs(debug=true)

and navigate to http://localhost:8000/man/ls+lit/ . There are two images here (respectively /assets/testlit.png and assets/testlit2.png). Reordering a bit the debug output:

ā”Œ Info: šŸÆ REQUEST (req: /assets/testlit.png) šŸÆ
ā””     req[referer]: http://localhost:8000/man/ls+lit/
ā”Œ Info: šŸ‘€ RESOLVE (req: /assets/testlit.png) šŸ‘€
ā”‚     fs_path:    docs/build/assets/testlit.png
ā””     v_fs_path:  docs/build/assets/testlit.png
ā”Œ Info: šŸ˜… POST RESOLVE (/assets/testlit.png)
ā””     fs_path: docs/build/assets/testlit.png []

===

ā”Œ Info: šŸÆ REQUEST (req: /man/ls+lit/assets/testlit2.png) šŸÆ
ā””     req[referer]: http://localhost:8000/man/ls+lit/
ā”Œ Info: šŸšØ REFERER SCRUB
ā””     /man/ls+lit/assets/testlit2.png --> assets/testlit2.png
ā”Œ Info: šŸ‘€ RESOLVE (req: assets/testlit2.png) šŸ‘€
ā”‚     fs_path:    docs/build/assets/testlit2.png
ā””     v_fs_path:  docs/build/assets/testlit2.png
ā”Œ Info: šŸ˜… POST RESOLVE (/man/ls+lit/assets/testlit2.png)
ā””     fs_path: docs/build/assets/testlit2.png []

observe the req.target 1 and 2

  1. /assets/testlit.png --> /assets/testlit.png ("correct")
  2. assets/testlit2.png --> /man/ls+lit/assets/testlit2.png (this is "incorrect" in the sense of LiveServer)

Case 2

create a folder lsbox with a structure like

.
ā”œā”€ā”€ a-sample.jpg
ā””ā”€ā”€ docs
    ā”œā”€ā”€ b-sample.jpg
    ā””ā”€ā”€ webdir
        ā”œā”€ā”€ assets
        ā”‚Ā Ā  ā””ā”€ā”€ sample2.jpg
        ā”œā”€ā”€ index.html
        ā””ā”€ā”€ sample.jpg

with index.html

<html>
    <body>
        <img src="/sample.jpg" />
        <img src="assets/sample2.jpg"</code>
    </body>
</html>

serve in the parent of lsbox and navigate to lsbox/docs/webdir

ā”Œ Info: šŸÆ REQUEST (req: /sample.jpg) šŸÆ
ā””     req[referer]: http://localhost:8000/lsbox/docs/webdir/
ā”Œ Info: šŸ‘€ RESOLVE (req: /sample.jpg) šŸ‘€
ā”‚     fs_path:    /Users/tlienart/Desktop/tjd/sample.jpg
ā””     v_fs_path:  
ā”Œ Info: šŸ˜… POST RESOLVE (/sample.jpg)
ā””     fs_path:  [āŒ āŒ āŒ ]

===

ā”Œ Info: šŸÆ REQUEST (req: /lsbox/docs/webdir/assets/sample2.jpg) šŸÆ
ā””     req[referer]: http://localhost:8000/lsbox/docs/webdir/
ā”Œ Info: šŸ‘€ RESOLVE (req: /lsbox/docs/webdir/assets/sample2.jpg) šŸ‘€
ā”‚     fs_path:    /Users/tlienart/Desktop/tjd/lsbox/docs/webdir/assets/sample2.jpg
ā””     v_fs_path:  /Users/tlienart/Desktop/tjd/lsbox/docs/webdir/assets/sample2.jpg
ā”Œ Info: šŸ˜… POST RESOLVE (/lsbox/docs/webdir/assets/sample2.jpg)
ā””     fs_path: /Users/tlienart/Desktop/tjd/lsbox/docs/webdir/assets/sample2.jpg []
  1. /sample.jpg --> /sample.jpg ("incorrect")
  2. assets/sample2.jpg --> /lsbox/docs/webdir/assets/sample2.jpg ("correct")
mortenpi commented 2 years ago

For Case 1, the corresponding HTML tags appear to be

  1. <img src="/assets/testlit.png" alt="">
  2. <img src="assets/testlit2.png" alt="">

which would lead to the URLs

  1. http://localhost:8001/assets/testlit.png
  2. http://localhost:8001/man/ls+lit/assets/testlit2.png

respectively. So req.target is exactly what I would expect. In turn, I would expect LiveServer to search for those files at

  1. /docs/build/assets/testlit.png
  2. /docs/build/man/ls+lit/assets/testlit.png

and so the second one should 404. On the other hand, the first absolute URL/path is bad too -- it works if served locally with the root at docs/build/, but would fail on GitHub Pages, since it would be referring to https://tlienart.github.io/assets/testlit.png and not https://tlienart.github.io/LiveServer.jl/dev/assets/testlit.png, since it is an absolute path.

I am leaving aside for now why those <img tags are what they are.. I think it's something to do with https://github.com/JuliaDocs/Documenter.jl/issues/1926, since I'm seeing those warnings.

mortenpi commented 2 years ago

And Case 2 is similar:

  1. <img src="/sample.jpg" /> -> absolute path -> http://localhost:8000/sample.jpg -> req.target = "/sample.jpg"
  2. <img src="assets/sample2.jpg"> -> path relative to /lsbox/docs/webdir/ -> http://localhost:8000/lsbox/docs/webdir/assets/sample2.jpg -> req.target = "/lsbox/docs/webdir/assets/sample2.jpg"
tlienart commented 2 years ago

Yes I guess I tried too hard to think about what the user may have tried to mean rather than what the specs actually say.

Alright, so I'll remove the referer stuff, 404 on assets/... if the root is unexpected (actually python -m http.server 404s there too, I should have just checked that).

I'll also look into the query string stuff, thanks for pointing that out

tlienart commented 2 years ago

Alright that should be done. I removed the Referer stuff, extended a little bit the redirect stuff so that it would handle query strings and anchors (foo/bar#hello --> foo/bar/#hello and foo/bar?hello --> foo/bar/?hello).

I think everything should work now but if you don't mind giving it a shot, your feedback has been super valuable for this (and I'll be glad when we merge this in).

mortenpi commented 2 years ago

Awesome! I gave it a spin and all the cases seem to work perfectly now!

and anchors (foo/bar#hello --> foo/bar/#hello

For what it's worth, I don't think browsers actually send anchors with the URL. The HTTP spec also only mentions the query string.

tlienart commented 2 years ago

Almost there then, thanks! I had looked at https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL but hadn't seen the bit where they say explicitly it's not sent to the server (it is on the page in the anchor section of course)

Thanks for your patience and glad things will be better soon!

tlienart commented 2 years ago

Alright cleaned this up using uri.path instead of home baked logic to find the query part. I think that should be cleaner. Seems everything works now so will merge and do a patch release. If there's still an issue with something, we'll do another round šŸ˜±

Thanks for your help @mortenpi !!

mortenpi commented 2 years ago

Wohoo! And I appreciate the time you put into this! And sticking with it despite me constantly finding new things to complain about :laughing: