eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Docker + reverse proxy with subdirectory #1193

Open thehijacker opened 2 years ago

thehijacker commented 2 years ago

Hello,

I spent way too much time trying to figure this out. I hope someone else wanted to do the same and managed to do it.

I am using the docker-compose file, and everything loads and is reachable at "http://192.168.28.53:7880/app". Now I wish to open access to the public using an nginx reverse proxy, on which I also have my SSL certificate. I am using the following rule:

    location /docspell/
    {
        client_max_body_size 100M;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forward-Proto http;
        proxy_pass http://192.168.28.53:7880;
    }

When I try to open https://domain.com/docspell/ I get a "Not found". Looking at the logs I can see it tried to open /docspell and got a 404 back. Where do I change the base URL to include /docspell for all the URL calls? I looked in the files under /var/solr but cannot find anything there.

Thank you.

eikek commented 2 years ago

Hi @thehijacker, your nginx config probably lets nginx forward the request "as is" to the upstream server. The server then also receives the path /docspell, but it doesn't know about it. This concerns the docspell restserver, since that is what is exposed. Solr is not affected here at all; it is only used internally.

Now, I have never tried to deploy docspell behind a path, so this might not really work. You can try it by first telling nginx to strip the path when forwarding requests (see the nginx docs for this; I remember that adding a trailing slash to the upstream server is enough, not sure though!) and then setting the base-url to your public URL including the path, maybe https://my.server.com/docspell.
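For illustration, the path stripping mentioned above can be modeled in a few lines of Python. This is only a sketch of the documented nginx behavior for prefix locations, not nginx code: when `proxy_pass` carries a URI part (even just `/`), the part of the request path that matched the `location` is replaced by that URI; without a URI part the path is forwarded unchanged. The function name `upstream_path` and its parameters are made up for this example.

```python
from typing import Optional


def upstream_path(request_path: str, location: str,
                  proxy_pass_uri: Optional[str]) -> str:
    """Model the path nginx sends upstream for a prefix location.

    proxy_pass_uri is the URI part of the proxy_pass target:
    None for "http://host:7880", "/" for "http://host:7880/".
    """
    if proxy_pass_uri is None:
        # No URI part: the original request path is passed through as is.
        return request_path
    # URI part present: the matched location prefix is replaced by it.
    return proxy_pass_uri + request_path[len(location):]


# Without the trailing slash, docspell still sees the /docspell prefix.
print(upstream_path("/docspell/app", "/docspell/", None))
# With the trailing slash, the prefix is stripped before forwarding.
print(upstream_path("/docspell/app", "/docspell/", "/"))
```

Under this model the first call yields `/docspell/app` and the second `/app`. Stripping the prefix only fixes the incoming direction; links generated by the application still need the base-url setting.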

If possible, I would recommend using subdomains instead. The docs have an example for this (you probably already saw it).

thehijacker commented 2 years ago

Hello @eikek. I cannot use a subdomain as my SSL certificate is not a wildcard certificate.

I tried with and without the slash for the upstream. It did not help. I have high hopes for changing the base-url; I just do not know where to set it :). As I said, I am using the default docker-compose.yml file. I only changed the passwords inside it.

If you can point me to the right configuration file, I can change it and test whether it works.

eikek commented 2 years ago

If you use the docker-compose file, you can use environment variables. Look here for the possible options; that page also has some good information about how to configure docspell. You can use a config file or env variables. The env variable we need now is DOCSPELL_SERVER_BASE__URL. Setting this should take care of generating correct links in HTTP response contents. But nginx must still strip the path; I thought that if you specify a path on the upstream server (it is just / here), nginx rebases the request path. But maybe you need to do some path rewrites.
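The variable name looks like it is derived mechanically from a config key. The sketch below assumes the rule is: upper-case the key, turn `-` into `__` and `.` into `_`. Both the rule and the key `docspell.server.base-url` are assumptions inferred from the single variable named above, so check the configuration page before relying on them. The helper `env_name` is hypothetical.

```python
def env_name(config_key: str) -> str:
    """Map a dotted config key to an environment variable name.

    Assumed rule (not confirmed in this thread): upper-case,
    "-" becomes "__", "." becomes "_".
    """
    # Replace dashes first so the single underscores that stand for
    # dots cannot be confused with the doubled ones.
    return config_key.upper().replace("-", "__").replace(".", "_")


# Assumed config key for the base URL of the restserver.
print(env_name("docspell.server.base-url"))
```

With the assumed key this produces `DOCSPELL_SERVER_BASE__URL`, which matches the name given above. In a docker-compose file the variable would go into the `environment` section of the restserver service.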

thehijacker commented 2 years ago

I had actually already tried the env variable DOCSPELL_SERVER_BASE__URL, but again it was not working with nginx.

Now I am trying something else. For the base URL I have put http://192.168.28.53:7880/docspell, but again it is not working. I was hoping that this would change all internal URL calls to /docspell/api, for example, but it did not.

Looking at the nginx access log, I can see that domain.com/docspell does proxy to http://192.168.28.53:7880/, but the next URL that it tries to open is domain.com/api and not domain.com/docspell/api.

Now I am not sure whether this is something docspell should handle or nginx.

If this will never work, I do not mind accessing it over VPN with my local IP address. I actually tried this already, but sending documents from the Android client application fails, apparently because the URL is http and not https. The actual error in the application is:

CLEARTEXT communication to 192.168.28.53 not permitted by network security policy.

It looks like this is related to this bug:

https://github.com/docspell/android-client/issues/7

Or am I doing something wrong in the configuration again?

eikek commented 2 years ago

Ah ok, I guess it then doesn't work behind a path :( sorry. I'm also not at all an nginx expert. We can have a ticket for this, but it might be a while until I can work on that. For a quick check whether the baseurl setting is active, you can right click and "view page source". There will be a JSON structure where the baseurl should also be present. The client should actually take this URL into account, but as I said, I never had a path in mind until now (so it is "officially" not supported).

The Android app problem is exactly the issue you mentioned and really unfortunate. But at least there is a workaround: you can install the previous version. The new version doesn't have new features other than supporting self-signed certificates (which somehow broke plain text connections :/).

thehijacker commented 2 years ago

Indeed. The old 0.4.0 version works fine. It sent the image from Open Note Scanner to docspell, which added it as a document. Sadly it did not do OCR on the image. These are the next steps I need to do: figure out how to automatically add tags based on the OCR text from a document, and make it process (OCR) image files as well :). Time to read the documentation from start to end.

We can leave this ticket open in case you ever find time to work on the base-url with subfolder feature. I am coming from paperless-ng and so far I like docspell more. It has many more features; it just needs more time to figure them all out.

eikek commented 2 years ago

> Sadly it did not do OCR on the image.

Oh really? This is not expected, it should definitely do OCR on the image. I just tried it here where it works :) You can open another issue with some logs if you want. In the logs there should be a tesseract command somewhere. You should also be able to see the extracted text in the ui (the menu on the attachment has a "view extracted data" entry)

If anything is not clear with the docs, don't hesitate to ask :-)

thehijacker commented 2 years ago

There is nothing in the logs under /data/logs containing the word tesseract. As I said, I need to start reading the documentation. I am surely doing something wrong.

gandy92 commented 2 years ago

I also have the problem that my SSL certificate is not a wildcard certificate, and with my dyndns provider not offering the required control I won't be able to change this any time soon. I did a quick grep over the sources and noticed that while several places use the base_url to construct new hrefs, others don't. This includes setting up the Router in RestServer.scala, where the path section of the base_url could be prepended to the installed routes. Would this help in solving the issue? I've successfully set up the build environment for docspell, and although I don't have any experience in Scala or Elm, I'd like to give it a shot, if you're not already working on it, of course.
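The idea of prepending the path section of the base_url to the installed routes can be sketched as follows. This is a language-neutral illustration in Python, not the Scala code from RestServer.scala; the functions `base_path` and `mount` and the route table are hypothetical.

```python
from urllib.parse import urlsplit


def base_path(base_url: str) -> str:
    """Return the path section of the base URL without a trailing slash."""
    return urlsplit(base_url).path.rstrip("/")


def mount(routes: dict, base_url: str) -> dict:
    """Prefix every route with the path section of the base URL."""
    prefix = base_path(base_url)
    return {prefix + path: handler for path, handler in routes.items()}


# Hypothetical route table; the values stand in for real handlers.
routes = {"/api/info/version": "version-handler", "/app": "webapp-handler"}
print(mount(routes, "https://my.server.com/docspell"))
```

A base URL without a path, such as `http://localhost:7880`, yields an empty prefix, so a root deployment keeps its routes unchanged.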

eikek commented 2 years ago

@gandy92 I'm not working on this. You can go ahead, if you want! Thanks! I would also start with the Router, though I'm not sure that is the only place. I think one other place to look at is the background tasks that send emails/notification messages. They get the base-url from the client, so it might just work actually 🪄 but maybe not 😉

gandy92 commented 2 years ago

@eikek thanks for your feedback! I've already managed to change RestServer.scala, AttachmentRoutes.scala, ShareAttachmentRoutes.scala, Flags.scala and TemplateRoutes.scala to prepend cfg.baseUrl.path.asString to each location. The restserver compiles and starts ok, but of course this is only part of the game. I'll need some time to wrap my head around the Elm code, so in order to find all relevant spots I've started hard-coding the base_path where I thought I would need it (the base_path is chosen so that it is easily identifiable in the code and can be replaced with a variable at a later point). So far I've changed App/Update.elm, Comp/ItemCard.elm, Page.elm and also template/index.html. Now the main page loads and looks like it should, but it keeps reloading and I get quite a lot of errors like this:

restserver level=INFO  thread=blaze-selector-0 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41858"
restserver level=INFO  thread=io-compute-2 logger=d.r.w.TemplateRoutes message="Compiled template file:/home/andy/prg/docspell/modules/restserver/target/scala-2.13/classes/index.html"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/app/login?r=/andy/app/home"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 200 OK"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/app/assets/docspell-webapp/0.31.0-SNAPSHOT/img/logo-96.png"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 200 OK"
restserver level=INFO  thread=blaze-selector-3 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41864"
restserver level=INFO  thread=blaze-selector-4 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41866"
restserver level=INFO  thread=blaze-selector-1 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41860"
restserver level=INFO  thread=blaze-selector-2 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41862"
restserver level=DEBUG thread=io-compute-2 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=DEBUG thread=io-compute-1 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=DEBUG thread=io-compute-3 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=blaze-selector-0 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41868"
restserver level=DEBUG thread=io-compute-0 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 POST /andy/api/v1/sec/auth/session"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/email/settings/smtp?q="
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/clientSettings/webClient"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 POST /andy/api/v1/sec/calevent/check"
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-3 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 POST /andy/api/v1/sec/calevent/check"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-1 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/tag?sort=name&q="
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-0 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/tag?sort=name&q="
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-3 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/folder?q=&sort=name"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/info/version"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 200 OK"

I also see several complaints on the JavaScript console, but with the page reloading all the time it's difficult to get a clear picture.

So, there is still a way to go here, and I will most probably have to pick your brain at some point. Not to raise any expectations: at the moment I'm mostly curious whether I can come up with a solution that "only" requires a few optimizations on your side.

eikek commented 2 years ago

It is not an easy change, I'm afraid - I knew this 😄 But I was hoping that it wouldn't be so many places. This is really bad. I also like the idea of hard-coding all the places for now and seeing how to streamline it later. It should be possible with a few modifications like the ones you did, I think.

Re Elm: There is already a config setting for base_url at the server, and this is also sent to the client (I think you found it). In Elm, there is a Flags.elm file that contains this base URL. My thought was that if the base_url is communicated to the client with the path, there shouldn't be too much to change. It might be necessary to pass this Flags type to more places, though.

Re reloading all the time: I can't tell off the top of my head why that is. It could be related to some requests that cannot be authenticated properly. Maybe the cookie is not picked up because its path changed? Just a very rough guess. It also seems strange that it says "invalid authenticator", because that means that the token is sent with the request (it's there), but could not be decoded. If you want, you can push your code somewhere so I can check it out and run it here (when I find time).
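To make the cookie guess concrete: a browser only attaches a cookie to a request whose path matches the cookie's Path attribute. The sketch below follows the path-match rule from RFC 6265, section 5.1.4. The cookie path `/api` is a made-up value for illustration, and this only illustrates the guess; it does not establish that the cookie path was the actual cause.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Path-match rule from RFC 6265, section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, either inside the
        # cookie path itself or right after it in the request path.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


# Hypothetical cookie scoped to /api: not sent once a prefix is added.
print(path_matches("/andy/api/v1/sec/tag", "/api"))
# The same cookie scoped to the prefixed path would be sent.
print(path_matches("/andy/api/v1/sec/tag", "/andy/api"))
```

The first call gives `False` and the second `True`, so a cookie whose Path was not rebased along with the routes would silently be left out of requests under the new prefix.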

gandy92 commented 2 years ago

Neither would I have expected it to be easy, especially given my lack of experience with Elm and Scala. However, it's probably not that many places after all; some changes I already had to revert so as not to end up adding the base_path twice. Anyway, I've pushed my changes to my fork of docspell at https://github.com/gandy92/docspell. As far as I can tell, most URLs during page loading are fine, but as you already noticed, the authentication stuff is utterly broken. In the web GUI this leads to the login page not being shown at all (I've tested this with all cookies removed and a cleared browser cache). I used a simple Python script to test logging in over the REST API, and this works fine, including retrieval and use of the access token. So it could well be that the problem is mostly cookie related, but I couldn't find where to look further on this one.

eikek commented 2 years ago

Awesome, thank you! I'll look into it in the next days (I hope)

eikek commented 2 years ago

Hi @gandy92, I tried your branch and made some changes. Now it kind of works. The reason for the re-authentication was that the parser for the pages was not updated with the new basePath. It is still a mess, of course. I'm not sure how to streamline it right now; maybe you will find something here. If you would like to investigate further, I can push my changes to your branch; I think you need to open a PR for me to do this. You can also get it from here.

eikek commented 2 years ago

Hi @gandy92, I just pushed something to your branch.

gandy92 commented 2 years ago

Thank you @eikek, I'll look into it as soon as possible. I'm back at the day job, but I'll find time.

eikek commented 2 years ago

Sure, and no worries, we have no deadlines here :) Whenever you find some time.

gerroon commented 2 years ago

Hi

Is a subdirectory now allowed behind a reverse proxy? I need to set this up with Apache, but it redirects to "/app", so I am not sure how to fix it.

eikek commented 2 years ago

It's not possible to deploy behind a path; it must be the root path at the moment. There were some efforts toward this, but there is no ETA.

gerroon commented 2 years ago

> It's not possible to deploy behind a path; it must be the root path at the moment. There were some efforts toward this, but there is no ETA.

Thanks, I will just try to use it internally.