BookStackApp / BookStack

A platform to create documentation/wiki content built with PHP & Laravel
https://www.bookstackapp.com/
MIT License

BookStack indexing fails in Danswer - status 403: Forbidden #4669

Closed LkySlevin closed 9 months ago

LkySlevin commented 9 months ago

Attempted Debugging

Searched GitHub Issues

Describe the Scenario

@ssddanbrown First of all, thank you for BookStack and also for integrating it into Danswer. I have been using BookStack for a year now and want to add AI to it, so I am trying to connect it to Danswer. I have already had a lot of help from the Danswer team, but I am still not able to run the indexing of my BookStack wiki.

From the background task log I get:

11/05/2023 02:27:34 PM            update.py 400 : [Attempt ID: 130] Running indexing attempt for connector: 'BookStackConnector', with config: '{}', and with credentials: '10'
11/05/2023 02:27:34 PM            update.py 267 : [Attempt ID: 130] Polling for updates between 1970-01-01 00:00:00 and 2023-11-05 14:27:34
11/05/2023 02:27:34 PM            update.py 360 : [Attempt ID: 130] Failed connector elapsed time: 0.08685445785522461 seconds
11/05/2023 02:27:34 PM            update.py 424 : [Attempt ID: 130] Indexing job with ID '130' failed due to BookStack Client request failed with status 403: Forbidden
Traceback (most recent call last):
  File "/app/danswer/background/update.py", line 413, in _run_indexing_entrypoint
    _run_indexing(
  File "/app/danswer/background/update.py", line 376, in _run_indexing
    _index(db_session, index_attempt, doc_batch_generator, run_time)
  File "/app/danswer/background/update.py", line 374, in _index
    raise e
  File "/app/danswer/background/update.py", line 306, in _index
    for doc_batch in doc_batch_generator:
  File "/app/danswer/connectors/bookstack/connector.py", line 167, in poll_source
    doc_batch, num_results = self._get_doc_batch(
                             ^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/bookstack/connector.py", line 64, in _get_doc_batch
    batch = bookstack_client.get(endpoint, params=params).get("data", [])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/bookstack/client.py", line 41, in get
    raise BookStackClientRequestFailedError(response.status_code, error)
danswer.connectors.bookstack.client.BookStackClientRequestFailedError: BookStack Client request failed with status 403: Forbidden

I created a dedicated user to act as the door opener for Danswer and used the API token created for it. I also tried with an existing admin user, with the same result. I already checked with the Danswer team in their Slack, but they also think the problem might be on the BookStack side. Any idea where I could dig further and how to proceed?

Exact BookStack Version

BookStack v23.01.1

Log Content

I don't think this error is related, because it appears only once in my logs (and is the only error within the past half year), whereas the indexing error persists on every run (every 10 minutes). Nevertheless, I am posting it in case it helps:

[2023-11-04 14:30:20] production.ERROR: You do not have permission to access the requested page. {"userId":21,"exception":"[object] (BookStack\\Exceptions\\NotifyException(code: 0): You do not have permission to access the requested page. at /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Controllers/Controller.php:57)
[stacktrace]
#0 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Controllers/Controller.php(66): BookStack\\Http\\Controllers\\Controller->showPermissionError()
#1 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Controllers/Controller.php(87): BookStack\\Http\\Controllers\\Controller->checkPermission('users-manage')
#2 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Controllers/Controller.php(99): BookStack\\Http\\Controllers\\Controller->checkPermissionOr('users-manage', Object(Closure))
#3 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Controllers/UserController.php(108): BookStack\\Http\\Controllers\\Controller->checkPermissionOrCurrentUser('users-manage', 6)
#4 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Controller.php(54): BookStack\\Http\\Controllers\\UserController->edit(6, Object(BookStack\\Auth\\Access\\SocialAuthService))
#5 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php(45): Illuminate\\Routing\\Controller->callAction('edit', Array)
#6 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Route.php(262): Illuminate\\Routing\\ControllerDispatcher->dispatch(Object(Illuminate\\Routing\\Route), Object(BookStack\\Http\\Controllers\\UserController), 'edit')
#7 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Route.php(205): Illuminate\\Routing\\Route->runController()
#8 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Router.php(721): Illuminate\\Routing\\Route->run()
#9 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(128): Illuminate\\Routing\\Router->Illuminate\\Routing\\{closure}(Object(BookStack\\Http\\Request))
#10 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/Authenticate.php(23): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#11 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\Authenticate->handle(Object(BookStack\\Http\\Request), Object(Closure))
#12 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/Localization.php(45): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#13 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\Localization->handle(Object(BookStack\\Http\\Request), Object(Closure))
#14 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/RunThemeActions.php(26): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#15 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\RunThemeActions->handle(Object(BookStack\\Http\\Request), Object(Closure))
#16 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/CheckEmailConfirmed.php(47): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#17 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\CheckEmailConfirmed->handle(Object(BookStack\\Http\\Request), Object(Closure))
#18 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/PreventAuthenticatedResponseCaching.php(21): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#19 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\PreventAuthenticatedResponseCaching->handle(Object(BookStack\\Http\\Request), Object(Closure))
#20 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/VerifyCsrfToken.php(78): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#21 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Foundation\\Http\\Middleware\\VerifyCsrfToken->handle(Object(BookStack\\Http\\Request), Object(Closure))
#22 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/View/Middleware/ShareErrorsFromSession.php(49): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#23 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\View\\Middleware\\ShareErrorsFromSession->handle(Object(BookStack\\Http\\Request), Object(Closure))
#24 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Session/Middleware/StartSession.php(121): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#25 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Session/Middleware/StartSession.php(64): Illuminate\\Session\\Middleware\\StartSession->handleStatefulRequest(Object(BookStack\\Http\\Request), Object(Illuminate\\Session\\Store), Object(Closure))
#26 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Session\\Middleware\\StartSession->handle(Object(BookStack\\Http\\Request), Object(Closure))
#27 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Cookie/Middleware/AddQueuedCookiesToResponse.php(37): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#28 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Cookie\\Middleware\\AddQueuedCookiesToResponse->handle(Object(BookStack\\Http\\Request), Object(Closure))
#29 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Cookie/Middleware/EncryptCookies.php(67): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#30 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Cookie\\Middleware\\EncryptCookies->handle(Object(BookStack\\Http\\Request), Object(Closure))
#31 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/ApplyCspRules.php(33): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#32 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\ApplyCspRules->handle(Object(BookStack\\Http\\Request), Object(Closure))
#33 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(103): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#34 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Router.php(723): Illuminate\\Pipeline\\Pipeline->then(Object(Closure))
#35 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Router.php(698): Illuminate\\Routing\\Router->runRouteWithinStack(Object(Illuminate\\Routing\\Route), Object(BookStack\\Http\\Request))
#36 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Router.php(662): Illuminate\\Routing\\Router->runRoute(Object(BookStack\\Http\\Request), Object(Illuminate\\Routing\\Route))
#37 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Routing/Router.php(651): Illuminate\\Routing\\Router->dispatchToRoute(Object(BookStack\\Http\\Request))
#38 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(167): Illuminate\\Routing\\Router->dispatch(Object(BookStack\\Http\\Request))
#39 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(128): Illuminate\\Foundation\\Http\\Kernel->Illuminate\\Foundation\\Http\\{closure}(Object(BookStack\\Http\\Request))
#40 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Http/Middleware/TrustProxies.php(39): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#41 /home/customer/www/thriving-green.com/public_html/bookstack/app/Http/Middleware/TrustProxies.php(41): Illuminate\\Http\\Middleware\\TrustProxies->handle(Object(BookStack\\Http\\Request), Object(Closure))
#42 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): BookStack\\Http\\Middleware\\TrustProxies->handle(Object(BookStack\\Http\\Request), Object(Closure))
#43 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/TransformsRequest.php(21): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#44 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/TrimStrings.php(40): Illuminate\\Foundation\\Http\\Middleware\\TransformsRequest->handle(Object(BookStack\\Http\\Request), Object(Closure))
#45 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Foundation\\Http\\Middleware\\TrimStrings->handle(Object(BookStack\\Http\\Request), Object(Closure))
#46 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/ValidatePostSize.php(27): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#47 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Foundation\\Http\\Middleware\\ValidatePostSize->handle(Object(BookStack\\Http\\Request), Object(Closure))
#48 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/PreventRequestsDuringMaintenance.php(86): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#49 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(167): Illuminate\\Foundation\\Http\\Middleware\\PreventRequestsDuringMaintenance->handle(Object(BookStack\\Http\\Request), Object(Closure))
#50 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(103): Illuminate\\Pipeline\\Pipeline->Illuminate\\Pipeline\\{closure}(Object(BookStack\\Http\\Request))
#51 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(142): Illuminate\\Pipeline\\Pipeline->then(Object(Closure))
#52 /home/customer/www/thriving-green.com/public_html/bookstack/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(111): Illuminate\\Foundation\\Http\\Kernel->sendRequestThroughRouter(Object(BookStack\\Http\\Request))
#53 /home/customer/www/thriving-green.com/public_html/bookstack/public/index.php(53): Illuminate\\Foundation\\Http\\Kernel->handle(Object(BookStack\\Http\\Request))
#54 {main}
"} 

Hosting Environment

ssddanbrown commented 9 months ago

Hi @LkySlevin,

For the created user, whose API credentials you're using here, do they belong to a BookStack role which has the "Access system API" role permission?

LkySlevin commented 9 months ago

Hi @ssddanbrown, yes, they do.

I followed the instructions from the Danswer - Bookstack Connector Guide. So I created a new user called "DanswerUser" and I also created the role "Danswer". This role has the "Access system API" permission. I then created an API token and entered it with the corresponding secret in the Danswer Admin Panel.

These are the user role's asset permissions: grafik

Maybe it has something to do with the base URL? But this is the one within the .env file and it is the page visible after login. https://thriving-green.com/bookstack/public/

No idea, what could be the issue...

ssddanbrown commented 9 months ago

It could likely be due to the base URL, or how your setup is handling URLs in general. You could try removing the trailing slash of the base URL in danswer. Also, try going to https://thriving-green.com/bookstack/public/api/docs.json in the browser while logged in as an API-allowed user to see if that endpoint works and returns JSON.

Having /public/ in the URL like that indicates a sketchy setup though, likely with workarounds or edits at play. You should really never need to have /public/ be part of the URL if set up properly.
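
To illustrate why the trailing slash is worth ruling out: a client that joins a base URL and an endpoint path by plain concatenation can end up with a double slash, which some web servers handle differently. A minimal Python sketch, assuming a hypothetical helper `join_api_url` (this is not Danswer's actual code, just one way a client might normalize the URL):

```python
def join_api_url(base_url: str, endpoint: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


base = "https://thriving-green.com/bookstack/public/"

# Naive concatenation keeps both slashes at the join point.
print(base + "/api/books")

# The helper (hypothetical) produces a single slash regardless of input form.
print(join_api_url(base, "/api/books"))
```

If the client already normalizes the URL like this, the trailing slash makes no difference, which is why it is only a quick thing to rule out.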

LkySlevin commented 9 months ago

Also, try going to https://thriving-green.com/bookstack/public/api/docs.json in the browser while logged in as an API allowed user to see if that endpoint works and returns JSON.

That actually works fine. I can see a json.

It could likely be due to the base URL, or how your setup is handling URLs in general. You could try removing the trailing slash of the base URL in danswer.

Does not change anything.

Having /public/ in the URL like that indicates a sketchy setup though, likely with workarounds or edits at play. You should really never need to have /public/ be part of the URL if setup properly.

Well, it has been a while, and it was the first time I had set up anything like this using Apache and PHP. It also took me quite some time to SSH into siteground. So I would call myself a real freshman.

I think I remember playing around with the base URL until it worked but my memory could trick me here. If you really think that could be the issue, then I would appreciate if you could help me put it in the right place.

ssddanbrown commented 9 months ago

Since the JSON endpoint worked, we'll continue checking on API usage for now, can circle back to that after but not sure it'd be the issue since you can access the docs endpoint.

Next up is to validate the token and key works externally. From a terminal window, or powershell window if on Windows, run:

curl --request GET \
  --url https://thriving-green.com/bookstack/public/api/books \
  --header 'Authorization: Token abc123:def456'

But replace abc123 with the BookStack token ID, and def456 with the BookStack token secret. What do you get in response?
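
The same check can also be scripted. Below is a minimal Python sketch using only the standard library; `build_books_request` is a hypothetical helper name, and `abc123` / `def456` are the same placeholders as in the curl command above. The request is only constructed here, and the line that would actually send it is left as a comment.

```python
import urllib.request


def build_books_request(base_url: str, token_id: str, token_secret: str) -> urllib.request.Request:
    """Build a GET request for the /api/books endpoint with token auth."""
    url = base_url.rstrip("/") + "/api/books"
    # Header format copied from the curl command: "Token <id>:<secret>"
    headers = {"Authorization": f"Token {token_id}:{token_secret}"}
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_books_request(
    "https://thriving-green.com/bookstack/public", "abc123", "def456"
)
print(req.full_url)
print(req.get_header("Authorization"))

# To actually send the request (needs network access and real credentials):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read()[:200])
```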

LkySlevin commented 9 months ago

Alright, I am quite new to curl requests and I tried the following.

curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - Die Sperrfunktion konnte keine Sperrprüfung für das Zertifikat durchführen.


Using `^` and `""` I think is the way to go on windows I guess, resulting in an automatic newline within the powershell asking for `more?`. The final part of the response translates to `The revocation function could not perform a revocation check for the certificate.`

So I am not sure if I performed the `curl` command correctly, but I assume the last attempt was correct.

I really appreciate your work on bookstack and your help here!

LkySlevin commented 9 months ago

Not sure if it helps but ChatGPT suggested to try:

curl --request GET ^
Mehr? --url "https://thriving-green.com/bookstack/public/api/books" ^
Mehr? --header "Authorization: Token Dj1DII3CYNKSW1Cpkgd:R7vYlutWhTjGnE" ^
Mehr? --insecure
{"error":{"code":401,"message":"No matching API token was found for the provided authorization token"}}

Again I shortened the tokens here.

EDIT: I tried with new tokens from my admin user from bookstack, which has all permissions.

Without the --insecure option I also get the curl (35) error. However, using --insecure I get a proper response

{"data":[{"id":1,"name":"IT","slug":"it","description":"This book contains all information (e.g. tutorials or credentials) regarding IT.","created_at":"2023-02-24T18:25:21.000000Z","updated_at":"2023-02-24T18:25:21.000000Z","created_by":1,"updated_by":1,"owned_by":1},{"id":2,"name":"Technology","slug":"technology","description":"Includes documentation of the sensor software and hardware, as well as feature guides, how-tos and best practices.","created_at":"2023-02-24T20:50:01.000000Z","updated_at":"2023-07-06T07:40:14.000000Z","created_by":1,"updated_by":6,"owned_by":1},{"id":3,"name":"Finance","slug":"finance","description":"","created_at":"2023-02-27T12:29:26.000000Z","updated_at":"2023-02-27T12:29:26.000000Z","created_by":7,"updated_by":7,"owned_by":7},{"id":4,"name":"Board","slug":"board","description":"","created_at":"2023-02-28T07:53:10.000000Z","updated_at":"2023-02-28T07:53:10.000000Z","created_by":6,"updated_by":6,"owned_by":6},{"id":5,"name":"Marketing","slug":"marketing","description":"This book contains everything about marketing...
ssddanbrown commented 9 months ago

Okay, so that last attempt is working okay. Does danswer work if you use those same (known working) details?

If you get the same error, next thing I'd suggest checking is the webserver error/access logs to see if they provide any clues. If you're using some kind of management layer/system, you might have to refer to their docs in where to find those. Thinking it could be some security/access controls set on the site?

LkySlevin commented 9 months ago

I tried with these credentials one more time (with and without trailing /) but with no luck. Regarding my setup, our website thriving-green.com is hosted at siteground. That was done years ago by a colleague, who is not part of the organization anymore. When I installed BookStack, I simply installed it here: grafik That is why public is part of the base URL I guess.

In the logfiles folder I checked the latest .gz file but did not find any clues - though you might be knowing what to look for.

If you're using some kind of management layer/system, you might have to refer to their docs in where to find those. Thinking it could be some security/access controls set on the site?

AFAIK I did not setup any particular layer/system or security/access controls in BookStack or the website setup. We are simply an NGO using BookStack as a wiki.

Would you suggest moving the BookStack folder outside of the website? Can this be done without the risk of losing our current content? What would the base URL be then?

Thanks for your support!

LkySlevin commented 9 months ago

Besides, have a look at my .env's content (I removed credentials)

# Application key
# Used for encryption where needed.
# Run `php artisan key:generate` to generate a valid key.
APP_KEY=base64:HBPORJ1OXXXX

# Application URL
# This must be the root URL that you want to host BookStack on.
# All URLs in BookStack will be generated using this value
# to ensure URLs generated are consistent and secure.
# If you change this in the future you may need to run a command
# to update stored URLs in the database. Command example:
# php artisan bookstack:update-url https://old.example.com https://new.example.com
APP_URL=https://thriving-green.com/bookstack/public

# Database details
DB_HOST=localhost
DB_DATABASE=dbXXX
DB_USERNAME=ujmXXX
DB_PASSWORD=XXX

# Mail system to use
# Can be 'smtp' or 'sendmail'
MAIL_DRIVER=smtp

# Mail sender details
MAIL_FROM_NAME=BookStack
MAIL_FROM=sw@thriving-green.com

# SMTP mail options
# These settings can be checked using the "Send a Test Email"
# feature found in the "Settings > Maintenance" area of the system.
MAIL_HOST=mail.thriving-green.com
MAIL_PORT=XXX
MAIL_USERNAME=sw@thriving-green.com
MAIL_PASSWORD=XXX
MAIL_ENCRYPTION=ssl

ssddanbrown commented 9 months ago

Would you suggest moving the BookStack folder outside of the website? Can this be done without the risk of losing our current content? What would the base URL be then?

It really depends on what options you have in (what I assume to be) your management system (siteground). It gets a bit more complex since you're wanting to serve on a sub-path too. It's probably gonna take a lot more time to understand what hosting options you have, and walk through the process step-by-step. I have some guidance here for a sub-directory setup, but it assumes web-server access. You may be limited by your hosting system.

I'm not sure it's the cause of the danswer issues though, since you can connect to the API directly.

AFAIK I did not setup any particular layer/system or security/access controls in BookStack or the website setup.

Can you see error logs when following this guidance?

LkySlevin commented 9 months ago

Well, according to the error logs, there are no errors at all :D grafik So that is a dead end. But what I can see is that we actually have different domains. The wiki domain contains an old wiki used several years ago. I think I will create a subdomain for BookStack and adapt the base URL. Do you think this is a good approach?

Regarding API - don't you think it is strange that I can only get access with the admin account and not the others? And do you know why it only works with "insecure" settings?

ssddanbrown commented 9 months ago

I think I will create a subdomain with bookstack and adapt the base URL. Do you think this is a good approach?

That is usually the easiest approach. Ideally you'd need to be able to set your web root so you're only exposing the public folder, but I'm not sure SiteGround provide this, since they document this workaround. Also, ideally you'd have command-line access to properly manage a BookStack instance. I advise against using BookStack in environments where this is not possible otherwise you can't properly manage the instance.

Before attempting anything, make sure you have good backups though.

Regarding API - don't you think it is strange that I can only get access with the admin account and not the others? And do you know why it only works with "insecure" settings?

It is but I'm not sure it's connected with the danswer issue, since you're specifically getting a 403 response there, rather than a connection error. The error you got with the non-admin user is quite specific to a non-matching api token scenario. Not sure how you'd see that error message without the token ID being wrong or badly formatted somehow.

The --insecure flag ignores issues with verification when attempting to make a https:// connection. You have a valid public cert on the website, but this can be thrown if there are issues from the client system where you're running the command (or anything in the middle like proxies, especially if using a company machine/internet).
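
As a rough illustration of what skipping verification means, here is a Python sketch using the standard `ssl` module. It is only an analogy to curl's `--insecure`, not a description of how curl works internally, and `make_context` is a hypothetical helper. Note also that Python's default context does not perform the revocation check that the Windows TLS backend attempted in the error above, so this only shows the general verify-or-skip switch.

```python
import ssl


def make_context(insecure: bool = False) -> ssl.SSLContext:
    """Return a TLS client context; insecure=True disables certificate checks."""
    # The default context verifies the certificate chain and the hostname.
    ctx = ssl.create_default_context()
    if insecure:
        # Order matters: hostname checking must be turned off
        # before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict = make_context()
relaxed = make_context(insecure=True)
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)
print(relaxed.verify_mode == ssl.CERT_NONE, relaxed.check_hostname)
```

Skipping verification is fine for a one-off diagnostic like this, but it should not be left enabled in anything that runs regularly.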

It's still not clear why you're getting 403 errors from danswer, but siteground is responding with that exact 403 - Forbidden when certain paths are accessed, which makes me think a connection is being made but something's off at either the siteground side or BookStack side, or maybe the handling of URLs (could be customizations to make it serve on public). The .htaccess of the BookStack public folder, and each .htaccess in the folders above, could be affecting things.

Shame there's nothing in the error logs, that seems wrong to be honest. You can try looking instead at the access logs, should be in the same kind of place. Just to see what they show when danswer attempts a connection.

LkySlevin commented 9 months ago

Thank you for the elaborate answer.

Also, ideally you'd have command-line access to properly manage a BookStack instance. I advise against using BookStack in environments where this is not possible otherwise you can't properly manage the instance.

Correct me if I am misunderstanding you here, but I am able to SSH into siteground, that is how I set up BookStack in the first place. Although it took me quite some time to find a proper tutorial and get in.

So as there is no other solution in sight, I will try to move BookStack. My plan is to leave the current state (the whole bookstack folder) where it is at the moment as my backup, copy it to the new subdomain, and update the base URL as described in the .env file using SSH.

I will report back what happened.

ssddanbrown commented 9 months ago

Correct me if I am misunderstanding you here, but I am able to SSH into siteground, that is how I set up BookStack in the first place. Although it took me quite some time to find a proper tutorial and get in.

Ah, okay, that should be fine then. Was just worried since a lot of these managed-hosting systems don't provide access.

At some point, when the new version is active, you'll have to update the URLs in the database, for which we have a command. You'll also have to update the APP_URL in the new setup .env file. Before running that command, you'll want to backup the database where possible, since the database will probably still be shared with the original instance (unless you've exported and re-imported into a different database entry or something).

LkySlevin commented 9 months ago

So I did the following

  1. Created the subdomain bookstack.thriving-green.com
  2. Moved the content from my bookstack folder into bookstack.thriving-green.com
  3. SSHed into bookstack.thriving-green.com and ran the php cmd for updating the URL php artisan bookstack:update-url https://thriving-green.com/bookstack/public https://bookstack.thriving-green.com
  4. Opened the updated base URL from the .env file, but it did not work and showed me the 403 Forbidden error again.
  5. Moved the content into bookstack.thriving-green.com->public_html (I contacted siteground support and they said if I want to make it visible, I should place all my stuff there)
  6. Ran the update base URL cmd again and now, with the base URL https://bookstack.thriving-green.com/public I can log into bookstack again.

However, I tried Danswer and it failed again with the same message, and since I still have to add /public to the URL, I would say we are in the same place as before.

I also checked the access logs from Siteground. Unfortunately, I don't see anything happening in the logs when I run an index attempt with danswer. I only see the following when logging in and accessing a page:

185.119. thriving-green.com - [22/Nov/2023:20:46:06 +0000] "GET /bookstack/public/books/operations/chapter/partner-locations HTTP/2.0" 200 7823 "https://thriving-green.com/bookstack/public/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | 0.150 0.158 0.159 MISS 0 NC:000000 UP:SKIP_CACHE_SET_COOKIE
185.119. thriving-green.com - [22/Nov/2023:20:46:06 +0000] "POST /bookstack/public/login HTTP/2.0" 302 590 "https://thriving-green.com/bookstack/public/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | 0.189 0.201 0.200 - 0 NC:000000 UP:SKIP_CACHE_SET_COOKIEDT
185.119 thriving-green.com - [22/Nov/2023:20:46:02 +0000] "GET /bookstack/public/uploads/images/system/2023-02/NAhhqAHaf8D52BuW-tg-logo-ohneschriftzug-fb.png HTTP/2.0" 200 1083 "https://thriving-green.com/bookstack/public/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | - - 0.000 - 0 NC:000000 UP:-DT
185.119 thriving-green.com - [22/Nov/2023:20:46:02 +0000] "GET /bookstack/public/uploads/images/system/2023-02/PjFANRjS8IDdqflt-tg-logo-ohneschriftzug-fb.png HTTP/2.0" 200 11085 "https://thriving-green.com/bookstack/public/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | - - 0.000 - 0 NC:000000 UP:-DT
185.119 thriving-green.com - [22/Nov/2023:20:46:02 +0000] "GET /bookstack/public/dist/print-styles.css?version=v23.01.1 HTTP/2.0" 200 618 "https://thriving-green.com/bookstack/public/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | - - 0.000 - 0 NC:000000 UP:-DT
185.119 thriving-green.com - [22/Nov/2023:20:46:02 +0000] "GET /bookstack/public/login HTTP/2.0" 200 2340 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | 0.104 0.112 0.112 MISS 0 NC:000000 UP:SKIP_CACHE_SET_COOKIE
185.119 thriving-green.com - [22/Nov/2023:20:46:01 +0000] "GET /bookstack/public/books/operations/chapter/partner-locations HTTP/2.0" 302 442 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0" | TLSv1.3 | 0.099 0.106 0.107 MISS 0 NC:000000 UP:SKIP_CACHE_SET_COOKIE

(I altered the IP)

So I don't know if the access logs have any value for you. Besides, do you think that the workaround with the .htaccess will bring any benefit? I assume that since it is only redirecting, it might not change the situation.

ssddanbrown commented 9 months ago

Alright, just done some more testing. Think I've got an idea of the cause. This felt like something being blocked at host/webserver level (Siteground or the web-server they're running) since I'd see similar 403 - Forbidden responses when hitting certain endpoints.

From experience, I know some systems that attempt to do active security blocking can be unfriendly to default or empty user-agents (the user-agent is how browsers identify themselves to servers, but it's totally messy and most lie anyway).

Playing around with this via curl, I found this is in play and can block requests that use the default python-requests user-agent (python-requests being the library Danswer uses to make requests). As an example:

### Request with python-requests user-agent
curl --request GET \
  --url https://bookstack.thriving-green.com/public/api/books \
  --header 'Authorization: Token abc123:def456' \
  -A 'python-requests/2.31.0'

### Output
403 - Forbidden | Access to this page is forbidden.

### Request with random other user-agent
curl --request GET \
  --url https://bookstack.thriving-green.com/public/api/books \
  --header 'Authorization: Token abc123:def456' \
  -A 'cat'

### Output
{"error":{"code":401,"message":"No matching API token was found for the provided authorization token"}}%

The first example is completely being blocked at the Siteground/web-server level, so is not reaching BookStack. The second is reaching BookStack but just then failing API auth (expected since I'm using made-up invalid token values).
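
The distinction between the two outputs above can be sketched in Python. This is only a rough heuristic under stated assumptions: the function name `classify_response` and its return labels are hypothetical, and it assumes that application-level API errors arrive as JSON with an `error` object (as in the second output), while a block at the host/web-server level returns a non-JSON 403 body (as in the first).

```python
import json


def classify_response(status: int, body: str) -> str:
    """Guess where a request was handled (hypothetical helper, heuristic only)."""
    if 200 <= status < 300:
        return "ok"
    try:
        payload = json.loads(body)
    except ValueError:
        # Body is not JSON, e.g. a plain-text or HTML error page.
        payload = None
    # Assumption: the application answers API errors with a JSON "error" object.
    if isinstance(payload, dict) and "error" in payload:
        return "application"
    # A 403 without a JSON error body suggests a block in front of the app.
    if status == 403:
        return "upstream"
    return "unknown"


blocked = classify_response(403, "403 - Forbidden | Access to this page is forbidden.")
auth_failed = classify_response(
    401,
    '{"error":{"code":401,"message":"No matching API token was found '
    'for the provided authorization token"}}',
)
print(blocked, auth_failed)
```

A check like this cannot prove where a request was rejected, but it may help decide whether to look at the application or at the hosting layer first.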

Based on these tests, this is likely the cause. If you don't have security controls for this (or anything related to User-Agent), then it'd be worth contacting SiteGround if possible to see if they can alter this rule. You could alternatively ask for the user-agent to be changed on the Danswer side, but I don't think they should have to make changes just to work with rules used by SiteGround; best to do this on the SiteGround side if possible.

LkySlevin commented 9 months ago

Finally the issue is resolved. I contacted SiteGround support and they confirmed:

Indeed the request is matching one of our security rules - this was due to the generic user-agent used - "python-requests". If the user agent was something custom and created specially by the software/script developers then the rule would not be hit. That being said, you can contact the developers or just change the user-agent used yourself in the requests and ensure that the user-agent is distinct and serves the purpose to differentiate the requests as unique instead of using the name of the python library - requests.

So I checked the Danswer repo and found under backend -> danswer -> connectors -> bookstack -> client.py

    def _build_headers(self) -> dict[str, str]:
        auth = "Token " + self.token_id + ":" + self.token_secret
        return {
            "Authorization": auth,
            "Accept": "application/json",
            "User-Agent": "cat",  # added by me
        }

I rebuilt Danswer and retried with the new user-agent, and it worked!
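
For anyone applying the same workaround, here is a minimal standalone sketch of the idea. It is not the actual Danswer code: the function `build_headers` and the default value `danswer-bookstack-connector` are hypothetical, chosen only to illustrate sending a distinct user-agent instead of the library default, as SiteGround support suggested.

```python
def build_headers(
    token_id: str,
    token_secret: str,
    user_agent: str = "danswer-bookstack-connector",  # hypothetical default
) -> dict[str, str]:
    """Build BookStack API request headers with an explicit User-Agent."""
    if not user_agent.strip():
        # Empty user-agents may be blocked too, so reject them early.
        raise ValueError("user_agent must not be empty")
    return {
        "Authorization": f"Token {token_id}:{token_secret}",
        "Accept": "application/json",
        "User-Agent": user_agent,
    }


headers = build_headers("abc123", "def456")
print(headers["User-Agent"])
```

Making the user-agent a parameter means it can be changed without another code edit if a host blocks a particular value.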

Thank you very much for your support, without you I would not have made it ;)

ssddanbrown commented 9 months ago

Awesome news! Good to see things working for you!