Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
https://anythingllm.com
MIT License
19.16k stars, 2.1k forks

[BUG]: Confluence crawler incorrectly does not accept subdomains #1772

Open Matti-Koopa opened 2 months ago

Matti-Koopa commented 2 months ago

How are you running AnythingLLM?

Docker (local)

What happened?

Subdomains seem to be supported only for normal TLDs. Something like https://confluence.myorg.at/display/SPACE/* works. Something like https://confluence.myorg.gv.at/display/SPACE/* doesn't work. Apparently it uses a regular expression to check whether a URL is valid, but it's hardcoded to allow only a single subdomain or none at all.
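To illustrate the suspected failure mode, here is a minimal sketch in Python. The pattern below is hypothetical and is not taken from the AnythingLLM source; it only assumes a validator that permits at most one subdomain label in front of `name.tld`, which would reproduce the behavior described above.

```python
import re

# Hypothetical pattern, NOT the actual AnythingLLM code: it permits at most
# one optional subdomain label before "name.tld".
SINGLE_SUBDOMAIN = re.compile(
    r"^https?://(?:[a-z0-9-]+\.)?[a-z0-9-]+\.[a-z]{2,}/display/[^/]+/\*$",
    re.IGNORECASE,
)

# A relaxed variant that accepts any number of dot-separated host labels.
ANY_DEPTH = re.compile(
    r"^https?://(?:[a-z0-9-]+\.)+[a-z]{2,}/display/[^/]+/\*$",
    re.IGNORECASE,
)


def accepts(pattern: re.Pattern, url: str) -> bool:
    """Return True if the whole URL matches the given pattern."""
    return pattern.match(url) is not None


three_labels = "https://confluence.myorg.at/display/SPACE/*"
four_labels = "https://confluence.myorg.gv.at/display/SPACE/*"

print(accepts(SINGLE_SUBDOMAIN, three_labels))  # True
print(accepts(SINGLE_SUBDOMAIN, four_labels))   # False
print(accepts(ANY_DEPTH, four_labels))          # True
```

The host `confluence.myorg.gv.at` has four labels, so a pattern limited to three rejects it even though the URL is valid.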

Are there known steps to reproduce?

Click Data Connectors, click Confluence, enter the URL, click Submit, get error

JayCroghan commented 1 month ago

I'm assuming this is also why it doesn't work when there is a port in the URL??

Mine is http://DOMAIN.net:8090/display/PRJ and it tells me it's not in the correct format.

derkoe commented 2 weeks ago

And you can also run Confluence under a context path, which does not work either. For example:

https://my.domain.com/confluence/display/myspace/*
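The three URL shapes reported in this thread (multi-level subdomain, explicit port, context path) could all be handled by parsing the URL instead of matching the whole string with one regular expression. The following is a rough sketch in Python under stated assumptions: `parse_confluence_url` and `SPACE_PATH` are hypothetical names, not part of AnythingLLM, and the sketch assumes the space key always follows a `/display/` path segment, as in the examples above.

```python
import re
from urllib.parse import urlsplit

# Assumption: the space key follows "/display/"; anything before that
# segment is treated as the context path. Hypothetical helper pattern.
SPACE_PATH = re.compile(
    r"^(?P<context>(?:/[^/]+)*?)/display/(?P<space>[^/]+)(?:/.*)?$"
)


def parse_confluence_url(url: str):
    """Return host, port, context path and space key, or None if invalid."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    match = SPACE_PATH.match(parts.path)
    if match is None:
        return None
    return {
        "host": parts.hostname,  # urlsplit lowercases the hostname
        "port": port,            # None when the URL has no explicit port
        "context_path": match.group("context"),
        "space": match.group("space"),
    }


for candidate in (
    "https://confluence.myorg.gv.at/display/SPACE/*",
    "http://DOMAIN.net:8090/display/PRJ",
    "https://my.domain.com/confluence/display/myspace/*",
):
    print(parse_confluence_url(candidate))
```

Because the host is taken from the parsed URL, the number of subdomain labels no longer matters, and the port and context path are extracted separately rather than rejected.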