getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
http://redash.io/
BSD 2-Clause "Simplified" License

robots.txt #5824


AlfredTallMountain commented 2 years ago

Hi, I'm using Redash, but my instance is getting indexed by web spiders / search engines.

I see that there are a number of robots.txt files in the various Docker overlay directories:

```
/var/lib/docker/overlay2/c4fa5ea26b53940eb4c2800023d9aea07d0bc763552edc079f562e863e853372/diff/app/client/app/assets/robots.txt
/var/lib/docker/overlay2/497236ca5b107a3d49ed0663414a96cc60a2bb6180ddef08f6a70ba4400cd776/diff/app/client/dist/robots.txt
/var/lib/docker/overlay2/4eadca46ea7297ed89a3f10af01a1958ad6e583ca10969e2bbe4746c75580207/merged/app/client/app/assets/robots.txt
/var/lib/docker/overlay2/4eadca46ea7297ed89a3f10af01a1958ad6e583ca10969e2bbe4746c75580207/merged/app/client/dist/robots.txt
/var/lib/docker/overlay2/5be3600d08c51c589854b48404f08f414508089f1e608f400af2807efa56de2b/merged/app/client/app/assets/robots.txt
/var/lib/docker/overlay2/5be3600d08c51c589854b48404f08f414508089f1e608f400af2807efa56de2b/merged/app/client/dist/robots.txt
/var/lib/docker/overlay2/4227ebff8b3c87620d0c7af22bb26a732f35872f23469f2509fc70a92bc3e6b8/merged/app/client/app/assets/robots.txt
/var/lib/docker/overlay2/4227ebff8b3c87620d0c7af22bb26a732f35872f23469f2509fc70a92bc3e6b8/merged/app/client/dist/robots.txt
/var/lib/docker/overlay2/57f521ef4585e88ebf225841ea6721297925182779a7c7e6b3419773543c2eba/merged/app/client/app/assets/robots.txt
/var/lib/docker/overlay2/57f521ef4585e88ebf225841ea6721297925182779a7c7e6b3419773543c2eba/merged/app/client/dist/robots.txt
/var/lib/docker/overlay2/ca122f621e5257ed43a8dab568ea1f74aa78bd44bbf68697a06b49f88afa52cb/diff/app/client/dist/robots.txt
/var/lib/docker/overlay2/ca122f621e5257ed43a8dab568ea1f74aa78bd44bbf68697a06b49f88afa52cb/diff/app/client/app/assets/robots.txt
```

but when trying to access the file on the web (bi.mydomain.com/robots.txt) I get redirected to the login page.

How do I make robots.txt public / accessible, and what's the best way to edit it so that it contains the following?

```
User-agent: *
Disallow: /
```

Thank you.

mt-rpranata commented 1 year ago

It's inside the /static/ path, I believe.

guidopetri commented 11 months ago

mt-rpranata is correct, it's served under your.domain.tld/static/robots.txt. You could configure your reverse proxy to forward requests for /robots.txt to that path. However, there's no real content in that file, as you can see here.
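As a sketch of what that could look like, assuming you have an nginx reverse proxy in front of Redash (the exact setup varies), you could skip the forwarding entirely and have nginx answer /robots.txt directly with a disallow-all body:

```nginx
# Hypothetical nginx snippet: answer /robots.txt directly with a
# disallow-all policy, instead of proxying to Redash's (empty)
# /static/robots.txt. Goes inside your existing server { } block.
location = /robots.txt {
    add_header Content-Type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}
```

Alternatively, `location = /robots.txt { rewrite ^ /static/robots.txt; }` would forward to the file Redash ships, but since that file is effectively empty, serving your own content is likely what you want here.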

I think this is probably something we should make more accessible, and maybe we should disallow all crawling by default. That said, robots.txt is an entirely voluntary system, so if you want to truly lock your instance away from crawlers, you might have to consider deploying Redash behind a VPN.