digitalmethodsinitiative / 4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

Allow autologin to _always_ work (or perhaps disable login?) #272

Closed anderscollstrup closed 2 years ago

anderscollstrup commented 2 years ago

I am running a 4CAT server in Docker, with an Apache2 reverse proxy in front. It works fine except for one small thing.

MYSERVER.domain hosts my Apache proxy.

In settings -> Flask settings I have: Auto-login name = MYSERVER.domain

However, when I access 4CAT through the proxy I don't want to be met by a login page. I just want to be inside. I was thinking that Auto-login name would whitelist hosts so they could bypass login?

stijn-uva commented 2 years ago

'Auto-login name' is simply the name that is displayed in the interface as your account name when using the auto-login feature. Instead, you want to change 'White-list for API' and 'White-listed hostnames'. These should contain a JSON list of IP addresses or host names that bypass login (and will see themselves as being logged in as 'Auto-login name'). Example value:

["localhost", "*.uva.nl"]
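
As a rough illustration of how such a JSON list of patterns could be read and matched, here is a minimal Python sketch. It is not 4CAT's actual code: the function names `parse_whitelist` and `host_matches` are hypothetical, and shell-style matching via `fnmatch` is an assumption about how the `*` wildcard behaves.

```python
import fnmatch
import json


def parse_whitelist(raw):
    """Parse the setting's JSON text into a list of hostname patterns."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("whitelist must be a JSON list of strings")
    return value


def host_matches(hostname, patterns):
    """True if hostname matches any pattern (assumed: shell-style wildcards)."""
    # Hostnames are case-insensitive, so compare lower-cased values.
    return any(fnmatch.fnmatchcase(hostname.lower(), p.lower()) for p in patterns)


patterns = parse_whitelist('["localhost", "*.uva.nl"]')
print(host_matches("proxy.uva.nl", patterns))
print(host_matches("example.org", patterns))
```

Under this assumption a pattern such as `*.uva.nl` covers subdomains but not the bare domain `uva.nl`.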
stijn-uva commented 2 years ago

There is a related issue you might encounter after this (#274). We're aware of that one and will try to address it in the near future.

anderscollstrup commented 2 years ago

Then the feature does not work .... because I already have

White-list for API = ["localhost", "MYSERVER.domain"]
White-listed hostnames = ["localhost", "MYSERVER.domain"]

anderscollstrup commented 2 years ago

Did I do something wrong or is there a bug in 4cat?

stijn-uva commented 2 years ago

Hi @anderscollstrup , please allow some time for us to look into this. We are aware of the issue and will respond when we know more. 4CAT maintenance is not our only responsibility so we cannot always offer an immediate response.

anderscollstrup commented 2 years ago

Sorry, I just felt forgotten ;-) , I will let you work. In my university we appreciate the effort you put in 4cat and tcat :-)

dale-wahl commented 2 years ago

@anderscollstrup I've tested the whitelist and it works on our Docker versions. I managed to whitelist every host in all of the Netherlands while screwing with it. It takes the originating IP of the request and checks its host. You could try sending a request directly from a computer/server at "MYSERVER.domain" and see if it auto logs in. My guess is that your Apache config is forwarding the originating requester's IP to 4CAT and 4CAT is checking that (instead of the Apache host). You'd want Apache to send its own IP (assuming it's on the whitelisted host). I am not super sure how to accomplish that in an Apache config, but I am guessing that's where the issue lies.

I am assuming you checked the 4CAT control panel in the web interface to verify those variables as well. You may also want to restart the 4cat_frontend container to ensure changes to the whitelist take effect (docker stop 4cat_frontend then docker start 4cat_frontend). I'm not 100% sure that is necessary in this case though.
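
To make the distinction between the proxy's address and the originating address concrete, here is a minimal Python sketch. It assumes the proxy adds an `X-Forwarded-For` header (Apache's mod_proxy does this for proxied requests); the function name `effective_client_ip` and the `trust_proxy_header` switch are hypothetical and do not describe how 4CAT itself decides.

```python
def effective_client_ip(remote_addr, headers, trust_proxy_header):
    """Pick the address an application would check against its whitelist.

    remote_addr: peer address of the TCP connection (the proxy, if one is in front).
    headers: plain dict of request headers (keys are matched case-sensitively here).
    trust_proxy_header: hypothetical switch; if True, prefer X-Forwarded-For.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    if trust_proxy_header and forwarded:
        # The header is a comma-separated chain; the left-most entry is the
        # address of the original client.
        return forwarded.split(",")[0].strip()
    return remote_addr


headers = {"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}
print(effective_client_ip("172.21.0.1", headers, trust_proxy_header=True))
print(effective_client_ip("172.21.0.1", headers, trust_proxy_header=False))
```

Which of the two addresses ends up being checked decides whether the whitelist has to contain the proxy's host or the hosts of the individual users.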

anderscollstrup commented 2 years ago

Thanks @dale-wahl :-) That was the hint I needed. The access_gunicorn.log shows an internal Docker IP that can't be resolved to the hostname of my Apache proxy. So I simply put the IP address in White-list for API, White-listed hostnames and Auto-login name.

Then it worked, also after a reboot. (I did not know if Docker would change the IP.)

anderscollstrup commented 2 years ago

I still have problems with auto-login.

When I use a different browser it does not work. Nor does it work for other users of the server.

My settings are:

White-list for API = ["localhost", "*.domain.com", "172.21.0.1", "*"]
White-listed hostnames = ["localhost", "*.domain.com", "172.21.0.1", "*"]
Auto-login name = ["localhost", "*.domain.com", "172.21.0.1", "*"]

172.21.0.1 is the internal docker address that my apache reverse proxy comes from.

How do I allow anyone to use autologin?

dale-wahl commented 2 years ago

You can add ["*"].

The check should not be using the Docker IP (even though the logs are). It is using the originating IP. You could fix that in Apache. But I think if you add an * it will work.

anderscollstrup commented 2 years ago

but "*" is already in the array?

anderscollstrup commented 2 years ago

I tried with

White-list for API = ["*"]
White-listed hostnames = ["*"]
Auto-login name = ["*"]

It does not work when I use another clean browser in incognito mode. My guess is that cookies are somehow involved.

dale-wahl commented 2 years ago

Did you restart the 4cat frontend after the changes? It should read an * as a wildcard and accept any hostname. Auto-login name ought to be some text like "Autologin", though; I don't know if that is at all related.

If it still doesn't work, you can try updating your Apache config to forward only its own IP and whitelist that host. Alternatively, you can make a general login and provide it to your users or, perhaps, edit this function to always use the auto-login.

anderscollstrup commented 2 years ago

I did restart the 4cat_frontend container. Also apache is always coming from 172.21.0.1

dale-wahl commented 2 years ago

It checks hostname not IP. It looks like (for me at least) 172.21.0.1 does not resolve to any hostname. You could check like so: docker exec 4cat_frontend python3 -c "import socket; print(socket.gethostbyaddr('172.21.0.1'));" to use the exact Docker environment. It's just a Reverse DNS lookup and relies on an IP address having a PTR record. I assumed you would forward the public IP of your Apache server which was presumably registered to your domain as a host.

You could add 172.21.0.1 to your own hosts list to basically trick it I guess.

sudo docker exec -it 4cat_frontend /bin/bash
apt-get install nano
nano /etc/hosts
# add this to the end of that file and save
172.21.0.1 whatever.com

Then add "whatever.com" to your White-listed hostnames list.
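
A minimal Python sketch of a hostname-based check shows why an address without a PTR record cannot match anything, and why a hosts-file entry helps. This is an assumption about the general technique, not 4CAT's actual implementation: the names `reverse_lookup` and `hostname_allowed` are hypothetical, and the `resolver` parameter exists only so the behaviour can be tried without real DNS.

```python
import fnmatch
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the primary hostname for ip, or None if it cannot be resolved."""
    try:
        return resolver(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def hostname_allowed(ip, whitelist, resolver=socket.gethostbyaddr):
    """Resolve ip to a hostname first, then match it against the patterns."""
    hostname = reverse_lookup(ip, resolver)
    if hostname is None:
        # Assumed behaviour: with no hostname there is nothing to match,
        # so even a "*" pattern is never tested.
        return False
    return any(fnmatch.fnmatch(hostname, pattern) for pattern in whitelist)


def no_ptr(ip):
    # Stand-in resolver for an address that has no PTR record.
    raise socket.herror(1, "Unknown host")


def hosts_file(ip):
    # Stand-in resolver for an address mapped in /etc/hosts.
    return ("whatever.com", [], [ip])


print(hostname_allowed("172.21.0.1", ["*"], resolver=no_ptr))
print(hostname_allowed("172.21.0.1", ["whatever.com"], resolver=hosts_file))
```

If the real check works along these lines, that would be consistent with a `["*"]` whitelist not helping while the address remains unresolvable.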

anderscollstrup commented 2 years ago

172.21.0.1 is a docker address. It is not defined in our network and as such it cannot be resolved.

root@MYSERVER:/home/anco# docker exec 4cat_frontend python3 -c "import socket; print(socket.gethostbyaddr('172.21.0.1'));"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
socket.herror: [Errno 1] Unknown host
root@MYSERVER:/home/anco#

If I modify the hosts file in my docker container, then I don't think the change will be persistent between reboots or restarts of container... will it?

dale-wahl commented 2 years ago

It will be persistent from stopping and starting (e.g. docker-compose stop). But not if you remove the container (e.g. docker-compose down). So you'll have to refer back to here or otherwise remember.

I cannot reproduce this "Docker address" issue when deploying on our servers. Our logs contain the originating IP. We will look at adding IPs to the whitelist in a future update. Just for you.

stijn-uva commented 2 years ago

As of https://github.com/digitalmethodsinitiative/4cat/commit/5a8d24667d70e71e87aa3a70c19682cfb98dd29a you can now also filter by IP in the whitelist. If that doesn't solve your problem, I'm afraid it is due to your Docker or Apache configuration and outside of the scope of what we can offer help with here.
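
The linked commit is not reproduced here. As a hedged illustration of what a whitelist that accepts both IP addresses and hostname patterns could look like, here is a minimal Python sketch. The function names `entry_matches` and `is_whitelisted`, the CIDR support, and the choice to also apply wildcard patterns to the IP string are all assumptions made for this example.

```python
import fnmatch
import ipaddress


def entry_matches(entry, ip, hostname=None):
    """Check one whitelist entry against a client's IP and optional hostname."""
    client = ipaddress.ip_address(ip)
    try:
        # Entries such as "172.21.0.1" or "172.21.0.0/16" parse as networks.
        network = ipaddress.ip_network(entry, strict=False)
    except ValueError:
        network = None
    if network is not None:
        return client in network
    # Anything else is treated as a wildcard pattern, e.g. "*.uva.nl" or "*".
    candidates = [str(client)] + ([hostname] if hostname else [])
    return any(fnmatch.fnmatch(candidate, entry) for candidate in candidates)


def is_whitelisted(ip, whitelist, hostname=None):
    """True if any whitelist entry matches the client."""
    return any(entry_matches(entry, ip, hostname) for entry in whitelist)


print(is_whitelisted("172.21.0.1", ["localhost", "172.21.0.1"]))
print(is_whitelisted("10.0.0.1", ["*.uva.nl"], hostname="proxy.uva.nl"))
```

With a check of this kind, an internal Docker address can be whitelisted directly, without needing a reverse DNS entry for it.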

anderscollstrup commented 2 years ago

sudo docker exec -it 4cat_frontend /bin/bash
apt-get install nano
nano /etc/hosts
# add this to the end of that file and save
172.21.0.1 whatever.com

That works. When Docker Hub gets updated with the new version I will try stijn-uva's fix. That should also solve the problem. The reason why I use an Apache proxy is to achieve the following:

Encryption with our certificate
Access control from our LDAP

If there is a better way than an Apache proxy to get that, then I would like to know it :-)

anderscollstrup commented 2 years ago

However, the container hack is not persistent across a stop/start of the container :-/

anderscollstrup commented 2 years ago

At the moment I have added the following to docker-compose_prod.yml:

extra_hosts:
  - "myserver.domain:172.21.0.1"

Then the hosts file in the container is persistent.