TheThingsNetwork / lorawan-stack

The Things Stack, an Open Source LoRaWAN Network Server
https://www.thethingsindustries.com/stack/
Apache License 2.0

Add a troubleshooting section in our Getting Started guide #2353

Closed neoaggelos closed 4 years ago

neoaggelos commented 4 years ago

Summary

Like #2352. Add a troubleshooting section in the Getting Started for common problems that may arise when following the Getting Started guide.

Why do we need this?

Make docs friendlier to new users.

What is already there? What do you see now?

No troubleshooting section.

What is missing? What do you want to see?

A Troubleshooting section at the end of the getting started guide, for users to be able to look up common problems, along with the reason and simple steps to fix them.

How do you propose to document this?

Our docs should generally be straightforward and easy to follow. However, having a troubleshooting section, with specific error messages and instructions to fix them could prove very helpful for new users.

Can you do this yourself and submit a Pull Request?

yes

fox27374 commented 4 years ago

Hi, definitely a thumbs up to this. I ran into a couple of problems and open questions while following the guide. At the moment I am stuck with this error. Maybe you can also point to this one in the documentation? [image]

johanstokking commented 4 years ago

@fox27374 can you open the browser developer tools and paste the window.PAGE_DATA value? You can enter that in the browser console while seeing this error.

Also, did you follow all steps in the Getting Started, i.e. for creating the Console OAuth client?

fox27374 commented 4 years ago

Hi, here is the window.PAGE_DATA as well as the command I use for creating the OAuth client. One important point to mention is that I use my own certificates (signed by the lab CA).

DATA

window.PAGE_DATA = {
  "error": {
    "code": 7,
    "message": "error:pkg/web/oauthclient:exchange (token exchange refused)",
    "details": [
      {
        "@type": "type.googleapis.com/ttn.lorawan.v3.ErrorDetails",
        "namespace": "pkg/web/oauthclient",
        "name": "exchange",
        "message_format": "token exchange refused",
        "code": 7
      }
    ]
  }
};

COMMAND

docker-compose run --rm stack is-db create-oauth-client \
  --id console \
  --name "Console" \
  --owner admin \
  --secret "SM2CE7335KDAIILCA76KETRHDQTTDAQTDJHBSL6RCOX3WFZFDZ4Q" \
  --redirect-uri "https://lora01.ntslab.loc/console/oauth/callback" \
  --redirect-uri "/console/oauth/callback"

Thanks a lot! Cheers, Daniel

johanstokking commented 4 years ago

@fox27374 thanks for the additional information.

What is the configured OAuth URL, i.e. the /token URL that you configured? You can redact sensitive content.

Can you confirm that lora01.ntslab.loc resolves in the Docker container, assuming that you run The Things Stack via Docker?

fox27374 commented 4 years ago

Hi,

Thank you for the reply and for helping me here. The content is not sensitive yet; it's a lab setup for now, as a test for a future production environment. I want to get rid of the Actility server :)

Yes, I run the TTN stack via Docker on a Linux server. lora01.ntslab.loc is configured in the hosts file, so name resolution should work.

The /token URL is: token-url: 'https://lora01.ntslab.loc/oauth/token'

If you need more information, you can directly have a look at the docker-compose.yml and the ttn-lw-stack.yml files. I also use a start script to do the initialisation (start.sh).

Thank you in advance, Daniel

neoaggelos commented 4 years ago

Hi @fox27374

Yes, I run the TTN stack via Docker on a Linux server. lora01.ntslab.loc is configured in the hosts file, so name resolution should work.

Do you mean the /etc/hosts file of your machine? This does not affect the Docker container where the stack is running, which could be the source of the issue you are seeing.

You could check that with the following command:

$ docker-compose exec stack nc -z lora01.ntslab.loc 443

You should see something along the lines of nc: bad address 'lora01.ntslab.loc'.

Can you try adding an extra_hosts section in your docker-compose.yaml, like so:

# docker-compose.yaml
services:
  # ...
  stack:
    # ...
    extra_hosts:
      - "lora01.ntslab.loc:YOUR_IP_ADDRESS"
    # ...

And restart with docker-compose up -d

The hostname resolution should then work. (However, if YOUR_IP_ADDRESS is a loopback address such as 127.0.0.1, you might still get errors, because inside the container that address refers to the container itself.)
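
A minimal sketch of that caveat, assuming extra_hosts entries use the "hostname:address" form shown above. The function names is_loopback and check_extra_host are hypothetical helpers, not part of Docker or The Things Stack:

```shell
#!/bin/sh

# Succeed when the given address is an IPv4 or IPv6 loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Check one extra_hosts entry of the form "hostname:address".
# Prints a warning and returns 1 when the address is a loopback address,
# because inside a container that address points at the container itself.
check_extra_host() {
    entry=$1
    host=${entry%%:*}   # everything before the first colon
    addr=${entry#*:}    # everything after the first colon
    if is_loopback "$addr"; then
        echo "warning: $host maps to loopback address $addr"
        return 1
    fi
    echo "ok: $host maps to $addr"
}
```

For example, check_extra_host "lora01.ntslab.loc:127.0.0.1" prints a warning and returns a non-zero status, while an entry with a LAN address is reported as ok.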

fox27374 commented 4 years ago

Hi @neoaggelos, thank you for the info. I removed the hosts entry and set the IP/hostname directly on the DNS server. Additionally, I added the "extra_hosts" entry in the docker-compose.yml. I am afraid the error still exists.

I started an ash shell in the container and checked the DNS resolution:

$ nslookup lora01.ntslab.loc
Name:      lora01.ntslab.loc
Address 1: 172.24.89.120 lora01.ntslab.loc

So this seems good. Following the error message token exchange refused, is there any further debugging we can enable for the oauth token exchange? Sorry to keep you busy with this .... Thanks

fox27374 commented 4 years ago

By the way, seems like someone else also has the same problem

neoaggelos commented 4 years ago

Hi @neoaggelos thank you for the info. I removed the hosts entry and set the IP/hostname directly on the DNS server. Additionally I added the "extra_hosts" entry in the docker-compose.yml.

Hmm, with proper DNS configuration, you should not have to set extra_hosts.

I am afraid the error still exists.

I started an ash shell in the container and checked the DNS resolution:

$ nslookup lora01.ntslab.loc
Name:      lora01.ntslab.loc
Address 1: 172.24.89.120 lora01.ntslab.loc

The 172.24.89.120 address is the one from the network created by Docker, which could also be a reason for the failure.

So this seems good. Following the error message token exchange refused, is there any further debugging we can enable for the oauth token exchange? Sorry to keep you busy with this .... Thanks

Try clearing your cookies and trying from a clean browser session as well. Also, make sure the stack can read the certificates: running cat /var/run/secrets/cert.pem and cat /var/run/secrets/key.pem from a shell within the container should be enough to check that.

Off-topic: have you tried setting up the stack on localhost? Did you succeed?

fox27374 commented 4 years ago

Hi,

Sorry, I did not mention that 172.24.89.120 is the IP address of the server itself in the lab. The Docker addresses are 172.9.0.X.

I do all the tests with a browser in private mode, so there are no cookies involved. The key and cert are readable by the "thethings" user:

/ $ whoami
thethings

/ $ cat /var/run/secrets/key.pem 
-----BEGIN PRIVATE KEY-----
MIIEvwIBADANBgkqhkiG9w0BAQEFAASCBKkwggSlAgEAAoIBAQC7IjZoBd2Mu4Ev
AYDrEh6mBWYw5cRDA02F10OQpbQbm6RigFbODM2owGRyCkkZfAUL2VV9xl5TzdMl
I6IecaA7/F7TpciuiJHmnfRVAbDlPI6EJYybdrU7tmfdeWc/ThuVVNolJFUeap+T
OIzv9MkGbBAF19ju4PJel6z3ef+NUhc5LKfjVQZeieQULX2b9+Hpd4ySdR2Nfzdt
......

I will try to change the setup to localhost and keep you posted.

johanstokking commented 4 years ago

Sorry, I did not mention that 172.24.89.120 is the IP address of the server itself in the lab. The Docker addresses are 172.9.0.X.

But can you curl https://lora01.ntslab.loc from inside the container? If not, what is the error reported?
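
A small sketch of how the exit status of that curl call could be read. The function name explain_curl_exit is a hypothetical helper; the codes 6, 7 and 60 are the ones documented in the EXIT CODES section of the curl manual, and any other code is deliberately left unexplained here:

```shell
#!/bin/sh

# Map a curl exit status to the most likely class of problem.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok: the transfer completed, including TLS verification" ;;
        6)  echo "dns: the host name could not be resolved" ;;
        7)  echo "connect: the host resolved but the connection failed" ;;
        60) echo "tls: the server certificate is not signed by a trusted CA" ;;
        *)  echo "other: see the EXIT CODES section of the curl manual (code $1)" ;;
    esac
}

# Possible usage from the Docker host (service name "stack" as in this thread):
#   docker-compose exec stack curl --silent --output /dev/null https://lora01.ntslab.loc
#   explain_curl_exit $?
```

A "dns" or "connect" result points at name resolution or networking inside the container, while "tls" points at the certificate chain.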

fox27374 commented 4 years ago

Hi,

Seems like we got it. The curl hint was a good one. It showed that ca.pem was not in the trusted certificate store:

/ # curl https://lora01.ntslab.loc
curl: (60) SSL certificate problem: self signed certificate in certificate chain

So I copied the ca.pem certificate to /usr/local/share/ca-certificates/

/ $ ls -la /usr/local/share/ca-certificates/ca.pem 
-rw-r--r--    1 thething thething      1310 Apr 14 11:36 /usr/local/share/ca-certificates/ca.pem

by adding it to the volumes section of the docker-compose.yml file:

volumes:
      - "./data/blob:/srv/ttn-lorawan/public/blob"
      - "./config/stack:/config:ro"
      - "./config/stack/cert/ca.pem:/usr/local/share/ca-certificates/ca.pem"

Now I am able to login to the console and all certificates are trusted. Awesome!

Is this the best / intended way of adding a trusted root certificate to the TTN container?

fox27374 commented 4 years ago

Sorry for being euphoric too early. It seems like the auth token was still in the DB, which is why everything worked. After the container starts, I needed to run this command in order to add the ca.pem certificate to the trusted store:

docker exec -it --user root ttn-server_stack_1 /usr/sbin/update-ca-certificates

Then the OAuth client is able to get a token and store it in the DB. This works for now, but it should not be the final solution, I guess. Any ideas? Thanks a lot!

johanstokking commented 4 years ago

@fox27374 great that you found the cause. That's always a good start to come up with a clean solution.

The stack respects TTN_LW_TLS_ROOT_CA (or tls.root-ca), which takes the file name of your CA certificate. See https://thethingsstack.io/v3.7.0/reference/configuration/the-things-stack/

fox27374 commented 4 years ago

@johanstokking: I added the following to the docker-compose.yml

services:
  stack:
    # ...
    secrets:
      - cert.pem
      - key.pem
      - ca.pem

secrets:
  cert.pem:
    file: config/stack/cert/cert.pem
  key.pem:
    file: config/stack/cert/key.pem
  ca.pem:
    file: config/stack/cert/ca.pem

This way, the certificate files are available in the container in /run/secrets and /var/run/secrets. I checked this directly in the container.

I added TTN_LW_TLS_ROOT_CA: "/var/run/secrets/ca.pem" to the docker-compose.yml file. The error is still there. I also tried to add this to the ttn-lw-stack.yml:

tls:
  source: "file"
  root-ca: "/var/run/secrets/ca.pem"
  certificate: "/var/run/secrets/cert.pem"
  key: "/var/run/secrets/key.pem"

Same thing here. I still get the error. Could it be that some applications, especially the OAuth client, use the OS's internal trusted root certificates? Because as soon as I add the ca.pem to the trusted root certificates, everything works. Thanks, Daniel
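
A rough sketch for ruling out a plain configuration mistake before suspecting the stack, meant to be run in a shell inside the container. The function name check_root_ca is a hypothetical helper. It only confirms that TTN_LW_TLS_ROOT_CA is set and points at a readable PEM file; it cannot tell whether the OAuth client actually uses that file:

```shell
#!/bin/sh

# Check that TTN_LW_TLS_ROOT_CA is set and points at a readable PEM certificate.
check_root_ca() {
    path=${TTN_LW_TLS_ROOT_CA:-}
    if [ -z "$path" ]; then
        echo "error: TTN_LW_TLS_ROOT_CA is not set"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "error: $path does not exist or is not readable"
        return 1
    fi
    if ! grep -q -e '-----BEGIN CERTIFICATE-----' "$path"; then
        echo "error: $path does not contain a PEM certificate"
        return 1
    fi
    echo "ok: $path looks like a PEM certificate"
}
```

If this reports ok and the token exchange still fails, the configuration itself is probably not the problem.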

johanstokking commented 4 years ago

cc @adriansmares

fox27374 commented 4 years ago

Hi, any news here? I tried debugging the access to the trusted root certificates with strace but did not succeed.

johanstokking commented 4 years ago

@fox27374 can you verify that this works?

$ curl --cacert /var/run/secrets/ca.pem https://lora01.ntslab.loc

@adriansmares looks like we need two things:

  1. Report the underlying error cause, potentially as a reason attribute, as it is likely a net error or something else from the stdlib
  2. Verify that we are respecting tls.root-ca in the OAuth client

Lucianovici commented 4 years ago

Hi guys,

I am getting the same 403 error, running TTN stack v3 with Docker within a Vagrant box (with VirtualBox). It is just a sandbox for me to create the Saltstack recipe.

I tried many approaches, considering I took care of the DNS.

For me it is not a problem of root-ca, I don't know what it is. Should we open another issue for this?

One question though: to your knowledge, is it possible to configure it without TLS, just for dev purposes within a Vagrant box? If so, would you please give me some pointers?

I can confirm that on my VPS it works fine with letsencrypt, which is of course what we'll have in production.

Thanks.

johanstokking commented 4 years ago

Adding c/shared cause it might not be a config thing

fox27374 commented 4 years ago

Hi, sorry for the late reply. I can verify that curl only works with the --cacert parameter, as the ca.pem certificate is not installed in the trusted root certificates:

/ $ whoami
thethings
/ $ curl https://lora01.ntslab.loc
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
/ $ curl --cacert /var/run/secrets/ca.pem https://lora01.ntslab.loc
/ $ 

johanstokking commented 4 years ago

Please check if the OAuth client respects the TLS configuration
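
The comparison from the curl output above (failing with the system trust store, succeeding with --cacert) can be scripted roughly as follows. The function name trust_path and its three result labels are hypothetical; only the curl options --silent, --output and --cacert are relied on:

```shell
#!/bin/sh

# Report how a HTTPS URL can be verified:
#   system       the system trust store already trusts the server certificate
#   cafile-only  verification only succeeds when the CA file is passed explicitly
#   neither      verification (or the connection itself) fails in both cases
# $1 is the URL, $2 is the path to the CA certificate.
trust_path() {
    url=$1
    cafile=$2
    if curl --silent --output /dev/null "$url"; then
        echo "system"
    elif curl --silent --output /dev/null --cacert "$cafile" "$url"; then
        echo "cafile-only"
    else
        echo "neither"
    fi
}

# Possible usage inside the container:
#   trust_path https://lora01.ntslab.loc /var/run/secrets/ca.pem
```

A "cafile-only" result matches the situation in this thread: any client in the container has to be told about the CA explicitly, so a client that ignores tls.root-ca would fail.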

wasn-eu commented 4 years ago

If you use nginx in front of the stack, nginx must handle all SSL/TLS.

These are the configs for nginx:

nginx.conf

stream {
    include stream_conf.d/*.conf;
}

stream_conf.d/mqtt.conf

log_format mqtt '$remote_addr [$time_local] $protocol $status $bytes_received '
                '$bytes_sent $upstream_addr';

upstream ttn1 {
    server stack-ip:1881;
    zone tcp_mem 64k;
}
upstream ttn2 {
    server stack-ip:1882;
    zone tcp_mem 64k;
}
upstream ttn3 {
    server stack-ip:1883;
    zone tcp_mem 64k;
}

server {
    listen 8881 ssl; # MQTT secure port
    preread_buffer_size 1k;

    ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_session_cache   shared:SSL:128m; # 128MB ~= 500k sessions
    ssl_session_tickets on;
    ssl_session_timeout 8h;

    proxy_pass ttn1;
    proxy_connect_timeout 1s;
}

server {
    listen 8882 ssl; # MQTT secure port
    preread_buffer_size 1k;

    ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_session_cache   shared:SSL:128m; # 128MB ~= 500k sessions
    ssl_session_tickets on;
    ssl_session_timeout 8h;

    proxy_pass ttn2;
    proxy_connect_timeout 1s;
}

server {
    listen 8883 ssl; # MQTT secure port
    preread_buffer_size 1k;

    ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_session_cache   shared:SSL:128m; # 128MB ~= 500k sessions
    ssl_session_tickets on;
    ssl_session_timeout 8h;

    proxy_pass ttn3;
    proxy_connect_timeout 1s;
}

server {
    listen 1881; # MQTT plain (non-TLS) port
    preread_buffer_size 1k;

    proxy_pass ttn1;
    proxy_connect_timeout 1s;
}

server {
    listen 1882; # MQTT plain (non-TLS) port
    preread_buffer_size 1k;

    proxy_pass ttn2;
    proxy_connect_timeout 1s;
}

server {
    listen 1883; # MQTT plain (non-TLS) port
    preread_buffer_size 1k;

    proxy_pass ttn3;
    proxy_connect_timeout 1s;
}

you need this in your site config for all ports (PORT=1884, 1885, 1887):

server {
        server_name FQDN;

        location / {
                proxy_pass      http://stack-ip:PORT;
                proxy_set_header Host $http_host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-Host $server_name;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "Upgrade";
                proxy_buffering off;
        }

       listen [::]:PORT ipv6only=on; # managed by Certbot
       listen PORT; # managed by Certbot
}

and this for ports (PORT/PORTSSL=1885/443, 1884/8884, 1887/8887):

server {

        server_name FQDN;

        location / {
                proxy_pass      http://stack-ip:PORT;
                proxy_set_header Host $http_host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-Host $server_name;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "Upgrade";
                proxy_buffering off;
        }

        listen [::]:PORTSSL ssl ipv6only=on; # managed by Certbot
        listen PORTSSL ssl; # managed by Certbot
        ssl_certificate /etc/letsencrypt/live/FQDN/fullchain.pem; # managed by Certbot
        ssl_certificate_key /etc/letsencrypt/live/FQDN/privkey.pem; # managed by Certbot
        include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
        ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

As you can see, I am using Let's Encrypt.

neoaggelos commented 4 years ago

Thanks a lot @wasn-eu!

This is also useful for #1760.

ramampiandra commented 4 years ago

Hi all,

I have a similar issue when installing TTN 3.7 on Ubuntu.

I followed fox27374's guide (https://github.com/fox27374/lora-stack) but still have the issue. My installation is on a VM running Ubuntu. I use a self-signed certificate for local development.

I am still stuck with this error: "token exchange refused". Thank you in advance,

fox27374 commented 4 years ago

Hi @ramampiandra,

as I wrote in the Slack chat, for the whole thing to work, you need the following:

Please make sure that the certificates are correct:

cert.pem

openssl x509 -in cert.pem -text -noout | grep -A 1 Identifier
            X509v3 Subject Key Identifier:
                26:78:63:90:E7:1C:09:B7:DA:B3:7D:81:F0:DE:47:6B:AE:16:58:79
            X509v3 Authority Key Identifier:
                keyid:86:32:F5:56:44:21:EC:E3:2A:D9:5F:6E:87:82:7A:67:C2:F1:77:E8

ca.pem

openssl x509 -in ca.pem -text -noout | grep -A 1 Identifier
            X509v3 Subject Key Identifier:
                86:32:F5:56:44:21:EC:E3:2A:D9:5F:6E:87:82:7A:67:C2:F1:77:E8

Make sure that the Authority Key Identifier in the cert.pem is the same as the Subject Key Identifier in the ca.pem.

After the stack is started and all Docker containers are up, run the following command (adapt "ttn-server_stack_1" to the name of your TTN container):

docker exec -it --user root ttn-server_stack_1 /usr/sbin/update-ca-certificates

This will install the ca.pem certificate within the container and add it to the trusted certificates.

After that, directly login to your container and test if the certificate works:

docker-compose exec stack "/bin/ash"
curl https://YOURSERVER.YOUR.DOMAIN

You should NOT see any result or error - this means your certificate is trusted.

I hope this helps, Cheers
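
The identifier comparison described above could be scripted along these lines. This is a sketch: keyid_after and issued_by are hypothetical helper names, and it assumes the openssl x509 -text output has the layout shown above, with or without the "keyid:" prefix:

```shell
#!/bin/sh

# Read `openssl x509 -text -noout` output on stdin and print the identifier
# on the line that follows the given X509v3 extension heading.
# $1 is the heading, e.g. "Subject Key Identifier".
keyid_after() {
    grep -A 1 "X509v3 $1:" | tail -n 1 |
        sed -e 's/^[[:space:]]*//' -e 's/^keyid://' -e 's/[[:space:]]*$//'
}

# Succeed when the Authority Key Identifier of certificate $1 equals
# the Subject Key Identifier of CA certificate $2.
issued_by() {
    cert_aki=$(openssl x509 -in "$1" -text -noout | keyid_after "Authority Key Identifier")
    ca_ski=$(openssl x509 -in "$2" -text -noout | keyid_after "Subject Key Identifier")
    [ -n "$cert_aki" ] && [ "$cert_aki" = "$ca_ski" ]
}

# Possible usage:
#   issued_by cert.pem ca.pem && echo "identifiers match"
```

Matching identifiers are only a quick sanity check. Running openssl verify -CAfile ca.pem cert.pem additionally checks the signature.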

kschiffer commented 4 years ago

So after looking into this in detail, I was able to reproduce and can confirm that there is indeed a problem with the TLS config (and specifically root certificates) not being respected by our OAuth flow, causing the token exchange to fail.

I'm currently working on a PR to fix this which should land later today.

fox27374 commented 4 years ago

@kschiffer awesome, thank you for having a look at this. Just keep me posted so that I can help you with testing.

dgraposo commented 4 years ago

Hi! Is there another workaround to fix this temporarily?

johanstokking commented 4 years ago

@dgraposo this should be fixed in 3.8.1

kschiffer commented 4 years ago

I will close this issue for now, since the focus moved to the "token exchange refused" issue, which has been addressed via #2511 and which can be followed further via #2521. I suspect this was the biggest reason to add a troubleshooting section.

This issue is not very useful anymore to discuss its initial purpose. I suggest reopening with proper scope if we deem a troubleshooting section to be necessary still.