TheThingsNetwork / lorawan-stack

The Things Stack, an Open Source LoRaWAN Network Server
https://www.thethingsindustries.com/stack/
Apache License 2.0

Support for installing servers on different locations #1818

Open zamashal opened 4 years ago

zamashal commented 4 years ago

Summary

It would be very helpful to be able to install TTN Network Server, Application Server, and Join Server separately. Currently in the guides, I only found instructions to install the ttn-lw-stack all-in-one, but no option to install each server separately if you want them to work together from different environments. ...

Why do we need this?

This feature would enable flexible deployment. You may choose to install all 3 servers (NS, AS, and JS) on the gateway, or you may choose to run JS on another server and keep only NS and AS on the gateway to enable centralized, remote management of multiple gateways, and so on. ...

What is already there? What do you see now?

Right now I only see a method to install ttn-lw-stack which includes all 3 servers (NS, AS, and JS). ...

What is missing? What do you want to see?

I would like to see instructions to separately install NS, AS, and JS instead of having them all in one installation/package. ...

How do you propose to document this?

Add it to the getting started guide. ...

Can you do this yourself and submit a Pull Request?

Not as of now, I am not sure if this is already partially implemented and probably someone knows how to do it more efficiently than me. ...

johanstokking commented 4 years ago

Thanks for the suggestion @zamashal

Indeed, the Getting Started guide currently covers the single-process approach, but as you might have seen, you can start the components individually. See:

$ ttn-lw-stack start --help
Start The Things Stack

Usage:
  ttn-lw-stack start [is|gs|ns|as|js|console|gcs|dtc|qrg|all]... [flags]

It's not very difficult to spawn services per component when these services are part of the same cluster and subnet.

I'm scoping this issue for now to instructions on how to;

zamashal commented 4 years ago

@johanstokking Thank you very much for your response and adding the issue to the backlog. In the meantime, I wonder if you can help me with this. I started the join server alone with the following command: ttn-lw-stack start js --cluster.network-server "ns_ip_address" --cluster.application-server "as_ip_address"

What I can't figure out is on which port the Join Server receives the Join_Req, and whether it will automatically send the Join_Ans to the specified Network Server.

Thanks again!

johanstokking commented 4 years ago

@zamashal in fact JS is the server and NS and AS are clients. So configure the JS cluster address in NS and AS. That makes them work in the same cluster, although they are individual components. Note that this uses cluster authentication, which is designed for components trusting each other in the same cluster. If you are deploying GS, NS and AS on the edge, and JS in the cloud, this is probably not the case.

In that case, you have to use interop, via LoRaWAN Backend Interfaces, which is also supported. This allows NS to contact your JS via TLS client authentication.

That comes in two parts: configuring the NS to use your JS and configuring your JS with interop configuration (see --help). This is not fully documented yet either unfortunately.

zamashal commented 4 years ago

Thanks again @johanstokking! I have been trying to get this setup working as you explained. One thing is confusing me. At the link you provided, there is an example of how to set up interoperability with the Semtech Join Server. However, I am trying to use TTN Stack's Join Server itself, and not something external like Semtech's or others. Do I still need to put in the configuration for configure.yml and example/js.yml? If so, what would that look like?

I have already configured my NS to work with an external JS (i.e., TTN Stack's JS), but when it uses port 8886 (interop/TLS) of the join server to send the Join_Req, the connection is refused, although the JS seems to be listening on that port.

Thanks!

johanstokking commented 4 years ago

@zamashal Here are roughly the things that need to be done;

Join Server interop configuration

See flags:

      --interop.listen-tls string                                      Address for the interop server to listen on (default ":8886")
      --interop.sender-client-ca.blob.bucket string                    Bucket to use
      --interop.sender-client-ca.blob.path string                      Path to use
      --interop.sender-client-ca.directory string                      OS filesystem directory, which contains sender client CA configuration
      --interop.sender-client-ca.source string                         Source of the sender client CA configuration (static, directory, url, blob)
      --interop.sender-client-ca.url string                            URL, which contains sender client CA configuration

Interop has its own dedicated listener that uses TLS client authentication. You can use the same public IP address as for gRPC and use a dedicated interop port (default 8886).

You need a private CA that issues client certificates. These are used on the edge by NS. You configure the trusted client CAs in the Join Server, per NetID. You can always use NetIDs 000000 and 000001 in your private network, or join the LoRa Alliance to get a NetID of your own.

Set interop.sender-client-ca.source to directory and place a config.yml in that directory containing, for example:

# Experimentation
000000: ca-000000.pem

# The Things Network Foundation
#000013: ca-000013.pem

Your private CA goes in ca-000000.pem. You could add TTN's CA for the TTN NetID as in the example, just to show you how this works.
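To sanity-check such a directory before starting the Join Server, the NetID-to-CA mapping can be read and each referenced file checked for existence. The following is a minimal Python sketch and not part of The Things Stack: the function names `parse_sender_client_ca` and `missing_ca_files` are hypothetical, and the parser only handles the flat `NetID: file` lines shown above rather than full YAML.

```python
from __future__ import annotations

from pathlib import Path

HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_sender_client_ca(text: str) -> dict[str, str]:
    """Parse flat 'NetID: file' lines; '#' starts a comment."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        net_id, sep, filename = line.partition(":")
        net_id = net_id.strip().strip("'\"")
        filename = filename.strip().strip("'\"")
        if not sep or not filename:
            raise ValueError(f"expected 'NetID: file', got {raw!r}")
        # A NetID is 24 bits, written here as 6 hex digits. It is kept
        # as a string so that leading zeros survive.
        if len(net_id) != 6 or not set(net_id) <= HEX_DIGITS:
            raise ValueError(f"invalid NetID {net_id!r}")
        mapping[net_id.upper()] = filename
    return mapping


def missing_ca_files(directory: str, mapping: dict[str, str]) -> list[str]:
    """Return the NetIDs whose CA file does not exist in directory."""
    base = Path(directory)
    return [n for n, f in sorted(mapping.items()) if not (base / f).is_file()]
```

Calling `missing_ca_files(directory, parse_sender_client_ca(text))` returns an empty list when every listed CA file is present.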

Network Server interop configuration

This is as documented, but indeed what you need is the local JS config. This would be as follows:

fqdn: 'thethings.example'
port: 8886
protocol: 'BI1.0'
tls:
  root-ca: 'path/to/clientca.pem'
  certificate: 'path/to/clientcert.pem'
  key: 'path/to/clientkey.pem'

Here, thethings.example is the FQDN of your Join Server and 8886 the port of that listen-tls that you configured in the JS interop.

Also, root-ca is (unlike what the example says) the root CA of the server certificate. This could be the same CA. You can also leave it out if you use a commercial (or Let's Encrypt) server cert which is already trusted by NS.
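The TLS settings above can be tried out independently of the Network Server with a short client-side handshake test. This is a rough Python sketch under stated assumptions: the helper names `build_interop_client_context` and `try_handshake` are made up for illustration, the paths are the placeholders from the example config, and the sketch only opens a TLS connection without sending any Backend Interfaces message.

```python
import socket
import ssl
from typing import Optional


def build_interop_client_context(
    root_ca: Optional[str] = None,
    certificate: Optional[str] = None,
    key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client context that verifies the server and offers a client cert."""
    # With root_ca=None the system trust store is used, which corresponds
    # to leaving root-ca out when the server cert is publicly trusted.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=root_ca)
    if certificate is not None:
        # The client certificate and key are presented to the server.
        ctx.load_cert_chain(certfile=certificate, keyfile=key)
    return ctx


def try_handshake(host: str, port: int, ctx: ssl.SSLContext,
                  timeout: float = 5.0) -> Optional[str]:
    """Connect, perform the TLS handshake and return the negotiated version."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, `try_handshake("thethings.example", 8886, build_interop_client_context("path/to/clientca.pem", "path/to/clientcert.pem", "path/to/clientkey.pem"))` raises `ssl.SSLError` or `OSError` when the connection or server verification fails. With TLS 1.3, a rejected client certificate may only surface on the first read or write, so a successful return is a necessary rather than a sufficient check.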

Enable debug logs on either side (log.level=debug) and you should see things working or traces to why things don't work. Good luck!

johanstokking commented 4 years ago

Also, if you make this work, feel free to file a pull request for documenting this. It would probably need a guide, but the reference page needs some love as well.

zamashal commented 4 years ago

@johanstokking, I will be working on this and hopefully as soon as I get it figured out I will be sure to make a pull request to update the guide. Can't thank you enough for all your help!

zamashal commented 4 years ago

Hey @johanstokking - I hope all has been going well with you. I would like to update you on my progress. Unfortunately, I have been tackling a lot of errors to get this working, and I will share the latest ones here. After setting up interop and configuring my network server to send join requests to the join server at the default port 8886, I keep getting the following error in my network server log: error="join-request to join-server error: http post error: Post http://js-server_ip:8886: dial tcp js-server_ip:8886: connect: connection refused"

If I configure my network server to send the join requests to port 1884 of the gRPC server, I get instead the following error on the network server log: level=error msg="uplink: processing uplink frame error" ctx_id=f046310d-e528-4dd2-9dcb-6d5c8232a799 error="join-request to join-server error: http post error: Post http://js-server_ip:1884: net/http: HTTP/1.x transport connection broken: malformed HTTP response \"\\x00\\x00\\f\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x05\\x00\\x00@\\x00\\x00\\x03\\x00\\x00\\xff\\xff\"" combined with the following error from the ttn stack log: stack_1 | WARN grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"POST / HTTP/1.1\\r\\nHost: 1\"" namespace=grpc
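The bytes quoted in the "malformed HTTP response" message are consistent with an HTTP/2 SETTINGS frame, which is what an HTTP/2 (and therefore gRPC) server sends first. That would suggest port 1884 answered in HTTP/2 while the Network Server sent an HTTP/1.1 POST, matching the "bogus greeting" warning on the stack side. The sketch below decodes those bytes using the standard HTTP/2 frame layout (3-byte length, type, flags, 4-byte stream ID); the function name `decode_settings_frame` is hypothetical and the byte string is transcribed from the log line above.

```python
import struct

# Setting identifiers defined by the HTTP/2 specification.
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


def decode_settings_frame(data: bytes) -> dict:
    """Decode a single HTTP/2 SETTINGS frame (9-byte header plus payload)."""
    if len(data) < 9:
        raise ValueError("too short for an HTTP/2 frame header")
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags = data[3], data[4]
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF
    if frame_type != 0x4:
        raise ValueError("not a SETTINGS frame")
    payload = data[9:9 + length]
    settings = {}
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    for offset in range(0, len(payload) - len(payload) % 6, 6):
        ident, value = struct.unpack_from(">HI", payload, offset)
        settings[SETTING_NAMES.get(ident, hex(ident))] = value
    return {"length": length, "flags": flags,
            "stream_id": stream_id, "settings": settings}


# Bytes from the NS log, with \f written as \x0c and @ as \x40.
LOGGED = (b"\x00\x00\x0c\x04\x00\x00\x00\x00\x00"
          b"\x00\x05\x00\x00\x40\x00\x00\x03\x00\x00\xff\xff")

if __name__ == "__main__":
    print(decode_settings_frame(LOGGED))
```

Decoding yields a 12-byte SETTINGS payload on stream 0 with MAX_FRAME_SIZE 16384 and MAX_CONCURRENT_STREAMS 65535, so the response is valid HTTP/2 rather than corrupted HTTP/1.x.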

I hope you or anyone else can help me understand how to resolve these errors and know what may cause such errors.

Thanks again for your continued support!

johanstokking commented 4 years ago

The Join Server is only available over https.

It also looks like NS cannot resolve js-server_ip via DNS.

zamashal commented 4 years ago

Thank you @johanstokking! So yes it turns out I didn't map port 8886 to my host in the docker-compose.yml. Now the issue I have been facing is a TLS handshake error:

tls: failed to verify client's certificate: x509: certificate signed by unknown authority

For one thing, I used the flag --tls.insecure-skip-verify, but it still insisted on verifying the certificate and gave me the same error. I think the issue is that I need to trust the certificate authority in my docker container. I opened a shell into the stack, and it gave me a Permission denied error whenever I tried to copy the certificates into /usr/local/share/ca-certificates/ so that the machine would trust them.

I think the --tls.insecure-skip-verify flag should have allowed it, but maybe your implementation is different. My issue now is that the docker container doesn't give me an option to trust my self-signed certificate. Is there something I am missing there?

johanstokking commented 4 years ago

Is the client certificate signed by one of the CAs for the SenderID as defined in the client CA configuration?

That is what the Join Server uses to verify the client certificate; not the system trust or anything.

zamashal commented 4 years ago

I attempted to follow that, but it is not completely aligned with the instructions on the website. What I have is the following in my config.yml:

000000: ca-000000.pem
join-servers:
  - file: './example/js.yml'
    join-euis:
    - 'abcd000000000000/16'

and then I put this into my js.yml:

fqdn: 'thethings.example'
port: 8886
protocol: 'BI1.0'
tls:
  root-ca: 'path/to/clientca.pem'
  certificate: 'path/to/clientcert.pem'
  key: 'path/to/clientkey.pem'

johanstokking commented 4 years ago

The sender client CA configuration is not documented yet; we'll do that as part of closing or replacing this issue. See [here](https://github.com/TheThingsNetwork/lorawan-stack/issues/1818#issuecomment-575534345). It is a special file and it has its own setting to reference the file:

      --interop.sender-client-ca.blob.bucket string                    Bucket to use
      --interop.sender-client-ca.blob.path string                      Path to use
      --interop.sender-client-ca.directory string                      OS filesystem directory, which contains sender client CA configuration
      --interop.sender-client-ca.source string                         Source of the sender client CA configuration (static, directory, url, blob)
      --interop.sender-client-ca.url string                            URL, which contains sender client CA configuration

So source needs to be set to directory and you put the config in the aforementioned format in config.yml in that folder. That is a different directory than the interop config.

zamashal commented 4 years ago

Thank you @johanstokking! I didn't realize that should be in a different directory. I finally got past the certificates issue and am now dealing with this error from the ttn-stack debug log (I intentionally covered up the keys, but they were correct):

stack_1      |   INFO Join not accepted                        dev_eui=0000000000000000 error=error:pkg/redis:not_found (entity not found) join_eui=0000000000000000 method=POST namespace=joinserver/interop remote_addr=gateway_ip:49426 request_id=01E1D3PZ63CQ7VNCE5JE8SDC3J url=/
stack_1      |   INFO Request handled                          duration=2.948762ms error=error:pkg/interop:join_req (join-request failed) error_cause=error:pkg/redis:not_found (entity not found) method=POST namespace=interop remote_addr=gateway_ip:49426 request_id=01E1D3PZ63CQ7VNCE5JE8SDC3J status=400 url=/

Note, gateway_ip is also where the NS and AS reside.

This is also what I am seeing on the NS debug log:

time="2020-02-18T16:36:52-05:00" level=error msg="uplink: processing uplink frame error" ctx_id=ef20804f-13a8-4f7f-b90e-ce279c1e11ea error="join-request to join-server error: response error, code: JoinReqFailed, description: error:pkg/redis:not_found (entity not found)"

From what I can read, the error seems to be complaining about a misconfiguration of my redis component of the docker-compose. I revisited the configuration tutorial to make sure everything is matching. What I had on my configuration was this:

volumes:
      - ${DEV_DATA_DIR:-.env/data}/redis:/data

So I went ahead and changed it to this:

volumes:
    - './data/redis:/data'

Then, I started seeing the following error which doesn't even let me run the stack:

stack_1      | error:cmd/internal/shared:initialize_identity_server (could not initialize Identity Server)
stack_1      | --- error:pkg/identityserver:db_needs_migration (the database needs to be migrated)
stack_1      | --- pq: database "ttn_lorawan" does not exist

I wasn't sure if this change was necessary at all. Under ./data/redis/ I only see one file, `appendonly.aof`, so it seems that I am missing something.

johanstokking commented 4 years ago

I wasn't sure if this change was necessary at all. Under ./data/redis/ I only see one file, `appendonly.aof`, so it seems that I am missing something.

No, that's fine for Redis, in fact.

It looks like your device is not registered in the Join Server?

zamashal commented 4 years ago

Oh, that is probably why. All I did was use the flag --js.join-eui-prefix, but it seems that's not enough. I am stuck on another issue that I have been trying to ignore: issue 1942

Can I register the device by manually adding rows to the redis database? If so, what is the format? That might help me continue to ignore the other issue in the meantime..

zamashal commented 4 years ago

I was able to access the dashboard on the other issue and register the device on the dashboard. I am now seeing an error that is saying sender unknown which I believe is complaining about the gateway not being recognized. I tried to add the gateway from the console but it still says Disconnected. I tried to enter the address of the gateway_ip and the server_ip but both didn't seem to make any difference yet.

johanstokking commented 4 years ago

Sender unknown likely means that the NetID of the end device is not set to the NetID of your Network Server. Both should be set to 000000.

You can set the NetID of the end device via CLI with ttn-lw-cli end-device set <app-id> <dev-id> --net-id=000000

zamashal commented 4 years ago

My ttn-lw-cli is acting weird. I can only run the login command with the default options; if I specify anything, such as a configuration file or certificate authority, I just get permission denied. I tried several ways around the permissions using chmod and chown, but I continue to get permission denied. If I run with the default configuration by only typing ttn-lw-cli login, I get:

Post https://localhost:8885/oauth/token: x509: certificate signed by unknown authority

This happens even though docker-compose up is running just fine without certificate issues or any other errors. Any idea what I might be missing that is causing the permission denied errors? Thanks!

johanstokking commented 4 years ago

Can you post your server and CLI configuration and what you try to do exactly?

zamashal commented 4 years ago

I was just trying to log in first with the command sudo ttn-lw-cli login, here is my config:

# sudo ttn-lw-cli config
                         --allow-unknown-hosts="false"
                  --application-server-enabled="true"
             --application-server-grpc-address="localhost:8884"
                                          --ca=""
                                      --config="/etc/ttn-cli/.ttn-lw-cli.yml,/root/snap/ttn-lw-stack/149/.ttn-lw-cli.yml,/root/snap/ttn-lw-stack/149/.config/.ttn-lw-cli.yml"
                              --credentials-id=""
         --device-claiming-server-grpc-address="localhost:8884"
      --device-template-converter-grpc-address="localhost:8884"
                      --gateway-server-enabled="true"
                 --gateway-server-grpc-address="localhost:8884"
                --identity-server-grpc-address="localhost:8884"
                                --input-format="json"
                                    --insecure="false"
                         --join-server-enabled="true"
                    --join-server-grpc-address="localhost:8884"
                                   --log.level="info"
                      --network-server-enabled="true"
                 --network-server-grpc-address="localhost:8884"
                        --oauth-server-address="https://localhost:8885/oauth"
                               --output-format="json"
              --qr-code-generator-grpc-address="localhost:8884"

So running the default gives me the certificate signed by unknown authority error which I shared earlier. But due to the certificate issues, I attempted to add the following option: sudo ttn-lw-cli login --ca "path/to/ca.pem" but that gave me a permission denied error.

johanstokking commented 4 years ago

I attempted to add the following option: sudo ttn-lw-cli login --ca "path/to/ca.pem"

This is good. You can also put this in a config file or environment.

but that gave me a permission denied error.

On the CLI or server? Do you have logs?

zamashal commented 4 years ago

A server error, I think? This is all I can see:

root@myserver:/etc/ttn-cli# sudo ttn-lw-cli login --ca="/etc/ttn-cli/ca.pem" --log.level="debug"
open /etc/ttn-cli/ca.pem: permission denied

I also tried giving it chmod 777 permissions and I am still getting the same error.

zamashal commented 4 years ago

I was able to finally get around this issue by adding the configuration file to /root/snap/ttn-lw-stack/149/.ttn-lw-cli.yml!

zamashal commented 4 years ago

I am now getting a certificate signed by unknown authority error. How does the ttn-lw-cli tool trust a certificate? Here is the full log:

root@localhost:/etc/ttn-stack# sudo ttn-lw-cli login --callback=false --config="/root/snap/ttn-lw-stack/149/.ttn-lw-cli.yml" --log.level="debug" --insecure="true" --allow-unknown-hosts="true" --ca="/root/snap/ttn-lw-stack/149/ca.pem"
  WARN Access token expired at 5:17PM
 ERROR Please login with the login command
 DEBUG ccResolverWrapper: sending update to cc: {[{localhost:1884  <nil> 0 <nil>}] <nil> <nil>}
 DEBUG pickfirstBalancer: HandleSubConnStateChange: 0xc00087caa0, {CONNECTING <nil>}
 DEBUG pickfirstBalancer: HandleSubConnStateChange: 0xc00087caa0, {READY <nil>}
 DEBUG Finished unary call                      duration=2.376756ms grpc_method=AuthInfo grpc_service=ttn.lorawan.v3.EntityAccess namespace=grpc
  INFO Opening your browser on https://localhost/oauth/authorize?client_id=cli&redirect_uri=code&response_type=code
  WARN Could not open your browser, you'll have to go there yourself error=fork/exec /usr/bin/xdg-open: permission denied
  INFO After logging in and authorizing the CLI, we'll get an access token for future commands.
  INFO Please paste the authorization code and press enter
> MF2XI.JX2QFUHNVVWMEYTTRQ3S4DTGPI5VXBYJWVJQ2ZI.OG5C4HQXGMRQ4LVW7ES4IZRNH2L5OJOING2SWOW74LFLQAYDH64Q
 ERROR Could not exchange OAuth access token    error=Post https://localhost/oauth/token: x509: certificate signed by unknown authority
Post https://localhost/oauth/token: x509: certificate signed by unknown authority

I am using the same ca.pem that is trusted by the ttn-stack that I run with docker-compose.

zamashal commented 4 years ago

I got past the login/certificate issue again by using http URI and http ports in the ttn-lw-cli config. When I run sudo ttn-lw-cli end-device set "mysensor1app" "mysensor1dev" --net-id=000000 --log.level="debug", I see the following:

root@localhost:/etc/ttn-stack$ sudo ttn-lw-cli end-device set "mysensor1app" "mysensor1dev" --net-id=000000 --log.level="debug"
 DEBUG Using access token (valid until 6:42PM)
 DEBUG ccResolverWrapper: sending update to cc: {[{localhost:1884  <nil> 0 <nil>}] <nil> <nil>}
 DEBUG pickfirstBalancer: HandleSubConnStateChange: 0xc000414730, {CONNECTING <nil>}
  WARN grpc: addrConn.createTransport failed to connect to {localhost:1884  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...
 DEBUG pickfirstBalancer: HandleSubConnStateChange: 0xc000414730, {TRANSIENT_FAILURE connection error: desc = "transport: authentication handshake failed: context deadline exceeded"}
 DEBUG pickfirstBalancer: HandleSubConnStateChange: 0xc000414730, {CONNECTING <nil>}
  WARN grpc: addrConn.createTransport failed to connect to {localhost:1884  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...

Here is my ttn-lw-cli config:

                         --allow-unknown-hosts="true"
                  --application-server-enabled="true"
             --application-server-grpc-address="localhost:1884"
                                          --ca="/root/snap/ttn-lw-stack/149/ca.pem"
                                      --config="/etc/ttn-stack/.ttn-lw-cli.yml,/root/snap/ttn-lw-stack/149/.ttn-lw-cli.yml,/root/snap/ttn-lw-stack/149/.config/.ttn-lw-cli.yml"
                              --credentials-id=""
         --device-claiming-server-grpc-address="localhost:1884"
      --device-template-converter-grpc-address="localhost:1884"
                      --gateway-server-enabled="true"
                 --gateway-server-grpc-address="localhost:1884"
                --identity-server-grpc-address="localhost:1884"
                                --input-format="json"
                                    --insecure="true"
                         --join-server-enabled="true"
                    --join-server-grpc-address="localhost:1884"
                                   --log.level="info"
                      --network-server-enabled="true"
                 --network-server-grpc-address="localhost:1884"
                        --oauth-server-address="http://localhost/oauth"
                               --output-format="json"
              --qr-code-generator-grpc-address="localhost:1884"

I think that this could be related to my http setup, although I had INFO Got OAuth access token message after login which seems to indicate successful authentication.

I also started to see the following error from my docker-compose logs:

stack_1      |  DEBUG Rejected authentication                  client_id=mqtt_5bc528ca.ae4ea8 error=error:pkg/ttnpb:identifiers (invalid identifiers) error_cause=error:pkg/errors:validation (invalid `application_id`: value does not match regex pattern "^[a-z0-9](?:[-]?[a-z0-9]){2,}$") field=application_id name=ApplicationIdentifiersValidationError namespace=applicationserver/io/mqtt reason=value does not match regex pattern "^[a-z0-9](?:[-]?[a-z0-9]){2,}$" username=
stack_1      |   WARN Failed to setup connection               error=error:pkg/ttnpb:identifiers (invalid identifiers) error_cause=error:pkg/errors:validation (invalid `application_id`: value does not match regex pattern "^[a-z0-9](?:[-]?[a-z0-9]){2,}$") field=application_id name=ApplicationIdentifiersValidationError namespace=applicationserver/io/mqtt reason=value does not match regex pattern "^[a-z0-9](?:[-]?[a-z0-9]){2,}$" remote_addr=172.18.0.1:57472

I couldn't figure out what it's referring to but I thought it might be complaining about the same device and application that I have added and still don't have the sensor joined.

johanstokking commented 4 years ago

I am now getting a certificate signed by unknown authority error. How does the ttn-lw-cli tool trust a certificate?

It uses the CA file you pass with ca. That file should either point to the server certificate (if it's self-signed) or the CA that signed the server certificate.

Here is my ttn-lw-cli config:

This config looks good if you don't want to use TLS. But is the server listening on these addresses in its non-TLS config?

I also started to see the following error from my docker-compose logs:

This is an MQTT client connecting with a username that is not a valid application ID.
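The pattern in the log message can be used to check a candidate MQTT username before connecting. A small Python sketch follows, assuming the regular expression quoted in the log is the complete rule; the helper name `is_valid_application_id` is hypothetical.

```python
import re

# Pattern copied verbatim from the "Rejected authentication" log line.
APPLICATION_ID_PATTERN = re.compile(r"^[a-z0-9](?:[-]?[a-z0-9]){2,}$")


def is_valid_application_id(candidate: str) -> bool:
    """Return True if candidate matches the application ID pattern."""
    # fullmatch also rejects a trailing newline, which '$' alone would allow.
    return APPLICATION_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for name in ("mysensor1app", "", "mqtt_5bc528ca.ae4ea8", "My-App", "a--b"):
        print(repr(name), is_valid_application_id(name))
```

Under this pattern an ID needs at least three lowercase alphanumeric characters, may contain single dashes between them, and cannot start or end with a dash. The empty `username=` in the log would therefore be rejected, as would a string shaped like the logged `client_id`.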

zamashal commented 4 years ago

Thanks for the hints! Pointing to cert.pem instead of ca.pem solved the certificate signed by unknown authority issue. However, I am still getting the other connection error. I am definitely listening on port 1884:

user@localhost:/etc/ttn-stack$ sudo netstat -tulpn | grep LISTEN
tcp6       0      0 :::1884                 :::*                    LISTEN      18793/docker-proxy

I can also see data packets coming through when I telnet to the port 1884 and run the ttn-lw-cli tool. So there's definitely an exchange of packets happening, but the debug log still gives me the following error: "transport: authentication handshake failed: context deadline exceeded". Reconnecting...

zamashal commented 4 years ago

I finally solved this issue by adding the --insecure flag to the end-device set command!! It seems that I am having issues with TLS, but I am not worried about that for now anyway. Thanks again!

zamashal commented 4 years ago

I am thrilled to inform you that after setting --root-keys.app-key.key in addition to --net-id, the join process for end-device completed successfully and I started getting the data from the end device on the independent Application Server! Thank you again for your great help through all the issues I have faced!

johanstokking commented 4 years ago

That's great! It would be awesome if you can document your scenario here, so we can incorporate it.

Thank you also for the motivation and being the first pancake.