ddvk / rmfakecloud

host your own cloud for the remarkable
GNU Affero General Public License v3.0

Improvement suggestion: supply `docker-compose.yml` to run a nameserver: frees a lot of memory for reMarkable tablet; easier setup; compatibility with Android and (probably) iPhone #127

Open Myridium opened 2 years ago

Myridium commented 2 years ago

Current problem with RAM usage on reMarkable

Currently, in order to redirect traffic from our PC or the reMarkable tablet to our rmfakecloud, we are advised to edit /etc/hosts to point to localhost, and then run a local proxy on each PC and reMarkable.

Editing the /etc/hosts file directly is something that irks me; it feels like poor practice. Then we must also run the reverse proxy locally. I don't like doing this as it's quite an opaque setup that is provided right now with this fork of secure, and it uses binaries archived within installer.sh. This is quite a poxy setup.

Perhaps more importantly, the current setup consumes a huge amount of memory on the reMarkable; far more than anything else on the device!

Mem: 956404K used, 71260K free, 7480K shrd, 16516K buff, 703704K cached
CPU:   0% usr   0% sys   0% nic  99% idle   0% io   0% irq   0% sirq
Load average: 0.03 0.03 0.06 2/114 1048
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 1048  1016 root     R     2832   0%   0% top
  965     2 root     IW       0   0%   0% [kworker/u4:6-br]
  920     2 root     IW       0   0%   0% [kworker/1:0-eve]
  192     1 root     S     780m  78%   0% /home/root/rmfakecloud/rmfake-proxy -cert /home/root/rmfakecloud/proxy.bundle.crt -key /home/root/rmfakecloud/proxy.key https://my.remarkable.domain
  198     1 root     S     393m  39%   0% /usr/bin/xochitl --system
  194     1 root     S    68548   7%   0% /usr/bin/sync --service
  196     1 root     S    31248   3%   0% /usr/sbin/update_engine -foreground
  135     1 root     S    29096   3%   0% /lib/systemd/systemd-journald

(I don't understand why the %VSZ column adds up to more than 100%.) That's really bad, and it takes away from what other developers in the reMarkable ecosystem can do, because the RAM is mostly consumed by rmfake-proxy.

A better solution for 90% of users (using dhcpcd or having control over their nameserver selection)

There is a solution to this, which is more elegant and simplifies setup on the client devices (I believe as much setup as possible should happen on the server). I am currently running this setup myself on my PC and it is working. The same can be done for the reMarkable. (I have not done it yet because I'm running this through wireguard which is not available for the reMarkable)

Here's the idea: run traefik and bind9 alongside rmfakecloud to provide SSL encryption (https) and domain name resolution on the server side, taking that burden off of the clients.

Some users will already be familiar with the popular software pihole (I am not) in which case I think they can configure custom domain name resolution their own way.

For the other 10%

Unfortunately, this setup does require the nameserver running alongside rmfakecloud to take priority over any other nameservers. Some users may not be willing to use their rmfakecloud server as the nameserver for ALL name resolutions. In that case, the user could run bind9 locally on their client devices and still benefit from the traefik proxy running on the server. This would free up all that memory.

Additional client setup + mobile device support

Setting up a custom domain name server will enable users to use any of their devices with rmfakecloud (provided the devices are not so locked down that their settings can't be edited, e.g. maybe iPhone).

Users would just do 2 things when setting up a new device:

  1. Install the Certificate Authority.
  2. Set their preferred nameserver to the rmfakecloud IP.

That's it! As long as a user has enough control over their device to do these things, then they can point all relevant traffic to their own rmfakecloud server. The only issue would be if reMarkable started using hard-coded IPs in their software.

Other benefits

As long as no new Certificate Authorities have to be generated (new SSL certs are fine), then any changes to the list of reMarkable domains can be promulgated by simply updating rmfakecloud's nameserver. After the TTL of DNS records expires (maybe 5 minutes?) client devices will detect the changes.

Potential drawbacks

Splitting rmfakecloud repository

The rmfakecloud repository currently takes on a number of tasks. Ideally, it would only deal with the Dockerfile and maybe some helper scripts. Things like proxies and domain name resolution would ideally be split out into their own projects, or sub-repositories. This is something to consider.

nemunaire commented 2 years ago

What you're describing is a small variation of the 3rd setup variant.

As a sysadmin myself, I wasn't convinced by installing a proxy on my tablet either, so I run the 2nd and 3rd setup variants. As what you're suggesting doesn't work on some public/company networks that filter name resolution, the 3rd is not perfect.

Also, it is not possible to fit everyone's setup; for me, the best we can do is improve the documentation to make clear how to use it with nginx, traefik, HAProxy, Caddy, ... or whatever will be fashionable in the future.

/etc/hosts is far simpler and covers 99% of cases, whereas bind9 comes with a lot of questions (what to do if you already run a name server, what to do on networks filtering name resolution, how to use my own NS with knot, unbound, powerdns, ..., oops it didn't work because I only shared the TCP port in docker, not UDP, ...): that's what a sysadmin needs to address, not an rmfakecloud user, nor an rmfakecloud developer. That's why the documentation asks people to use the automagic script/toltec, as it should be the way for most people. Then it describes alternate ways for sysadmins.

Myridium commented 2 years ago

3rd setup variant still requires running a reverse-proxy on the reMarkable.

Editing /etc/hosts to point to the remote server directly (instead of 127.0.0.1) is not a satisfactory solution as it drops information about the specific domain name. Then there is no way to route traffic on the server to the rmfakecloud program as the usual method of redirecting traffic based on hostname cannot work.

nemunaire commented 2 years ago

> 3rd setup variant still requires running a reverse-proxy on the reMarkable.

No, no proxy/reverse-proxy on reMarkable. The only reverse-proxy is the one to redirect ingress traffic on the server hosting rmfakecloud. On the reMarkable, you need to add the CA, then either edit /etc/hosts or overwrite some DNS records, ...
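As a rough illustration of the /etc/hosts option, the sketch below renders hosts-file lines that point a list of cloud hostnames at the server. The address 192.0.2.10 and the hostnames are placeholders (assumptions, not the real list the tablet uses); the actual names should come from the rmfakecloud documentation.

```python
from typing import Iterable, List


def render_hosts_entries(server_ip: str, hostnames: Iterable[str]) -> List[str]:
    """Return one hosts-file line per hostname, all pointing at server_ip."""
    lines = []
    for name in hostnames:
        name = name.strip().lower()
        if name:  # skip blank entries
            lines.append(f"{server_ip} {name}")
    return lines


if __name__ == "__main__":
    # Placeholder values: 192.0.2.10 is a documentation-only address and the
    # hostnames are hypothetical stand-ins for the cloud domains.
    for line in render_hosts_entries(
        "192.0.2.10", ["cloud.example.test", "sync.example.test"]
    ):
        print(line)
```

The printed lines would be appended to /etc/hosts on the tablet; nothing here writes to the file itself.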

> Editing /etc/hosts to point to the remote server directly (instead of 127.0.0.1) is not a satisfactory solution as it drops information about the specific domain name. Then there is no way to route traffic on the server to the rmfakecloud program as the usual method of redirecting traffic based on hostname cannot work.

No, /etc/hosts doesn't work the way you describe. When a program like curl asks for a domain listed in /etc/hosts, it calls getaddrinfo(3), which returns the IP from /etc/hosts, but the program has no indication of how the domain was resolved (through NS, through /etc/hosts, or whatever nsswitch is configured for). So curl adds the Host: header with the domain name, the same way. So it is possible to direct ingress traffic as usual. Try it, it's what I use :)
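A small way to check this without touching /etc/hosts is sketched below in Python rather than curl. It starts a throwaway local HTTP server that echoes the Host header, then overrides `socket.getaddrinfo` so that a made-up name resolves to 127.0.0.1, which stands in for a hosts-file entry. The name `cloud.example.test` and the helper names are assumptions for illustration only.

```python
import http.server
import socket
import threading
import urllib.request

FAKE_HOST = "cloud.example.test"  # hypothetical name, not a real cloud domain


class EchoHost(http.server.BaseHTTPRequestHandler):
    """Reply with the Host header the client sent."""

    def do_GET(self):
        body = (self.headers.get("Host") or "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_host_header() -> str:
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHost)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    real_getaddrinfo = socket.getaddrinfo

    def fake_getaddrinfo(host, *args, **kwargs):
        # Stand-in for an /etc/hosts entry: map the fake name to localhost.
        if host == FAKE_HOST:
            host = "127.0.0.1"
        return real_getaddrinfo(host, *args, **kwargs)

    socket.getaddrinfo = fake_getaddrinfo
    try:
        # Empty ProxyHandler so environment proxy settings are ignored.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://{FAKE_HOST}:{port}/", timeout=5) as resp:
            return resp.read().decode()
    finally:
        socket.getaddrinfo = real_getaddrinfo
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_host_header())
```

The server sees the original name (plus the port) in the Host header even though the connection went to 127.0.0.1, which is what lets a reverse proxy on the server route by hostname.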

Myridium commented 2 years ago

Oh, interesting, I don't know why I thought /etc/hosts worked otherwise. I guess it acts just like a domain name resolver. There's still the issue of having to redirect to a static IP though... better to have the reMarkable route name resolution requests to the server. Then if new domains need to be added, or the IP of the rmfakecloud changes, then no issues. But, I understand the problems with this as you have raised.

Do any scripts automatically modify /etc/hosts or is that file meant to be user-editable?

I'm actually confused now why there's a proxy running on the reMarkable at all. Why is it needed if we can simply edit /etc/hosts?

nemunaire commented 2 years ago

> Do any scripts automatically modify /etc/hosts or is that file meant to be user-editable?

This file is like /etc/passwd: its contents are never replaced automatically, unlike /etc/resolv.conf. It's safe to edit.

> I'm actually confused now why there's a proxy running on the reMarkable at all. Why is it needed if we can simply edit /etc/hosts?

I don't understand either.

Eeems commented 2 years ago

> > I'm actually confused now why there's a proxy running on the reMarkable at all. Why is it needed if we can simply edit /etc/hosts?
>
> I don't understand either.

To have a self-signed cert that is trusted by the device, and to simplify hosting requirements for rmfakecloud itself, since you'd have to set up your host on the other end to allow all the various hostnames required. Not to mention you might be hosting behind something like cloudflare, so you don't want to use a direct IP address.

Myridium commented 2 years ago

> > Do any scripts automatically modify /etc/hosts or is that file meant to be user-editable?
>
> This file is like /etc/passwd: its contents are never replaced automatically, unlike /etc/resolv.conf. It's safe to edit.
>
> > I'm actually confused now why there's a proxy running on the reMarkable at all. Why is it needed if we can simply edit /etc/hosts?
>
> I don't understand either.

Thanks, good to know.

> To have a self-signed cert that is trusted by the device, and to simplify hosting requirements for rmfakecloud itself, since you'd have to set up your host on the other end to allow all the various hostnames required. Not to mention you might be hosting behind something like cloudflare, so you don't want to use a direct IP address.

Can you elaborate on why cloudflare means we can't use a direct IP address? I've never used cloudflare.

Eeems commented 2 years ago

> > To have a self-signed cert that is trusted by the device, and to simplify hosting requirements for rmfakecloud itself, since you'd have to set up your host on the other end to allow all the various hostnames required. Not to mention you might be hosting behind something like cloudflare, so you don't want to use a direct IP address.
>
> Can you elaborate on why cloudflare means we can't use a direct IP address? I've never used cloudflare.

It's not that you can't; it's that you can't if you want to use their features like DDoS protection etc. In order to use those you have to route through their network, which would not give you a static IP address due to how their edge networks work.

kc9jud commented 2 years ago

Instead of running a full HTTPS proxy on the reMarkable, why not simply run a DNS proxy (on the reMarkable)? This should (1) reduce the memory footprint and (2) eliminate the need for a static IP in /etc/hosts. The DNS proxy would forward all requests as normal, except that requests for the cloud servers would instead resolve to the user's server.
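The forwarding rule described above can be sketched as a small resolver function. This is only the decision logic, not a working DNS server: `upstream` is a stand-in callable for whatever normal resolver the proxy would forward to, and the hostnames and addresses are placeholders.

```python
from typing import Callable, Dict


def make_resolver(
    overrides: Dict[str, str], upstream: Callable[[str], str]
) -> Callable[[str], str]:
    """Build a resolver that answers overridden names locally and forwards the rest."""
    # Normalise once: DNS names are case-insensitive and may carry a trailing dot.
    normalized = {name.rstrip(".").lower(): ip for name, ip in overrides.items()}

    def resolve(name: str) -> str:
        key = name.rstrip(".").lower()
        if key in normalized:
            return normalized[key]  # cloud hostname: answer with the user's server
        return upstream(name)       # anything else: forward as normal

    return resolve


if __name__ == "__main__":
    # Hypothetical values: cloud.example.test stands in for a cloud hostname,
    # 192.0.2.10 for the user's server, and the lambda for a real upstream lookup.
    resolve = make_resolver(
        {"cloud.example.test": "192.0.2.10"},
        upstream=lambda name: "203.0.113.5",
    )
    print(resolve("Cloud.Example.Test."))
    print(resolve("other.example.test"))
```

Because only the override table names the server, changing the server's address would mean updating one mapping rather than every hosts entry.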

I think full dnsmasq is probably overkill, but there are some lightweight alternatives (after a quick search): blocky, dprox.

ddvk commented 2 years ago

you have to generate a cert for all the hostnames the tablet is using, then install the ca on all devices you use (like browsers etc) which is just a hassle and hard to automate

ddvk commented 2 years ago

also about the memory usage @Myridium: VSZ is virtual memory (https://go.dev/doc/faq#Why_does_my_Go_process_use_so_much_virtual_memory), the real memory the proxy is using is around 10MB. You can check with `grep Vm /proc/$(pidof rmfake-proxy)/status`
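For anyone who wants the same comparison from a script, here is a minimal Python sketch that reads the Vm* fields from a Linux /proc status file. VmSize is the virtual size that top reports as VSZ, and VmRSS is the resident memory. The function names are made up for this example, and it inspects its own process by default rather than rmfake-proxy.

```python
from typing import Dict


def parse_vm_fields(status_text: str) -> Dict[str, int]:
    """Extract the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        if not line.startswith("Vm"):
            continue
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def read_vm_fields(pid: str = "self") -> Dict[str, int]:
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


if __name__ == "__main__":
    vm = read_vm_fields()
    print("VmSize (virtual):", vm.get("VmSize"), "kB")
    print("VmRSS (resident):", vm.get("VmRSS"), "kB")
```

Passing the proxy's PID as a string to `read_vm_fields` would give the numbers the grep command prints.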