basho / node_package

RPM/Debian/FreeBSD/SmartOS/Solaris/OSX packaging templates for Erlang Nodes
Apache License 2.0
89 stars 63 forks source link

Fix RIAK-2702 by restricting possible atom usage to 1000 #210

Closed bsparrow435 closed 7 years ago

bsparrow435 commented 7 years ago

We were using the OS PID as a pseudo random number generator but the range is too large since each node name used would generate an entry in the atom table. Restrict the possible atoms used by riak and riak-admin commands to 1000. Also, change node name used by riak top to match riak-admin top convention.

JeetKunDoug commented 7 years ago

+1

binarytemple-external commented 7 years ago

On a RHEL 7 system - the maximum pid size is set to 32768 with a configurable maximum size of ~ 4 million.

/proc/sys/kernel/pid_max  
  This file (new in Linux 2.5) specifies the value at which PIDs wrap around
  (i.e., the value in this file is one greater than the maximum PID). The
  default value for this file, 32768, results in the same range of PIDs as
  on earlier kernels. On 32-bit platfroms, 32768 is the maximum value for
  pid_max. On 64-bit systems, pid_max can be set to any value up to 2^22
  (PID_MAX_LIMIT, approximately 4 million

As Riak ships with the default erl atom count settings (1,048,576) (the +t flag is not set) this should not present an issue for existing Riak installs.

What concerns me, and it's not a huge concern - but worth mentioning - is the possibility that we've gone from a situation, where on a typical 2017 system - we use at most 3% of the default atom allocation - to one where we now stand a 1/1000 chance of client node identifier conflict if two users are on a system at the same time executing riak/riak-admin commands.

What would be the behavior in that case? An error that the client node name was in conflict, results sent to the wrong client, or just an Erlang stack trace?

bsparrow435 commented 7 years ago

A collision will result in one of the clients failing to connect to the cluster and getting an error message that the name was already in use. The user will then need to re-run the command.

binarytemple-external commented 7 years ago

Thanks @bsparrow435