Closed yuce closed 7 years ago
Broke out the Survey part of the proposal here, since it doesn't really fit into the proposal:
In this section we present a selection of configuration options supported by other databases.
Some of the command line flags CockroachDB supports are:
--host: default localhost
--port: default 26257
--store: changes data store location, default: cockroach-data
--background: runs the server as a daemon
--join: joins node to a cluster, argument is in the form host:port
Environment variables are supported. A few examples:
We found no information about how to configure CockroachDB with a configuration file.
InfluxDB supports the following non-exhaustive list of flags:
-database
-host: default: localhost
-port: default: 8086
InfluxDB supports TOML formatted configuration files. Example:
[data]
dir = "/var/lib/influxdb/data"
query-log-enabled = true
Environment variables are supported in the form: INFLUXDB_config-section-name_option-name.
Following are some of the command line flags supported by OrientDB:
-h, --host: default localhost
-P, --ports: single port or port range, defaults to: 2424-2430
-u, --user: default: root
-p, --password: mandatory user password (default: root)
Configuration file is in XML format. Sample configuration:
<properties>
<entry name="cache.size" value="10000" />
<entry name="storage.keepOpen" value="true" />
</properties>
Environment variables are supported. Below are some examples:
Redis supports the following command line flags and more:
--port: default 6379
--bind: the interface to listen (default: 0.0.0.0)
--daemonize
--pidfile
Redis has a simple configuration file format, which lists the keys and values separated by whitespace. The same keys and values may be specified on the command line. Sample configuration:
daemonize no
pidfile /var/run/redis.pid
port 6379
bind 127.0.0.1
Redis does not support configuration using environment variables.
Below are some of the flags supported by RethinkDB:
-d, --directory: The directory to store the data
--daemon: Run the server as a daemon
--log-file: Specify the log file
--config-file: Specify the configuration file
--bind: Specify the address to listen to, default localhost
RethinkDB uses a simple configuration file with configuration specified as KEY=VALUE lines. Here's a sample:
pid-file=/var/run/rethinkdb/rethinkdb.pid
bind=127.0.0.1
cluster-port=29015
We found no information about environment variable support of RethinkDB.
What's the plan for actually implementing the cascading configuration stuff? I know there are libraries which do a lot of this, but they have their own pros and cons. Any thoughts on using a library vs rolling our own?
@jaffee re: cascading, do you mean configuration priority ?
Overall I really like the direction of this. There are a few things I might suggest changing to add clarity (for example, instead of -bind, use -bindaddr or bind-address or something like that).
Also, as for "One of the debates in the computing world is the number of dashes before a flag...": I fall on the "single dash" side of that debate.
@yuce yes, by cascading I mean configuration priority
The single-dash/double-dash debate is mostly about taste, so I guess we can't go wrong by picking either of them as long as we are consistent. Should we vote on that, or is there another way to resolve it?
The most important thing for me is consistently using a single flag anywhere an address is required (instead of specifying host and port).
I've proposed bind since host is a bit overloaded, and when I hear that I immediately look for a port option (but host is very prevalent). IMO bindaddr is a bit long, how about addr for HTTP and PROTOCOL-addr for other protocols?
@jaffee It never occurred to me there would already be libraries doing that. If there's something we can use, why not? Is there any you can recommend?
@yuce I only have experience with viper - unfortunately it pulls in a lot of dependencies we don't need (they may have fixed this so you can opt out of them.) Might be a good starting point though.
Would using http instead of bind or host make sense? We could use PROTOCOL for other protocols, like protobuf:
$ pilosa -http localhost:5000 -protobuf localhost:6000
@yuce can you investigate viper and let us know the pros/cons.
Based on discussion on https://github.com/pilosa/pilosa/issues/273 around moving pilosactl commands under pilosa, we might also consider using viper's counterpart "cobra" which is a library for creating CLIs (which uses viper for config).
Viper looks good. It has about 10 dependencies, but I guess we can give it a try. Cobra for adding subcommands looks good too. Both libraries depend on pflag which is from the same developer. That library implements a Go flag compatible library supporting GNU style flags. I think we can make use of that too.
Updated the proposal with the following:
Address: Complete address of a service. Aliases: bind, BIND
SCHEME://HOST:PORT form: Specify scheme, host and port
HOST:PORT form: Specify host and port, use the default scheme
HOST form: Specify the host and use the default port and scheme
:PORT form: Specify the port and use the default host and scheme
SCHEME://HOST form: Specify scheme, host and use the default port
SCHEME://:PORT form: Specify scheme and port and use the default host
I reviewed all the discussion and overall I'm happy with Yuce's proposal as it exists currently. I'm not strongly opinionated either way on "-" vs. "--", but it seems that cobra chose the "--" route, so I'm ok with that. I think we should go ahead and plan on implementing this.
My only comment on the proposal is this. In table 2, the following configuration options are required: data directory, http address, and log path. Could we not choose sensible defaults ($HOME/.pilosa, 127.0.0.1:15000, and /dev/stdout) and not require them to be specified? That's how things currently work. It seems like we don't really have to have any required options.
I agree @codysoyland
Running pilosa (or pilosa server assuming we go the subcommand route with cobra) should always start pilosa. Don't require a new user to fumble with several flags just to get running for the first time.
Ah, I had forgotten the decision from https://github.com/pilosa/pilosa/issues/273 was to use subcommands for sure.
Thanks for your comments. In table 2 http address and log path are required, but they have defaults (http://localhost:15000 and stdout respectively) so the user should only specify the data directory. The reason is, it may be hard to determine a location which is standard/expected on all platforms (e.g., it may be a bit strange to have the .pilosa directory on Windows, since there's no similar convention for naming hidden files there). Also, I am not sure why the default directory should be hidden on UNIX platforms.
Do you guys have any suggestions for the default data dir? Should we just keep $HOME/.pilosa?
Keeping pilosa a single executable and making use of subcommands makes a lot of sense. @jaffee Was there a decision on whether we would use pilosa server or pilosa run? I will update the proposal accordingly.
I guess all the cards are on the table about using single or double dashes so I'll update the proposal according to @travisturner 's decision.
double dashes is fine.
I don't think the user should have to specify the data directory. Keeping the default to $HOME/.pilosa makes sense to me.
Updated the proposal with the default data directory set to $HOME/.pilosa.
Removed needs-decision, as I think this is pretty ready-to-go. @yuce I'm happy to implement this, but if you want to do it, I think you have the right of first refusal given all your work on this proposal.
@jaffee That's perfectly OK; you've already worked with viper/cobra, so you have more experience with it anyway. One thing that would be great to have is some kind of testing for configuration, command line args, etc. (I have a few ideas about this, will try to write them down/implement a prototype later)
@alanbernstein suggested changing the default pilosa port from 15000 to 10101. I'm going to do that unless there are objections.
I think changing the port to 10101 is both fascinating and not very useful at the same time.
@jaffee can you expand on the thinking behind that port change suggestion
It's kind of funny? Since it's "binary".
That's really about it. @codysoyland mentioned that he wasn't a fan of 15000 and then alan said 10101 and we all thought that sounded perfect.
Changed default cpu profile duration to 30 seconds (from 30 nanoseconds). 30 ns isn't really a useful amount of time to collect a profile.
Changed --data (cmd line flag) to --data-dir so that it matches configuration file.
The way I'm implementing this, the env variables, config file and cmd line are all going to have to match (except that the env variables will be all caps, prefixed with PILOSA_, and any dashes will be underscores).
Due to the way viper works, any command line flags which are represented in something nested in the config file, will have to be similarly nested with dots on the command line and in the environment, so
[cluster]
hosts = ["localhost:15000","localhost:15001"]
will look like --cluster.hosts="localhost:15000,localhost:15001" on the command line and PILOSA_CLUSTER.HOSTS as an environment variable.
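The naming rule described above (all caps, PILOSA_ prefix, dashes turned into underscores, dots kept for nested options) can be sketched with the standard library alone. The helper name envName is hypothetical; this is not how viper does it internally, only an illustration of the mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// envName is a hypothetical helper: it upper-cases the option name,
// replaces dashes with underscores, and adds the PILOSA_ prefix.
// Dots are kept, matching the PILOSA_CLUSTER.HOSTS example above.
func envName(option string) string {
	name := strings.ToUpper(option)
	name = strings.ReplaceAll(name, "-", "_")
	return "PILOSA_" + name
}

func main() {
	for _, opt := range []string{"data-dir", "cluster.hosts", "profile.cpu-time"} {
		fmt.Printf("--%s -> %s\n", opt, envName(opt))
	}
}
```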
I will update the original ticket and catalog any edits in comments in case anyone takes issue with the changes.
changed:
replicas to cluster.replicas
nodes to cluster.hosts
--poll to --poll-interval
antientropy to anti-entropy.interval
profile to profile.cpu
profile-duration to profile.cpu-time
(I will edit this comment with further changes)
Updated authors and changed the default port to 10101
@jaffee I thought https://github.com/pilosa/pilosa/pull/394 implemented some of this proposal, and remaining parts would be implemented in subsequent PRs, e.g., log, plugin and gossip related ones. Does it make sense to keep the ticket open until they are implemented?
That's a good point @yuce, we should capture that.
I feel like that functionality is separate from the notion of "how do we do config" which is what this covers, and that we should break it out into separate tickets. Especially since some of it (like plugins) may not get done for a long time, and most of the work for those things will be outside of the config code.
Of course the relevant portions of this ticket will live on in the documentation, but I'd prefer we not leave it open getting stale for potentially many months.
I'll create a ticket for the log path stuff - I think the other two will be up to the implementers of that functionality to figure out what the best set of flags are that are needed to support it.
Pilosa Configuration Proposal
Authors
Change log
Abstract
This document contains the proposals for command line, configuration file and environment variable naming and their priorities.
Overview
Configuration is one of the most important parts of software. If a consistent naming convention is not used, it may become hard to configure software. If the software doesn't support the same configuration as in the documentation, the users of the software may get frustrated. Because of those reasons, it is beneficial to have a single reference of configuration options, which Pilosa developers use during development and which can be used to write/update the documentation.
Most common configuration comes from the command line, a configuration file and environment variables. What set of options should be supported by these configuration sources? If the same option is used in two different sources, which should have the upper hand? These questions should be clearly answered to decrease the number of surprises during operation of the software.
In summary, the aims of this document are:
In Table 1 below, current configuration options and defaults are presented:
Proposal
In order to have uniform meaning and representation we use the following terms in our proposal:
Address: Complete address of a service. Aliases: bind, BIND
SCHEME://HOST:PORT form: Specify scheme, host and port
HOST:PORT form: Specify host and port, use the default scheme
HOST form: Specify the host and use the default port and scheme
:PORT form: Specify the port and use the default host and scheme
SCHEME://HOST form: Specify scheme, host and use the default port
SCHEME://:PORT form: Specify scheme and port and use the default host
We renamed Host which used to mean an address in the current configuration and use HTTP Address instead. In Table 2 below, we summarized the proposed configuration, showing added or changed configuration in bold.
$HOME/.pilosa
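The six address forms listed above can be handled by one small function that fills in defaults for whichever parts are missing. The following is a minimal sketch using only the standard library; the Address type and parseAddress helper are hypothetical names, the defaults are the http://localhost:15000 values mentioned in the discussion, and IPv6 literals are deliberately not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Address holds the three parts of a service address.
type Address struct {
	Scheme, Host, Port string
}

// String renders the address in the full SCHEME://HOST:PORT form.
func (a Address) String() string {
	return a.Scheme + "://" + a.Host + ":" + a.Port
}

// parseAddress is a hypothetical helper. It starts from the defaults in def
// and overrides only the parts that are present in s.
func parseAddress(s string, def Address) Address {
	addr := def
	// Split off "SCHEME://" if present.
	if i := strings.Index(s, "://"); i >= 0 {
		if i > 0 {
			addr.Scheme = s[:i]
		}
		s = s[i+3:]
	}
	// What remains is "HOST:PORT", "HOST" or ":PORT".
	host, port := s, ""
	if i := strings.LastIndex(s, ":"); i >= 0 {
		host, port = s[:i], s[i+1:]
	}
	if host != "" {
		addr.Host = host
	}
	if port != "" {
		addr.Port = port
	}
	return addr
}

func main() {
	def := Address{Scheme: "http", Host: "localhost", Port: "15000"}
	forms := []string{
		"https://example.com:8080",
		"example.com:8080",
		"example.com",
		":8080",
		"https://example.com",
		"https://:8080",
	}
	for _, f := range forms {
		fmt.Printf("%-26s -> %s\n", f, parseAddress(f, def))
	}
}
```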
Priority of Configuration Sources
The configuration may be specified on the command line, in an environment variable, or in the configuration file; otherwise the default for that configuration is used. In order to be able to specify the configuration, the priority between these sources should be defined. Below is our proposed priority of sources, where higher level sources override the lower ones:
Command Line
One of the debates in the computing world is the number of dashes before a flag. BSD style utilities do not use any dashes and allow only single letter flags. The GNU convention is to use double dashes (--) before the standard form of a flag and a single dash (-) before the alternative (short) form. Some software uses minus (-) to denote removal of a feature and plus (+) to denote addition. Java, Go and Erlang use a single dash before flags in their command line tools. In this proposal we opted for the GNU style flags, based solely on the observation that the quantity of modern software using that convention vastly outweighs the single dash style, and developers using modern UNIX and UNIX-like OSs would have a certain taste for it. It should be noted that Go's standard flag parser doesn't differentiate between single and double dashes.
In the light of the discussion above, always use lowercase flags with double dashes (--) to denote the long form and a single dash (-) to denote the alternative form. Only the most used flags should have an alternative form. Use the dash (-) character as the word delimiter. Command line options should preferably be as short as possible without losing their meaning. Table 3 is below:
--config, -c
--data-dir, -d
--bind, -b
--bind-PROTOCOL
--cluster.replicas
--cluster.hosts HOST1:PORT1 HOST2:PORT2
--poll-interval
--messenger
--gossip
--plugin-dir
--plugins PLUGIN1 PLUGIN2
--plugin-PLUGIN_NAME
--anti-entropy.interval
--profile.cpu
--profile.cpu-time
--log
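The remark above that Go's standard flag parser doesn't differentiate between single and double dashes can be checked with a few lines. This sketch uses only the standard flag package; the parseDataDir helper and the default value are illustrative assumptions, not the Pilosa implementation.

```go
package main

import (
	"flag"
	"fmt"
)

// parseDataDir parses args with Go's standard flag package and returns
// the value of the data-dir flag. The default is illustrative.
func parseDataDir(args []string) (string, error) {
	fs := flag.NewFlagSet("pilosa", flag.ContinueOnError)
	dataDir := fs.String("data-dir", "~/.pilosa", "directory to store data")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *dataDir, nil
}

func main() {
	single, _ := parseDataDir([]string{"-data-dir", "/tmp/a"})
	double, _ := parseDataDir([]string{"--data-dir=/tmp/a"})
	// Both spellings are accepted and yield the same value.
	fmt.Println(single, double, single == double)
}
```

The standard package has no notion of a separate short form, so an alias such as -d would have to be registered as a second flag bound to the same variable. GNU style parsers such as pflag handle long and short forms directly.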
Configuration File
The configuration file is in the TOML format. Use lowercase section and key names. Use the dash (-) character as the word delimiter. Table 4 is below:
Environment Variables
Most prominent deployment and orchestration tools, such as Puppet, Chef and Ansible, as well as Docker, support environment variables to pass configuration to a program. Moreover, environment variables are the preferred way of passing configuration for some application structuring conventions, like the Twelve-Factor App.
All environment variables are uppercase with underscore (_) used as the word delimiter. Some deployment tools (such as Puppet) seem to be unable to set environment variables per process (only for the system). In order to avoid inadvertent configuration, the PILOSA_ prefix must be used. Table 5 is below:
Implementation
The Config structure in config.go should be modified to match Table 2.
The (m *Main) ParseFlags(args []string) method in cmd/pilosa/main.go should be moved to config.go and become a method of Config. New command line flags should be added to that method. A new method that reads configuration from environment variables should be added. Ideally, Config should have a method which reads from all configuration sources and applies the priorities mentioned in this document to the fields of Config.
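One possible shape for a method that reads from all sources and applies priorities is a lookup that walks the sources from highest to lowest priority and takes the first value it finds. The sketch below is an assumption-laden illustration: the Source type and resolve function are hypothetical, the values are made up, and the order used in main (command line, environment, configuration file, defaults) is assumed for the example rather than taken from the proposal's priority list.

```go
package main

import "fmt"

// Source is one place a setting can come from, e.g. flags or a file.
// All names in this sketch are hypothetical.
type Source struct {
	Name   string
	Values map[string]string
}

// resolve returns the value of key from the first source that defines it.
// Sources must be ordered from highest to lowest priority.
func resolve(key string, sources []Source) (value, from string, ok bool) {
	for _, s := range sources {
		if v, found := s.Values[key]; found {
			return v, s.Name, true
		}
	}
	return "", "", false
}

func main() {
	// Assumed order for illustration only.
	sources := []Source{
		{Name: "command line", Values: map[string]string{"bind": "localhost:10101"}},
		{Name: "environment", Values: map[string]string{"bind": "localhost:15000", "data-dir": "/var/lib/pilosa"}},
		{Name: "config file", Values: map[string]string{"log": "/var/log/pilosa.log"}},
		{Name: "defaults", Values: map[string]string{"bind": "localhost:15000", "data-dir": "~/.pilosa", "log": "stdout"}},
	}
	for _, key := range []string{"bind", "data-dir", "log"} {
		v, from, _ := resolve(key, sources)
		fmt.Printf("%s = %s (from %s)\n", key, v, from)
	}
}
```

Keeping the order as data rather than hard-coding it means the same function works whichever priority the proposal settles on.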