Race condition in some metrics (counter and gauge are fine)

etrepum commented 12 years ago

These metrics are ok:

folsom_metrics_counter (update_counter is atomic)
folsom_metrics_gauge (insert is atomic)

These metrics can drop data due to race conditions (interleaved get_value / insert):

folsom_metrics_histogram
folsom_metrics_meter
folsom_metrics_meter_reader

And this one can grow and shrink incorrectly (interleaved ets:info(…, size) / delete / insert):

folsom_metrics_history

There are some similar problems with folsom_sample_exdec, folsom_sample_none, folsom_sample_uniform but those problems would go away if folsom_metrics_histogram usage was serialized by key.

The most straightforward way out of this predicament is to set up a process per metric (of these types) for update serialization. The metrics are basically determined at compile time and their count should be smallish (hundreds maybe), so a fixed size pool isn't necessary. Most reads can still go directly to the table since it's updated atomically, but some of the histogram implementations may need some kind of get_values serialization.

folsom_sample_exdec: can delete a table before you get its contents, maybe ets:delete_all_objects/1 would be preferable to cycling the table. This would change worst case to returning somewhere between 0 and Size samples instead of crashing. Inserting a list instead of calling insert in a loop would change it such that 0 or Size samples could be returned, since inserting a list is atomic.
folsom_sample_none: could be in a state where there are Size - 1 items in the pool (this is likely acceptable)
folsom_sample_uniform: this looks like it would always be in a consistent state, so definitely acceptable.

joewilliams commented 12 years ago

Yup, I am aware of these issues and have started work to resolve them. Here is one option for exdec samples:

https://github.com/boundary/folsom/tree/exdec_gen_server

I haven't merged this yet as I am not convinced this is the best way to go yet.

etrepum commented 12 years ago

If you're open to dependencies, a straightforward solution would be to use gproc. This would make it super easy to register worker processes with the name of the metric that they own and do the pub/sub.

Not sure that making exdec a process is the best solution, although it looks like it might solve the problem in that particular place.

etrepum commented 12 years ago

Also, which of the notify interfaces in folsom_metrics / folsom_ets are actually used? folsom_metrics:notify/1 is the only one that's documented.

joewilliams commented 12 years ago

Right, I was looking at that last week when I was writing the exdec_gen_server branch. Seemed like a decent way to go.

joewilliams commented 12 years ago

folsom_metrics is the API module I prefer folks use.

etrepum commented 12 years ago

What I mean is that there are notify/1, notify/2, notify/3, and notify_existing_metric/3.

notify/1 and notify/2 are essentially the same thing, and notify_existing_metric/3 has similar semantics. The comments lead me to believe that notify/2 is actually preferred, but the documentation only shows notify/1. notify_existing_metric/3 seems reasonable to have, but it's not clear if it's actually worth the trouble. notify/2 could surely be made faster by essentially passing down the result of get_info(Name) to notify/4 and not calling handler_exists at all.

notify/3 would be a good thing to get rid of if you could, or at least clean up all the implementation bloat it causes. I could see more use cases for it if metrics were named by terms and not atoms.

etrepum commented 12 years ago

So here's what I was thinking last night, I started sketching it out last night but didn't finish it.

Have a function that maps types to a way to create metrics from them. Sometimes the state will be meaningless, sometimes it will be a pid. If it was a behaviour I'd have it look like this:

-module(folsom_metric_constructor).
%% The returned module here is a folsom_metric_handler
-callback new(Name :: atom(), Opts :: [{atom(), term()}]) -> {Handler :: module(), State :: term()}.

Define a behaviour to encapsulate what a metric must do:

-module(folsom_metric_handler).

-callback delete(Name :: atom(), State :: term()) -> ok.
-callback get_value(Name :: atom(), State :: term()) -> term().
-callback get_value(Name :: atom(), Count :: integer(), State :: term()) -> term().
-callback notify(Name :: atom(), Value :: term(), State :: term()) -> term().

folsom_ets should change its #metric{} to include the state and the handler module returned by the constructor. An example module might look like this:

-module(folsom_metrics_counter).
-behaviour(folsom_metric_handler).
-export([new/2,
         notify/3,
         delete/2,
         get_value/2]).

-include("folsom.hrl").

new(Name, []) ->
    {?MODULE, ets:insert(?COUNTER_TABLE, {Name, 0})}.

notify(Name, Value, _State) when is_integer(Value) ->
    inc(Name, Value);
notify(Name, Value, _State) ->
    case Value of
        inc ->
            inc(Name, 1);
        {inc, Value} ->
            inc(Name, Value);
        dec ->
            inc(Name, -1);
        {dec, Value} ->
            inc(Name, -Value)
    end.

delete(Name, _State) ->
    ets:delete(?COUNTER_TABLE, Name).

get_value(Name, _State) ->
    [{_, Value}] = ets:lookup(?COUNTER_TABLE, Name),
    Value.

get_value(Name, _Count, _State) ->
    get_value(Name, _State).

inc(Name, Value) ->
    ets:update_counter(?COUNTER_TABLE, Name, Value).

There would be a wrapper module that constructs a metric in its own gen_server process, this would itself be a metric that has State as its pid. Metrics that it wraps would have an init/1 to return an initial {ok, State}, and notify casts would return the new state for the gen_server (will often be faster than using ets since we are already serialized).

-module(folsom_metric_proc).
-behaviour(gen_server).

%% API
-export([start_link/0, new/2]).

-export([new/2,
         notify/3,
         delete/2,
         get_value/2,
         get_value/3]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {name :: atom(),
                handler :: module(),
                state :: term()}).

new(Name, Opts) ->
    {ok, Pid} = gen_server:start(?MODULE, [{name, Name} | Opts]),
    {?MODULE, Pid}.

notify(Name, Value, Pid) ->
    gen_server:cast(Pid, {notify, Name, Value}).

delete(Name, Pid) ->
    gen_server:call(Pid, {delete, Name}).

get_value(Name, Pid) ->
    gen_server:call(Pid, {get_value, Name}).

get_value(Name, Count, Pid) ->
    gen_server:call(Pid, {get_value, Name, Count}).

init(Args) ->
    {_, Name} = lists:keyfind(name, 1, Args),
    {_, Handler} = lists:keyfind(handler, 1, Args),
    {_, Opts} = lists:keyfind(opts, 1, Args),
    {ok, S} = Handler:init(Opts),
    {ok, #state{name=Name, handler=Handler, state=S}}.

handle_call({delete, Name}, _From
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:delete(Name, S),
    {stop, normal, Reply, S#state{state=undefined}};
handle_call({get_history_values, Name, Count}, _From,
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:get_history_values(Name, Count, S),
    {reply, Reply, State};
handle_call({get_value, Name}, _From,
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:get_value(Name, S),
    {reply, Reply, State}.

handle_cast({notify, Name, Event},
            State=#state{handler=Handler, state=S}) ->
    S1 = Handler:notify(Name, Event, S),
    {noreply, State#state{state=S1}}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

An example that uses this might look something like this:

-module(folsom_metrics_histogram).
-behaviour(folsom_metric_handler).
-export([init/2,
         new/2,
         notify/3,
         delete/2,
         get_value/2,
         get_value/3]).

-include("folsom.hrl").

new(Name, Opts) ->
    folsom_metric_proc:new(Name, [{handler, ?MODULE}, {opts, Opts}]).

init(Opts) ->
    {Type, Size, Alpha} = lists:foldl(
        fun opt_fold/2,
        {?DEFAULT_SAMPLE_TYPE, ?DEFAULT_SIZE, ?DEFAULT_ALPHA},
        Opts),
    Sample = folsom_sample:new(Type, Size, Alpha),
    {ok, #histogram{type=Type, sample=Sample}}.

notify(_Name, Value, Hist=#histogram{type=Type, sample=Sample}) ->
    Hist#histogram{sample=folsom_sample:update(Type, Sample, Value)}.

delete(_Name, _State) ->
    ok.

% pulls the sample out of the record gotten from ets
get_value(_Name, #histogram{type=Type, sample=Sample}) ->
    folsom_sample:get_values(Type, Sample).

get_value(Name, _Count, Hist) ->
    get_value(Name, Hist).

%% Internal API

opt_fold({type, Type}, {_Type, Size, Alpha}) ->
    {Type, Size, Alpha};
opt_fold({size, Size}, {Type, _Size, Alpha}) ->
    {Type, Size, Alpha};
opt_fold({alpha, Alpha}, {Type, Size, _Alpha}) ->
    {Type, Size, Alpha}.

joewilliams commented 12 years ago

This seems like a pretty reasonable solution. I assume we also want a metric supervisor to keep an eye on each new metric gen_server that we start up (supervisor:start_child/2).

etrepum commented 12 years ago

Yeah, a simple_one_for_one or something like that

joewilliams commented 8 years ago

Test message, ignore me.

joewilliams commented 8 years ago

Folsom has moved, please resubmit your issue at https://github.com/folsom-project Thanks!

boundary / folsom

Race condition in some metrics (counter and gauge are fine) #20