Closed arkgil closed 4 years ago
getters currently raise an error when there is no property set.
how about getters with a default value?
How from differs from user and server
I think 'from' is a jid while 'user' and 'server' are parts of it. Such distinction, with the same naming, is used in many places in the old code.
All fields should have explicit, documented and typespeced getters and setters.
Personally I'd prefer pattern matching. Having typespecs is good, but pattern matching is way more readable.
Now, to the point:
Part of the bloat stems from the fact that hook handlers return values by adding them to the accumulator, and there is no naming requirement, so it can be 'result', but can also be 'iq_result' or whatever else, provided the caller knows what it is going to be. Some of them may be reused later, some not.
The problem indeed is that acc api does not impose any logic, so you can store anything you want there. Which, quite understandably, prompted developers of global distribution to escape the mess by storing all their stuff under a separate key. In a while, mod_amp and mod_mam will be doing the same.
To summarise what the acc really stores:
a) some hard-wired stuff which never changes or disappears, used mostly for debugging and profiling (ref, timestamp)
b) the stanza which started the accumulator
c) some values derived from the initial stanza, for easier access
d) other attributes denoting the accumulator's origin (from and its derivatives)
e) results of hook calls, usually as 'result', volatile
f) cached results of some operations, e.g. privacy check
g) cached info derived from initial stanza (e.g. iq_query_info)
h) additional debugging info, collected along the way (hook_runs, handler_runs, send_result)
i) arbitrary values which are stored in one place to be used somewhere else (global_distrib, and mod_amp will certainly do it)
Plus, some of those need to be passed to the other process (not stripped), those are 'persistent'. This is indeed quite a lot of use cases for a single entity, and designing an api to make it clean and consistent is a non-trivial task.
IMO if global_distrib is to be persistent it should be set as a persistent property, not added separately. This is the kind of bloat that current architecture seems to be causing.
return_type is a somewhat hackish way to stop infinite iq error loop. It is tested in a sense that there is a test which triggers the loop, and without the return_type safeguard mongoose would run out of memory and crash. Fixing acc caching logic will solve the deeper cause of the problem, then the return_type will not be needed anymore.
This one may be handy: picture
getters currently raise an error when there is no property set.
What I don't like about default values is that they don't have semantics. What does the default mean? Also I believe it would allow more relaxed typespecs, because dialyzer would see that any value can be returned by the getter.
I think 'from' is a jid while 'user' and 'server' are parts of it. Such distinction, with the same naming, is used in many places in the old code.
I don't think that's the case, `user` and `server` are sometimes set in places where `from` is not. There is a difference here, but I'm not sure what it is.
Personally I'd prefer pattern matching. Having typespecs is good, but pattern matching is way more readable.
Current implementation of accumulators can be used to pattern match, but I haven't found a single piece of code which uses that in Mongoose. Additionally, allowing to pattern-match defeats the point of opaqueness of the type, in which case, I suppose, we will be using it as a map.
Re: 1. This sentence
it can be 'result', but can also be 'iq_result' or whatever else, provided the caller knows what it is going to be
and this one
Some of them may be reused later, some not.
are exactly the reasons why I'd like to refactor the accumulator. `result` or `iq_result` have no meaning at all, or a meaning that depends hugely on the context they're used in. In addition, I may call some hook, a handler puts its result in the `result` field, and then another handler may override it. Or another hook which is run after the first one. I, as a caller, have no idea what might be inside.
I think that if hook handlers return a meaningful result, let's make it an explicit accumulator field. Or, let's scope results to hook names, so that different hooks don't override their results.
And as a final note, there are only 4 places in the whole codebase where the result is retrieved from the accumulator.
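The idea of scoping results to hook names could look roughly like the sketch below. The module name, the `hook_results` key and both function names are hypothetical, they are not part of the current `mongoose_acc` API, and the accumulator is assumed to be a plain map.

```erlang
-module(acc_hook_result_sketch).
-export([put_result/3, get_result/2]).

%% Hypothetical: store a handler's result under the name of the hook
%% that produced it, so that different hooks cannot override each other.
put_result(HookName, Result, Acc) when is_atom(HookName) ->
    Results = maps:get(hook_results, Acc, #{}),
    Acc#{hook_results => Results#{HookName => Result}}.

%% Hypothetical: the caller asks for the result of one specific hook
%% and gets 'error' when that hook has not stored anything.
get_result(HookName, Acc) ->
    case Acc of
        #{hook_results := #{HookName := Result}} -> {ok, Result};
        _ -> error
    end.
```

With this shape a handler of the same hook can still overwrite the value, but a handler of an unrelated hook cannot.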
Re: 2. And that's exactly the point of this PR, to escape the mess 😄
Re: 3
a) But has it ever been used for those purposes?
That point is a really nice summarization 🙂
Re: 4
I'd remove the special `persistent_properties` altogether. In fact, only `origin_jid` is put there, so why not add it to the list of not-stripped attributes?
Re: 5
Could you point me to the place where this error occurs? Has it ever manifested itself? I guess a simpler solution is to check that the name of the element passed to `jlib:make_error_reply` is `"iq"` and that it has the `"error"` type. If I didn't know that function, I could pass a fresh accumulator there every time and the said infinite loop of errors would occur anyway.
@bartekgorny that's a dead link.
Link fixed.
About pattern matching, I tried to say that I prefer `get(attr, Container)` over `get_attr(Container)`.
Has it ever manifested itself?
Oh yes, it killed some production instances of MongooseIM. But this is not the scope of this PR, let's discuss it somewhere else.
I just want to recap what is the higher-level purpose of these changes.
I think that accumulators in their current shape give too much freedom and carry a level of uncertainty which makes some developers (myself included) very unconfident when working with them. Currently, any value can be put under any key, and that value can be overridden at any time.
With the new shape of the accumulator, every developer when adding new field would need to figure out 4 things:
Those 4 concerns would need to be expressed in code in the following ways:
Without generic `get/2` and `set/3` developers would need to at least define accessors. Hopefully, without documentation and typespecs the changes wouldn't pass PR review.
Thanks to this imposed discipline other developers (or the same developer a month later) would have confidence in what is inside the accumulator and whether they can assume that a property is set.
All of this means that the new API can't be generic. All functions need to have specific purpose.
Sounds reasonable. One minor addition:
- what is the purpose of that field
- what is the type of that field
- is that field immutable
- is that field always set, or does it appear only in some special cases
@bartekgorny we can't assume that accumulator is in the "c2s" context in the first place. It can be HTTP handler process, or a CLI process.
I prefer get(attr, Container) over get_attr(Container).
Why is that the case? It's the same number of characters 😄
I think we should allow for some non-predefined attrs, for the following reason: one of the nice things about MIM is its hook/handler-based architecture. You can add a custom module with its own hook handlers and register it in config, without modifying the core code of MIM. Your handler may need to store something in the acc to be read by another handler. If we impose a strict requirement that every acc variable be predefined in mongoose_acc, we'd throw away this flexibility. Still, I'm all for setting up a dedicated place and maybe an api for such custom storage.
What do you think of the accessor naming used by the Qt API? The idea is that the getters and setters are called the same and only differ by arity (in the case of Qt, which is in C++, by signature, but close enough). I mean that instead of:
get_some_field(Struct) -> ...
set_some_field(Struct, NewFieldValue) -> ...
we could have
some_field(Struct) -> ...
some_field(Struct, NewFieldValue) -> ...
My impression is that the `get_`/`set_` prefixes carry little to no value, while having to type them is tedious in itself and also makes using autocompletion less convenient (having to type the prefix which is identical for a lot of functions before hitting the magic "give me completions" button). What do you think?
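A minimal sketch of that naming scheme for a single field, assuming the accumulator is a map behind an opaque type. The module name, `new/1` and the `user_jid` field are hypothetical and only serve the illustration.

```erlang
-module(acc_accessors_sketch).
-export([new/1, user_jid/1, user_jid/2]).
-export_type([acc/0]).

-opaque acc() :: #{user_jid := binary()}.

%% Hypothetical constructor, present only to keep the sketch self-contained.
-spec new(binary()) -> acc().
new(UserJid) -> #{user_jid => UserJid}.

%% Getter: the field name with arity 1.
-spec user_jid(acc()) -> binary().
user_jid(#{user_jid := Jid}) -> Jid.

%% Setter: the same name with arity 2, struct first and new value second.
-spec user_jid(acc(), binary()) -> acc().
user_jid(Acc, Jid) when is_binary(Jid) -> Acc#{user_jid := Jid}.
```

An immutable field would simply export only the arity-1 function.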
@erszcz Yes, I was thinking about it but I went with get/set route so that it's clearly visible what different functions do. I like the idea, though, and I'd go for it.
Few cents from me:
- `result` or `iq_result` from different hooks.
- `misc` field which is also a map. In open source code such a field should be empty in my opinion.
- `some_field/1` for getting the value and `some_field/2` for setting the value seems good to me.
@michalwski the problem I see with the `misc` field is that everyone will use it instead of adding a dedicated field. This may sound brutal, but in my opinion the discipline here needs to be imposed from the very beginning.
@arkgil that's why I said that in open source Mongoose this should be empty and probably not used. I just want to make it as easy as possible for anyone using MongooseIM to add their own hook or handler when needed.
If they put something custom in the `misc` field then it will be easier for them to update their code to MongooseIM's master. If they have to modify core Mongoose code around accumulators then staying up-to-date with open source MongooseIM may be harder for them. That's my main concern.
@michalwski that makes sense, you're right. Although I still think that usage of this field will be abused :P
I have one bar that I think a good accumulator implementation must pass: it works with global distrib and there is no mention of global distrib anywhere in `mongoose_acc.erl`.
Related: module-specific fields like `amp_check_result`, `privacy_check` - should IMO also not be present in a good accumulator implementation.
E.g. add a macro so that each mod can put/read things to/from the accumulator inside its own namespace (`?MODULE`?), and other code can explicitly access other namespaces if needed. But I'm of a really strong opinion that the accumulator design cannot both be good and tightly coupled to mods -- choose one.
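A rough sketch of such per-module namespaces, assuming the accumulator is a plain map. The module and function names are hypothetical and nothing here exists in `mongoose_acc` today.

```erlang
-module(acc_ns_sketch).
-export([new/0, put/4, get/3, get/4]).

%% Hypothetical empty accumulator.
new() -> #{}.

%% Store Key => Value inside the namespace NS (typically the calling module).
put(NS, Key, Value, Acc) when is_atom(NS) ->
    Group = maps:get(NS, Acc, #{}),
    Acc#{NS => Group#{Key => Value}}.

%% Strict read: crashes with {badkey, _} when the namespace or key is missing.
get(NS, Key, Acc) ->
    maps:get(Key, maps:get(NS, Acc)).

%% Lenient read: returns Default when the namespace or key is missing.
get(NS, Key, Acc, Default) ->
    maps:get(Key, maps:get(NS, Acc, #{}), Default).

%% A shared header could wrap these so that a mod only ever touches its
%% own namespace (hypothetical macro names):
%%   -define(ACC_PUT(K, V, Acc), acc_ns_sketch:put(?MODULE, K, V, Acc)).
%%   -define(ACC_GET(K, Acc), acc_ns_sketch:get(?MODULE, K, Acc)).
```

Code that needs another module's data calls `get/3` with that module's name explicitly, which keeps the cross-module dependency visible and greppable.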
I still think that usage of this field will be abused
That's what code reviewers are for - to prevent such abuse.
`set_user_jid` is easier to grep (less noise).
`set_user_jid` is easier to grep (less noise)
@arcusfelis Unless you import the function, you'd grep for `mongooseim_acc:user_jid`, wouldn't you? If you import, you can still grep for `import.mongooseim_acc` and building a regex alternative for the two is also an option.
One more point for discussion: we want some attrs to be immutable, others append-only, some read-write etc., so there are going to be quite a few options. Who is going to decide how a given attr should behave, and how? Is it going to be hardcoded in mongoose_acc, or do those attrs have separate apis, or maybe it is an option for the setter?
mongoose_acc:thisattr(Value, Acc) % thisattr is defined as immutable
or
mongoose_acc:thisattr(Value, Acc [immutable])
or something else?
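For comparison, here is a sketch of the first option, where the behaviour of each attr is hardcoded in the accumulator module. The policy table, the module name and `set/3` are all hypothetical, and the accumulator is assumed to be a plain map.

```erlang
-module(acc_policy_sketch).
-export([new/0, set/3]).

%% Hypothetical empty accumulator.
new() -> #{}.

%% Hypothetical policy table, hardcoded in the accumulator module.
policy(ref) -> immutable;
policy(send_result) -> append_only;
policy(_) -> read_write.

%% One setter that enforces the policy of the given key.
set(Key, Value, Acc) ->
    case {policy(Key), Acc} of
        {immutable, #{Key := _}} ->
            %% already set once, refuse to overwrite
            error({immutable, Key});
        {append_only, _} ->
            %% newest entry goes first
            Acc#{Key => [Value | maps:get(Key, Acc, [])]};
        _ ->
            Acc#{Key => Value}
    end.
```

The trade-off is that every new attr with non-default behaviour requires a change in this module, which is exactly the coupling discussed above.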
My humble suggestion. Personally I'm for flexible API but a one that makes it harder to clutter acc. It means lack of flat namespace. :)
#{
%% No "global" ref for acc!
location => #{
timestamp => Timestamp,
module => Module,
function => Function,
line => Line
},
origin => #{
ref => Ref,
el => El, %% XML element that triggered the processing
iq_info => Info,
jid => Jid
% ... and some other cached values
},
stanza => #{
ref => Ref2 %% Updated every time when 'el' key is updated
%% Exactly (or almost) the same schema as origin
},
%% Has to be load tested
hooks => #{
history => [ {last_hook, result}, {previous_hook, result}, {two_hooks_ago, result} ]
},
%% Now time for per-module groups
%% Maintained by convention, not enforcing API
mod_global_distrib => #{
key => val,
strippable => false
},
mod_offline => #{
stored => true,
strippable => true
}
}
%% MACROS:
-define(new_acc(), mongoose_acc:new(#{module => ?MODULE, function => ?FUNCTION_NAME, line => ?LINE})).
-define(new_acc(Element), mongoose_acc:new(#{module => ?MODULE, function => ?FUNCTION_NAME, line => ?LINE, el => Element})).
%% API:
-spec update(Acc :: acc(),
Group :: term(), % atom?
KV :: map() | [{K :: term(), V :: term()}]) -> acc().
update(_Acc, location, KV) ->
error(immutable);
update(Acc, origin, KV) ->
error_if_any_key_exists_in_this_group;
update(Acc, Group, KV) ->
update.
update(Acc, Group, Key, Val) -> update(Acc, Group, [{Key, Val}]).
'get'(Acc, Group, Key)
'get'(Acc, Group, Key, Default)
%% Retain API for counters?
strip(Acc)
%% Further functions...
Discussion conclusions:
Action item:
Find out which fields should be mandatory.
Often in code we have `proplists:get_value(key, List, default)`.
It's useful to have something similar for accs.
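A sketch of what that could look like next to a strict getter, assuming the accumulator is a plain map. The module name and the error term are hypothetical; `maps:get/3` is the standard library call that plays the role of `proplists:get_value/3` here.

```erlang
-module(acc_default_sketch).
-export([get/2, get/3]).

%% Strict getter: raises a descriptive error when the key is not set.
get(Key, Acc) ->
    case Acc of
        #{Key := Value} -> Value;
        _ -> error({acc_key_not_found, Key})
    end.

%% Lenient getter: the caller supplies the default, so its meaning
%% is decided at the call site rather than inside the accumulator.
get(Key, Acc, Default) ->
    maps:get(Key, Acc, Default).
```

Keeping both arities lets callers that are willing to crash keep the nice error message, while callers that have a sensible fallback avoid a separate existence check.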
Superseded by acc 2.0
The intention of this PR is to introduce a "safer" interface to the accumulator module. While accumulators proved to be very useful when values need to be exchanged between multiple modules or for debugging purposes (e.g. registering hook calls and send results), I believe that they're underused due to a lack of confidence when working with them - we never know if an accumulator field is there and what its value is.
My goal is to completely remove the "generic" part of the accumulator - no more `get`, `put` and `update`. All fields should have explicit, documented and typespeced getters and setters.
Changes so far
The code in this PR won't pass any tests - I have refactored maybe 1/3 of the whole accumulator usages. Changes so far include:
- `new/3,4,5,6`. This helps immensely when debugging, when we're not sure if the accumulator comes from the HTTP interface, c2s, an external component etc.
- `element` attribute can be only set as a whole, i.e. the whole record, its name, type and attributes are always in sync. There is a `set_element/2` function, but it should be removed and the stanza should be an immutable field, set only when creating an accumulator.
- `from` and `to` properties can be set only when creating an accumulator, they are immutable due to the lack of setters.
- `iq_query_info`, as previously, is computed lazily and stored in the accumulator and is always in sync with the corresponding `exml:element()` (although that wouldn't be an issue if element was immutable)
- `server` is a regular, mutable property
Concerns so far
I'd really appreciate anyone's input on these topics.
- `{ok, ...} | error`. The downside is that we lose a nice error message and instead will end up with `badmatch`es in cases when we're willing to crash when the property is not there.
- `has_` function for each property, but that would make the interface really bloated. The upside is that we retain a nice error message when the property is not there.
- `element` should be completely immutable, it should be set to the stanza entering the node and initiating the "routing chain"
- `server` and `user` attributes should be completely immutable as well. They should represent the user and server of the user who initiated the current routing chain (in other words, user and server of the current c2s process or HTTP handler or CLI process). Maybe we could name those properties differently?
- How `from` differs from `user` and `server`?
- `to` should be completely immutable as well.
TODOs and questions for the future
Here is a list of things which I've found that are left to do. Please point out anything that you think is missing here:
Properties
- `result` property -
- `subscription_lists` property - mutable, used by `roster_get_subscription_lists` hook handlers.
- `roster` property - filled by modules building the roster. It should be mutable, and the getter should return an empty list if the value hasn't been set yet.
- `amp_check_result` property - also mutable
- `privacy_check` property - immutable per privacy check cache key, see https://github.com/esl/MongooseIM/blob/master/src/mongoose_privacy.erl#L65
- `send_result` property - append-only list of records of sending the stanza. I'm not sure if anyone is using that, maybe we should remove it? It's used in https://github.com/esl/MongooseIM/blob/master/src/ejabberd_c2s.erl#L1769, but from what I see its value doesn't matter anyway. It's probably only useful for debugging.
- `ref` property - immutable, set on creation. Again, I'm not sure if we need it, but AFAIK @fenek has used it in the past
- `timestamp` property - immutable, set on creation. I don't know anyone who's used it, I guess we could remove it.
- `global_distrib` property - regular, mutable property
- `return_type` - used by `jlib` in https://github.com/esl/MongooseIM/blob/master/src/jlib.erl#L150. To me it could be removed - based on that, `jlib` decides whether the IQ passed to the linked function is already an error - but one could pass an unrelated accumulator and an error IQ there, and the function won't return an error. I saw that these lines are covered by tests, but I believe that comes only from unit tests which call it directly.
- `iq_result` property. I'm not sure if it has any meaningful usage. I see that it's set in a couple of places in `mod_muc_light` but it's only retrieved in `c2s`. Maybe someone knows what it was created for.
- `offline_messages` - append-only list, should be removable because c2s clears them
- `hook_runs` and `handler_runs` - record fired hooks and handlers, probably useful for debugging. They're not retrieved anywhere.
Functions
- `strip` - we should figure out which elements are worth keeping persistent. Currently persistent properties are: `ref`, `timestamp`, `from`, `to`, `global_distrib` and `persistent_properties` (of which at this point only `origin_jid` is used). Do we want to make this list smaller?
But is it worth it?
As you can see, these modifications would bring huge changes to the code base. IMO it's worth it, I'd definitely feel more confident when using accumulators, but I'd like us all to decide.