esl / MongooseIM

MongooseIM is Erlang Solutions' robust, scalable and efficient XMPP server, aimed at large installations. Specifically designed for enterprise purposes, it is fault-tolerant and can utilise the resources of multiple clustered machines.
Other
1.64k stars 422 forks source link

Host-indenependent modules disappear when hosts config is changed #1679

Closed bartekgorny closed 5 years ago

bartekgorny commented 6 years ago

MongooseIM version: master Installed from: any Erlang/OTP version: any

While most modules are loaded, configured and used per host, there are some which do not depend on host - exemplum: mod_admin_extra. It is activated whenever a host is started, but subsequent activations are idempotent. However, when a host is stopped, the module is deactivated. Result: if you remove a host from "hosts" config and reload config the module's functionality is gone. Which is not what one would expect.

There are at least two ways to solve the issue: (1) using "refcount", (2) creating a separate configuration for host-independent, or "global", modules.

bartekgorny commented 6 years ago

The drawback of (2) is that when configuring MongooseIM you'd have to know which modules are global and which are per-host.

arcusfelis commented 6 years ago

third way: we can check, if there are other modules using the module, by looking into ejabberd_modules ETS table. forth way: to use a list of hosts, instead of "refcount". Until the list is empty, we don't stop anything global.

arcusfelis commented 6 years ago

another way: we have gen_mod start and stop callbacks. We can introduce new callbacks, start_global/2, stop_global/2, which are refcounted. I.e. start_global would not be called second time for second host. stop_global would only be called, once no host uses the module.

chrzaszcz commented 6 years ago

A piece of functionality is a MongooseIM module if it can be turned on for a particular host. If not, there is no reason for it to be a module. As mod_admin_extra cannot be started per host, I would suggest starting it globally - the suitable place seems to be the ejabberd_app:start/2 function.

Of course it is a good idea to redesign the growing list of startup and cleanup procedures in ejabberd_app and, as some of them are optional, design a new abstraction similar to MongooseIM modules with global start and stop functions - both @bartekgorny and @arcusfelis mentioned this and it looks like a viable long term solution.

fenek commented 6 years ago

I agree with @arcusfelis, new optional callbacks (start | stop)_global/1 ( I don't think /2 is needed here :) ) called only for refcount equal 0 are probably the cleanest solution. Every module could provide both global and per-host constructors and destructors whenever necessary.

I'm leaning towards maintaining a full list of hosts that have started the module but simple counter sounds good as well. We could perhaps reuse information in ejabberd_modules table, as it is already using {Module, Host} keys.

@bartekgorny @erszcz Thoughts?

chrzaszcz commented 6 years ago

@fenek this way it would be still possible to configure such a 'global' module per host (eg. in host_config) even though it would be working for all hosts. We are using a very similar approach in Motorola and for me it's rather a bodge than a solution.

fenek commented 6 years ago

In case of purely global modules it still makes sense, because such module won't have any per-host init and in this case per-host start_module will be more of indication like "This host needs functionality from this module" rather than "I need something specific initialised by this module for this host".

bartekgorny commented 6 years ago

So, it seems we are heading towards the following solution:

gen_mod has four callbacks instead of two:

start/2
stop/2
start_global/1
stop_global/1

while the last two are optional (for backward compatibility). Start and stop are called for every host, while start_global and stop_global are called only once. gen_mod:start_module first checks if the module is already started for any host and if not then runs start_global, gen_mod:stop_module does the reverse. Checking module usage is done in ejabberd_modules ets table, since this is where we do our bookkeeping.

erszcz commented 6 years ago

Why does the module even need to know about being global or not if it's completely managed by gen_mod then? In other words, how does my_mod:start_global(Opts) differ from my_mod:start(<<"fake-global-vhost-name">>, Opts) given gen_mod would always use the same <<"fake-global-vhost-name">> when it should?

In yet another words, what does extending the behaviour interface buy us?

bartekgorny commented 6 years ago

Because some modules have initiation or cleanup procedures which have to be called once, no matter how many hosts they serve. Like mod_admin_extra, for instance.

erszcz commented 6 years ago

gen_mod:start_module first checks if the module is already started for any host and if not then runs start_global

If gen_mod:start_module does the check (what I described as "completely managed by gen_mod" above), then why can't it just run my_mod:start/2 with a placeholder host? I simply see some redundancy here, which I'm not sure is necessary.

chrzaszcz commented 6 years ago

I think that having the option to put a global module likemod_admin_extra in the host_config tuple is just bad. This should not be possible and ejabberd_config should scream at you for doing so. This could work with what @erszcz suggested if there is a way (e.g. callback) in a module that would tell ejabberd_config, gen_mod etc. if it is global or not.

bartekgorny commented 6 years ago
  1. @chrzaszcz I don't see why it would be a bad idea. The administrator of a mongoose instance does not have to know or care whether a certain module is initialised globally or per host. Also, I could imagine a module which needs both types of initialisation, once globally and then per each host.
  2. I think @erszcz idea was rather like:
    
    -module(some_stupid_module).
    -behaviour(gen_mod).

start(global, Options) -> (some code here); start(Host, Options) -> (and here).


instead of doing `start/2` and `start_global/1` callbacks. I'm not sure about backward compatibility, though. 
chrzaszcz commented 6 years ago

@bartekgorny If a global-only module is configured inside a host_honfig tuple, it should be enabled only for the particular host. So I think it would be counter-intuitive if it is implicitly enabled for all hosts.

erszcz commented 6 years ago

Finally, we're getting more concrete :+1:

I have the impression we're not clear on the assumptions taken. Let's specify that, so that there's less ambiguity in the discussion: who should decide whether a module is initialised globally or per vhost?

I see three options here - please comment if you see more or other options:

  1. The module author(s), that is the module itself.
  2. The system administrator.
  3. The module supports both start types.

1. The module itself decides on global/per-vhost init

This variant follows @bartekgorny's statement:

The administrator of a mongoose instance does not have to know or care whether a certain module is initialised globally or per host.

Why not, then, use refcounting in just the modules which should be started/stopped once? It's the minimal solution that does the job. Then we could have types of modules - a globally initialised module:

-module(mod_that_knows_it_is_global).
-behaviour(gen_mod).

start(Host, Opts) ->
    %% This mod knows it only needs to be initialised once, no matter how many vhosts require it.
    case mongoose_refcount:acquire(?MODULE) of
        1 -> %% init
        _ -> ok
    end.

stop(Opts) ->
    case mongoose_refcount:release(?MODULE) of
        0 -> %% clean up
        _ -> ok
    end.

The second type would be a per-vhost module:

-module(mod_that_knows_it_is_per_vhost).
-behaviour(gen_mod).

start(Host, Opts) ->
    %% Init separately for each Host.
    ...

stop(Opts) ->
    %% The same for cleaning up.
    ...

Pros:

Cons:

2. The system administrator decides on global/per-vhost init

The administrator knows what he or she wants to do and writes it down in blood in the config file. This means that:

a) A module defined in the global modules clause is started globally. b) A module started in host_config is started per vhost. c) Due to (b) a module which has to be started for 2 or more vhosts has to be listed multiple times in the relevant host_config clauses.

In this approach it makes sense to introduce start_global and stop_global callbacks as each module actually has to support being configured as global (start_global and stop_global are used) or per-vhost (start and stop are used):

-module(mod_something).
-behaviour(gen_mod).

start(Host, Opts) ->
    %% Init separately for each Host.
    ...

stop(Opts) ->
    %% Clean up for host.
    ...

start_global(Opts) ->
    %% Init globally / shared for all vhosts.
    ...

stop_global(Opts) ->
    %% Clean up globally / for all vhosts.
    ...

I imagine the following flows of information:

However, this approach requires all modules to support the extended gen_mod behaviour interface. For some modules it might not make (much) sense to start them globally (basically modules storing persistent data) or might not be trivial to implement the correct resource sharing and isolation between vhosts if their runtime structure is not multiplied per vhost.

Moreover, this would require extending all modules to support the new gen_mod interface in one go, not incrementally as need arises.

Pros:

Cons:

3. The module supports both start types

In this approach the administrator has control over how modules are started, but not every module has to support global and per-vhost init model.

-module(mod_that_supports_global_and_per_vhost_startup).
-behaviour(gen_mod).

start(Host, Opts) ->
    case proplists:get_value(init_mode, Opts, vhost) of
        global ->
            %% Global init - for example use refcounting and ignore Host.
            case mongoose_refcount:acquire(?MODULE) of
                1 -> %% init
                _ -> ok
            end;
        vhost ->
            %% Per Host init.
            ...
    end.

stop(Opts) ->
    case proplists:get_value(init_mode, Opts, vhost) of
        global ->
            %% Global cleanup
            case mongoose_refcount:release(?MODULE) of
                1 -> %% cleanup
                _ -> ok
            end;
        vhost ->
            %% Actually, it would be nice to have Host available here, the way it's available in start/2.
            %% Oh, well...
            ...
    end.

Pros:

Cons:


I tried to summarise the ideas and bits of information already shared in the thread, but I might've misinterpreted or malformed the intentions - if so, please correct or specify the above.

My preferred solution is point 3.

michalwski commented 6 years ago

Here are few cents from me.

Based on the comprehensive summary from @erszcz my preferred solution would be 1. This is the least flexible. On the other hand, currently, I'm not able to come up with a module which needs to support both types. I might have missed sth.

From another perspective, I'm wondering how module's config would be accessible from other places then it's start function when options list is passed as an argument. In some cases, when we need to read sth from module's config, for example, in a hook handler we usually use gen_mod:get_module_opt/4. This function can take either global or Host as one of the parameters. If we know that a module is always global we know that calling get_modlue_opt with global will (probably) work. If the module is able to work in both modes, how do we decide if we need to read host specific module's option or global specific? Different handlers for these modes could be a solution to this problem.

A side not about the get_module_opt with global value as the first argument. Currently it will return value from the config only if all host-specific values for the option are the same. In other case it will return the Default value passed as the last argument.

arkgil commented 6 years ago

The administrator of a mongoose instance does not have to know or care whether a certain module is initialised globally or per host.

I strongly disagree. Configuring global module in a per-host basis is an invalid configuration, and it shouldn't be allowed. There are many topics much harder to grasp about MongooseIM config which administrators need to know. The indication whether module is global or not should be clearly stated in the documentation.

michalwski commented 6 years ago

After heaving an offline discussion we came to a conclusion that the following approach would be the clearest.

  1. If module needs to be global it'd be better to configure it in new config option, let's say {plugins, [mod_admin_extra]}
  2. It implements other behaviour with f.i start_global and stop_global functions.

We may start naming such modules plugins because they are not really modules comparing to current modules.

In other words, this is more or less point no 1 from @erszcz summary. The main difference is the place where such "plugins" are configured.

In cases where a module is host specific but needs some resources started once, these resources starting / stopping could be moved to the plugin and used from modules.

bartekgorny commented 6 years ago

If we go this way, maybe we could rename those from mod* to plugin*?

erszcz commented 6 years ago

In cases where a module is host specific but needs some resources started once, these resources starting / stopping could be moved to the plugin and used from modules.

This potentially introduces dependencies between the new plugins and existing modules. Since we have support for dependency resolution, we should not forget about supporting this new kind of dependency.

bartekgorny commented 6 years ago

I have nothing specific to say, just want to keep this conversation going until we reach a decision, one way or another :)

fenek commented 6 years ago

Of all options presented here, I think the best one is the strong distinction between global and per-host modules. However, I'm not so sure about plugin name. It's not very intuitive IMO. I'm pretty sure most people will need to check the docs to know the difference. I know it's not a big issue but I'm wondering if we could come up with alternative name, like globmod_. Or even perhaps a rather radical change to prefixes globmod_ and hostmod_ (with backward compatibility retained until 4.0)?

As far as I understand, in this case we won't need any kind of refcounting, correct?

chrzaszcz commented 6 years ago

After some local discussions and considering names like service I think that one of the biggest challenges is still the good name ;-) There was also some concern regarding motivation for the changes. I think that we could look for more candidates for global modules here: https://github.com/esl/MongooseIM/blob/master/src/ejabberd_app.erl#L48-L73

Some of these 'services' have a startup procedure that is performed only if a special option is present in ejabberd.cfg. Thus, I think that such 'services' could be configured in a more unified way. I also agree with @erszcz that we could add a mechanism for dependencies between modules and such 'services' (however we call them in the end...).

One advantage of this solution is that all we need at the beginning is a logic for starting them and a behaviour - we could incrementally add new ones and refactor existing ones without breaking any backwards compatibility.

I think that we could start with making a list of existing 'services' that could use the new behaviour (they can be either modules or 'services' started in ejabberd.app) and see if there is enough motivation for the change and if ejabberd.cfg would be more readable afterwards. The cost of such evaluation should be quite low.

fenek commented 6 years ago

service sounds way better than plugin and me like it. :) I'll try to add such evaluation to the current sprint, unless @bartekgorny wants to do it?

bartekgorny commented 6 years ago

There are certain reasons why I should do it... so yes, I'll take it.

bartekgorny commented 6 years ago

https://redmine.erlang-solutions.com/projects/mongooseim/wiki/MEP_services

michalwski commented 5 years ago

Based on the issues the service functionality was created in #1792 and #1794. I'm closing this issue. Please re-open it or better create new one if you know that sth is still missing.