[scevent]: allow scevent to clear its cache via messaging system

luca-s commented 6 years ago

This is useful to allow external applications writing external origins to the database and associating as additional origin to the seiscomp3 solution of the same event, to properly work in scolv.

Due to the events caching a new origin in the database is not seen by scevent and this prevent few operations to be properbly completed in scolv:

setting an external origin as preferred
moving an external origin to another event

gempa-jabe commented 6 years ago

Hi @luca-s, lets start the discussion as I need to understand the issue of this PR.

Due to the events caching a new origin in the database is not seen by scevent

Therefore we have the messaging to let scevent know about new/updated/deleted origins.

Do you say that you write directly into the database and expect scevent to associate that origin with an event if scolv requests it to do so, bypassing the messaging? That should actually work as well. The origin that should be set as preferred or moved will be searched in the database if it is not part of the local cache already. Can you elaborate a bit more on what you try to achieve and what does not work?

Thank you.

luca-s commented 6 years ago

Thank you for taking the time of reviewing this PR @gempa-jabe .

Do you say that you write directly into the database and expect scevent to associate that origin with an event if scolv requests it to do so, bypassing the messaging?

Exactly. In our seiscomp setup, we use also 3rd party software to write content to the seiscomp database, without using spread or scmaster. So, e.g. after a seismic event, we have the external application writing an external origin to the database, and associating as additional origin to the seiscomp3 solution of the same event.

If, minutes later, we open scolv, we can see the origin correctly associated to the event, and all attributes (as far as visible) as they should be.

However, setting this external origin as preferred origin may fail, with the following error message:

EvPrefOrgIDFailed(:unreferenced:) from scevent@sc3a

The error is due to this line.

The general problem is that if an even is already cached, scevent will not reload that event from the database and not even any origin associated with it, unless the event expires. The cache doesn't know if an event has been update on the DB or if its origins changed. At least this is what I see debugging the code.

Now there is the matter of deciding if it is correct for an external application to write to the database without using the messaging system, I don't know if this is allowed. Nevertheless I believe that this simple patch shouldn't be too cumbersome for scevent and for this reason I hope it can be accepted.

Thank you for your help.

gempa-jabe commented 6 years ago

Thank you for the explanation. So apart from the external origin that you write into the database, you also add an origin reference to the event by writing that directly to the database. I would advise against it. Your code change is then more of a hack to allow scevent to reload the information from the database. And this is not the desired workflow of SC3. We do not poll the database for changes but rely on the messaging. In your case I would advise to send a message to scevent to associate the external origin with a certain event. This can be done rather easily by sending a journal entry with action EvGrabOrg or by sending a notifier with the new OriginReference. Isn't that an option?

Even if we can make this work for this particular case with your code, it will not work for many other cases, e.g. updating picks or amplitudes directly in the database and expect SC3 to grab this new information and start processing.

luca-s commented 6 years ago

Sorry for the late reply and thank you for your comments. I generally agree with your comments of using the messaging system but when I spoke to the people involved in the 3rd party software that writes content to the seiscomp database I discovered they have good reasons to write directly to the database. So I understand why we should use the messaging system but I also know that we can't do that.

So my view is that even if the way we want to use this PR can be seen as a hack, we can still look at the PR from seiscomp3 perspective only, where it can be seen as an additional configuration option, not a hack but a well defined feature. So I wonder if this can be considered for merging anyway (if it is not good enough I can obviously change it), unless you can already see that it is not a good idea to have the ability to configure the cache expiration time.

gempa-jabe commented 6 years ago

The configuration of the cache time is fine with me. The only thing that I don't like very much is the clean-up in the timer. That suggests being good practice to enable such kind of workflows. It furthermore might break playbacks because usually the cache will be updated when new information is available and not periodically. Offline playbacks without a database might stop working as expected. I haven't checked that yet.

The messaging is a fundamental concept of SC3. I still don't understand why the people that populate the database are not able to launch a script to send a message afterwards, even just a command message for scevent to invalidate its cache. If you need to integrate two system, shouldn't the requirements of either party taken into account? The requirement for SC3 to notify about updates is sending a message. Having arbitrarily set cache times and interval times doesn't make the whole thing deterministic. What if we change the timer interval later to 60s? Wouldn't that break with your assumption of a cache cleanup each second? I still treat that periodic clean-up feature as a hack, not as a feature.

Regarding TimeSpanBuffer::removeExpiredEntries(): The entries in the cache are not expired. The cache keeps X seconds of objects for a particular time span. The reference time is the update time of the most recent object in the cache. Maybe TimeSpanBuffer::feedNull() or even allow to feed a NULL pointer is good option. The latter would always need to pop one entry from the RingBuffer.

luca-s commented 6 years ago

Thank you for taking the time of reviewing this PR. I am closing it for now. I might come back with another idea.

luca-s commented 6 years ago

@gempa-jabe Regarding your suggestion on sending "command message for scevent to invalidate its cache", I believe that would work fine together with the cache expiration configuration (and I would remove the clean-up from the timer).

My idea is that scevent would be able to send the special cache invalidation message (via command line option) in the system, then all the active scevents would listen to the same message type and would invalidate their cache (implemented per your suggestion as TimeSpanBuffer::feed(NULL) ). Does that sound reasonable to you?

gempa-jabe commented 6 years ago

Does that sound reasonable to you?

Yes. Reason is that you can then rely on the time of cache invalidation. Also a response message should be sent with information that the operation is completed.

TODO:

[ ] Implement feed(NULL) in TimeSpanBuffer and RingBuffer
[ ] Add handling a command message to scevent to trigger _cache.feed(NULL)
[ ] Reply with "command completed" message

Do you need my support for any of the above items?

luca-s commented 6 years ago

Thanks @gempa-jabe, I believe I can do that. Anyway, if you could point me to a module that uses a custom message that would make my life easier ;)

And what about the idea of adding a command line option to scevent to generate the "clean cache" message? Or is there a better way to send the message without creating ad hoc code?

gempa-jabe commented 6 years ago

Anyway, if you could point me to a module that uses a custom message that would make my life easier ;)

https://github.com/SeisComP3/seiscomp3/blob/master/src/trunk/libs/seiscomp3/datamodel/messages.h

This can be implemented privately within scevent that only scevent instances know about that message type. We could also create a generic ClearCache message but then we need to find a way to specify which tool is targeted. Currently we do not support peer-to-peer messages. Since the cache control is very application specific, I would opt for a private scevent message type.

Handling the message is then done inside handleMessage.

And what about the idea of adding a command line option to scevent to generate the "clean cache" message?

Yes, that sounds good.

luca-s commented 6 years ago

@gempa-jabe I created the first stub of the code and I would like to ask your help to finish it up. I created the ClearCacheRequestMessage and ClearCacheResponseMessage classes. eventtool looks for ClearCacheRequestMessage inside handleMessage and if it finds the message the cache is cleared and a ClearCacheResponseMessage is sent. Also there is a new command line option clear-cache used to send a ClearCacheRequestMessage.

Currently the code doesn't work because I don't know what group I should send the messages to. Do you have any suggestion? Also I don't know how I can wait for a reply (ClearCacheResponseMessage ) after sending the ClearCacheRequestMessage.

Also I thought that I don't actually need to implement feed(NULL) in TimeSpanBuffer and RingBuffer because the method clear already does what I need. For this reason I also removed the cache expiration time configuration.

(I noticed the wrong indentation, I will fix that too).

gempa-jabe commented 6 years ago

Currently the code doesn't work because I don't know what group I should send the messages to. Do you have any suggestion?

The primary group of scevent is the EVENT group. What do you mean with "doesn't work"? Doesn't it send a message or doesn't it receive a message? In the default configuration, scevent sends to EVENT and subscribes to EVENT so it should work.

Also I don't know how I can wait for a reply

In your message handler, as usual. Adding a flag to tell the event handler to only react to the response message is the probably the easiest option. Once the message has been received and the flag is set, call quit().

luca-s commented 6 years ago

The primary group of scevent is the EVENT group. What do you mean with "doesn't work"? Doesn't it send a message or doesn't it receive a message? In the default configuration, scevent sends to EVENT and subscribes to EVENT so it should work.

I meant that I didn't try the code as it is just a stub. I wanted to double check with you if I was on the right track in using the primary group. I will finish it up now, many thanks.

In your message handler, as usual. Adding a flag to tell the event handler to only react to the response message is the probably the easiest option. Once the message has been received and the flag is set, call quit().

Great I will do so. I was hoping there was a special way to respond to a message, but it doesn't seem so.

One last question: when should I use NetworkMessage instead of Seiscomp::Core::Message? I saw that the connection object supports both types.

gempa-jabe commented 6 years ago

I was hoping there was a special way to respond to a message, but it doesn't seem so.

You don't want to respond to a message but wait for a response. ;) And that is application logic. If you don't call Application::run then the thread that reads the messages and dispatches them will not run. Then you can poll in your run method:

bool EventTool::run() {
    if ( waitForResponse ) {
        while ( true ) {
            NetworkMessage *nmsg = NULL;
            Message *msg = connection()->readMessage(true, Connection::READ_ALL, &nmsg, &error);
            if ( msg == NULL )
               break;
            if ( ClearCacheResponse::Cast(msg) )
              return true;
        }

        return false;
    }
    else {
        return Application::run();
    }
}

when should I use NetworkMessage instead of Seiscomp::Core::Message?

NetworkMessage is the binary blob along with some metadata that will be sent over the wire. The Core::Message is the application layer message object that is the deserialized representation of a NetworkMessage. Use Core::Message.

luca-s commented 6 years ago

It works like a charm! Many many thanks. If there are no issues on your side, for me it is ready.

gempa-jabe commented 6 years ago

I am glad to hear that. Is the PublicObjectCache change still necessary? If not, lets remove it from this PR.

luca-s commented 6 years ago

Removed.

gempa-jabe commented 6 years ago

:+1: Looks good to me. Can you prepend [scevent] to your commit message? What does ENH mean?

luca-s commented 6 years ago

All done!

P.s. the ENH comes from other projects I work on that use the convention:

Standard acronyms to start the commit message with are: API: an (incompatible) API change BENCH: changes to the benchmark suite BLD: change related to building numpy BUG: bug fix DEP: deprecate something, or remove a deprecated object DEV: development tool or utility DOC: documentation ENH: enhancement MAINT: maintenance commit (refactoring, typos, etc.) REV: revert an earlier commit STY: style fix (whitespace, PEP8) TST: addition or modification of tests REL: related to releasing numpy

gempa-jabe commented 6 years ago

Merged

SeisComP3 / seiscomp3

[scevent]: allow scevent to clear its cache via messaging system #173