GeoNode / geonode


GNIP-49: GeoNode signals/notification refactor #2889

Closed afabiani closed 6 years ago

afabiani commented 7 years ago

Proposed by

Ariel Nunez (Terranodo) Alessio Fabiani (GeoSolutions)

Assigned to release

None yet.

GeoNode signals/notification refactor.

1. Background

GeoNode has been increasingly removing expensive operations from the request/response cycle. Before GeoNode 1.1, it made network calls to GeoServer from the HTML templates of the index page. Later, most of the code that communicates with GeoServer was moved into signals, which do not have to be written in the layers/model file and can be referenced from external apps. However, until now, whenever a file is uploaded or a request to access a layer is sent, the end user still waits for network calls (gsconfig or SMTP) to complete instead of having these processes happen out of band.

2. Proposal

In this GNIP, we propose to change the communication between apps to follow a topic approach, similar to how logging works. geonode.layers would send notifications to an exchange saying that a user uploaded a file, that someone wants access to a dataset, etc. Consumers would then subscribe to topics and act on the messages that are sent. This would allow, for example, both a QGIS backend and a GeoServer backend to change their internal status (layer configuration) based on user actions in the GeoNode UI. Similarly, a layer edited via the GeoServer admin interface would be able to broadcast messages about that change, and a consumer on the GeoNode side could rebuild the thumbnail or update the bounding box without a full updatelayers.
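The topic approach can be illustrated with a small in-process stand-in for the broker. This is only a sketch using the standard library: the `TopicExchange` class, the `topic_matches` helper and the routing keys (`layers.uploaded`, `layers.access.requested`) are hypothetical names chosen for illustration, and the wildcard rules follow the usual AMQP topic convention (`*` for one word, `#` for zero or more).

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key.

    '*' stands for exactly one dot-separated word, '#' for zero or more.
    """
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may consume zero or more words of the key
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


class TopicExchange:
    """Hypothetical in-memory exchange; a real deployment would use a broker."""

    def __init__(self):
        self._bindings = []  # list of (pattern, callback) pairs

    def subscribe(self, pattern, callback):
        self._bindings.append((pattern, callback))

    def publish(self, routing_key, body):
        # Deliver the message to every subscriber whose pattern matches.
        delivered = 0
        for pattern, callback in self._bindings:
            if topic_matches(pattern, routing_key):
                callback(routing_key, body)
                delivered += 1
        return delivered


# Two backends subscribe with different interests (routing keys are made up).
exchange = TopicExchange()
geoserver_log, qgis_log = [], []
exchange.subscribe("layers.#", lambda key, body: geoserver_log.append(key))
exchange.subscribe("layers.uploaded", lambda key, body: qgis_log.append(key))

exchange.publish("layers.uploaded", {"layer": "states", "user": "admin"})
exchange.publish("layers.access.requested", {"layer": "states", "user": "bob"})
```

The producer only names the topic; which backends react, and how, is decided entirely by the subscriptions.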

2.1 Functional perspective

As stated above, the main focus of this work package is to allow GeoNode to:

This is possible by implementing a mechanism which allows GeoNode to schedule actions which will run asynchronously, executing configurable operations.

The paradigm on which we will base this mechanism is a producer/consumer message-passing method built on queues and deferred tasks. Message passing is a method that program components can use to communicate and exchange information. It can be implemented synchronously or asynchronously and allows discrete processes to communicate reliably. Message queues are often used as an alternative to traditional databases for this type of usage because they often implement additional features, provide increased performance, and can reside completely in memory.
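A minimal sketch of the producer/consumer idea, using only Python's standard `queue` and `threading` modules rather than a real broker. The `handle_request` function, the message fields and the returned status string are assumptions made for illustration; the point is that the request handler returns as soon as the message is queued, while the work happens in a separate worker.

```python
import queue
import threading

task_queue = queue.Queue()
results = []


def consumer():
    """Worker loop: process messages until a None sentinel arrives."""
    while True:
        message = task_queue.get()
        if message is None:
            task_queue.task_done()
            break
        # Stand-in for the expensive work (GeoServer call, email, ...).
        results.append("processed %s for %s" % (message["event"], message["layer"]))
        task_queue.task_done()


def handle_request(layer_name):
    """Producer: enqueue the message and return without doing the work."""
    task_queue.put({"event": "layer_uploaded", "layer": layer_name})
    return "202 Accepted"


worker = threading.Thread(target=consumer, daemon=True)
worker.start()

status = handle_request("states")

# Shut the worker down cleanly for the purposes of this example.
task_queue.put(None)
task_queue.join()
worker.join()
```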

A conceptual perspective of the infrastructure we are proposing is depicted in Fig. 1. Following the decoupled producer/consumer paradigm, we intend to extend the GeoNode infrastructure (which includes GeoServer as well) so that it serializes and sends audit messages about modifications to internal resources to an external message broker, which is responsible for the guaranteed asynchronous delivery of such messages to the registered consumers.

The consumers will be responsible for collecting and properly processing incoming messages for various purposes. This section of the proposal mainly focuses on delivering notifications to end users in order to inform them about audit events generated in the system. Additional consumers might be configured for other purposes, for instance logging such messages to persistent storage for security reasons (being able to reconstruct who did what) or redirecting them to systems used to monitor the infrastructure. As a consequence, the consumers should be pluggable as well as finely configurable.

In terms of delivering the notifications to end users we envision the need to implement at least consumers able to:

  1. Deliver emails to registered users containing the audit message.
  2. Create a custom RSS feed filtered on the basis of the currently logged-in user.

It is worth pointing out that the message consumers devoted to sending notifications to users will need to:

Aside from the two notification mechanisms mentioned above, others might be implemented (e.g. social media integration), since we intend to create a pluggable and extensible API for the message consumers.
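One way to picture a pluggable consumer API is a small registry to which independent consumers are added. The sketch below is hypothetical: the class names, the `recipients` and `text` message fields, and the in-memory `outbox` and `entries` lists are stand-ins for a real mail backend and feed storage, not part of the proposal.

```python
class NotificationConsumer:
    """Base class: each consumer decides what to do with an audit message."""

    def handle(self, message):
        raise NotImplementedError


class EmailConsumer(NotificationConsumer):
    def __init__(self):
        self.outbox = []  # stand-in for a real mail backend

    def handle(self, message):
        for recipient in message["recipients"]:
            self.outbox.append((recipient, message["text"]))


class FeedConsumer(NotificationConsumer):
    def __init__(self):
        self.entries = []  # stand-in for persistent feed storage

    def handle(self, message):
        self.entries.append(message)

    def feed_for(self, username):
        # Filter the feed on the user it is being rendered for.
        return [m["text"] for m in self.entries if username in m["recipients"]]


class ConsumerRegistry:
    """Holds the configured consumers and fans each message out to them."""

    def __init__(self):
        self._consumers = []

    def register(self, consumer):
        self._consumers.append(consumer)
        return consumer

    def dispatch(self, message):
        for consumer in self._consumers:
            consumer.handle(message)


registry = ConsumerRegistry()
email = registry.register(EmailConsumer())
feed = registry.register(FeedConsumer())

registry.dispatch({"text": "Layer 'states' was updated", "recipients": ["alice"]})
registry.dispatch({"text": "Map 'usa' was created", "recipients": ["bob"]})
```

Adding another mechanism then means registering one more `NotificationConsumer` subclass, without touching the producers.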

image1

2.2 About celery tasks

Initial work has been done in GeoNode to use Celery to move tasks off the main thread, but a more comprehensive approach is needed: auditing the codebase for these operations and moving them to external tasks. However, Celery tasks have a drawback: the code that triggers the signal or sends the notification needs to know what to do about it, i.e. which task to defer. As GeoNode gains more backends and monitoring frameworks, a model that lets consumers decide what to do with the notifications is a better architectural choice than the tight integration Celery implies.

2.3 GeoServer as a notifications producer

While the Layer metadata is stored and managed by GeoNode, the geospatial resource lives in GeoServer. A geospatial resource can be imported through the GeoNode interface, but it will be served and managed by GeoServer together with some ancillary information such as styling. As such, if we want to be able to send users notifications about geospatial resources, e.g. when a style has been changed on a certain layer or when a certain vector layer has been edited, we must also wire in GeoServer and allow it to send notifications. We should also mention that administrative actions are sometimes taken directly on GeoServer (legitimately or not), which is something we should be able to track and notify administrative users about.

Fortunately, GeoServer is a mature application that was built with this use case in mind. It provides a plethora of extension points and listeners that can be implemented in order to collect information about the actions being taken, either through the GUI or through the REST interface. In the context of this proposal we intend to:

  1. Implement proper transaction listeners to launch notifications whenever someone performs a WFS-T transaction on a certain layer.
  2. Implement a proper catalog listener to launch notifications whenever someone makes changes to the internal GeoServer configuration.

Currently GeoNode has no way to capture information about updates performed on a vector layer through editing, or on a GeoServer resource (style change, geospatial layer and publishing settings, deletion, …). Therefore a user cannot be notified about the actual resource usage, only about changes made to its GeoNode representation. As part of the current core development we also envisage improving the GeoNode notification mechanism so that a user can also register for resource updates made through GeoServer.

The proposal is to provide GeoServer, as part of the core development, with a transaction and catalog change listener plugin that can post information about such events asynchronously to GeoNode Django, which will submit them as Celery tasks into RabbitMQ to be handled asynchronously.

3. Technical Details

The implementation will rely on Kombu + RabbitMQ and implement a new Django management command to run the message listener and perform the backend tasks.

3.1 GeoNode producers

The signals will be updated so that instead of performing work, they only send out notifications to the message queue, making the request/response cycle as fast and simple as possible.
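As a rough sketch of what such a thin signal handler could look like: the handler below only builds a small JSON message and hands it to a `publish` callable. Everything here is an assumption for illustration: the `make_signal_handler` and `build_message` helpers, the message fields and the `layers.uploaded` routing key are hypothetical. In a real deployment the handler would be connected to a Django signal and `publish` would wrap a Kombu producer; both are left out so the example runs with the standard library alone.

```python
import json
import time
from types import SimpleNamespace


def build_message(event, layer_name, username):
    """Serializable description of what happened; no work is done here."""
    return {
        "event": event,
        "layer": layer_name,
        "user": username,
        "timestamp": int(time.time()),
    }


def make_signal_handler(publish, routing_key, event):
    """Return a Django-style receiver that only publishes a notification."""
    def handler(sender, instance, **kwargs):
        message = build_message(event, instance.name, instance.owner)
        publish(routing_key, json.dumps(message))
    return handler


# Stand-ins: a list instead of a broker, a namespace instead of a Layer model.
sent = []
on_layer_saved = make_signal_handler(
    lambda key, payload: sent.append((key, payload)),
    routing_key="layers.uploaded",
    event="layer_uploaded",
)
layer = SimpleNamespace(name="states", owner="admin")
on_layer_saved(sender=None, instance=layer, created=True)
```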

The code will be updated in the following locations:

App             Area        Description
accounts        emails      emails: invite, admin approve, etc.
notifications   all         all notifications trigger an email, depending on user preferences
layers          signals     upload uses GeoServer and notification services
layers          signals     delete uses GeoServer and notification services
maps            creation    uses GeoServer and notification services
documents       creation    uses notification services
geoserver       updates     GeoNode should subscribe to these events

3.2 GeoNode consumers

The proposed implementation for the consumers is a set of listeners running in an out-of-band management command, similar to the Celery daemon, that receive the messages and pass them on to the appropriate function.

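import logging

from kombu.mixins import ConsumerMixin

logger = logging.getLogger(__name__)

# layer_view_counter, send_email_owner_on_view and queue_layer_viewers are
# assumed to be defined elsewhere in GeoNode; the class name is illustrative.
class LayerViewConsumer(ConsumerMixin):

    def __init__(self, connection):
        self.connection = connection  # attribute required by ConsumerMixin

    def on_layer_viewer(self, body, message):
        logger.info("on_layer_viewer: RECEIVED MSG - body: %r", body)
        viewer = body.get("viewer")
        owner_layer = body.get("owner_layer")
        layer_id = body.get("layer_id")
        layer_view_counter(layer_id)
        send_email_owner_on_view(owner_layer, viewer, layer_id)
        message.ack()  # acknowledge only once the work has been done
        logger.info("on_layer_viewer: finished")

    def get_consumers(self, Consumer, channel):
        return [
            Consumer(queue_layer_viewers,
                     callbacks=[self.on_layer_viewer]),
        ]

3.3 GeoServer producers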

A pluggable and configurable GeoServer extension will be provided, allowing GeoServer to monitor the events raised by resource and catalog modifications and to publish them to external queues and/or message brokers.

The extension will be extensible enough to allow users to define both custom writers, responsible for creating the message in a specific format through the use of templates, and custom publishers, responsible for publishing messages to specific endpoints, which could be loggers as well as message brokers, databases or other systems.

The figure below depicts architectural details of the GeoServer extension.

image3

GeoNode Specific Writers and Publishers

Both a custom writer and a custom publisher will be implemented specifically for GeoNode.

GeoServerNotificationGeoNodeMessageWriter

Notification messages will be read by GeoNode through the Kombu Python messaging library. Since version 3.0, Kombu accepts only JSON, binary or text messages by default.

Example messages generated from GeoServer:

Adding a new vector feature type to the Catalog

{
    "id":"123e4567-e89b-12d3-a456-426655440000",
    "type":"CatalogAddEvent",
    "generator":"<GEOSERVER_ID>",
    "timestamp": 573828946728,
    "user": "admin",
    "originator": "<IP_OR_HOST>",
    "source": {
        "id":"FeatureTypeInfoImpl--570ae188:124761b8d78:-7fc1",
        "resource":"FeatureTypeInfo",
        "name":"states",
        "nativeName":"states",
        "namespace":"topp",
        "title":"USA Population",
        "abstract":"This is some census data on the states."
    }
}

NOTE: notice that the property “generator” will contain the ID of the GeoServer instance. In GeoServer there is also the possibility to uniquely identify the single instance, even if part of a cluster, by using a specific extension allowing users to define its identifier. The message will also contain the HOST or IP of the GeoServer host in the “originator” property.
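On the GeoNode side, a consumer could decode a message like the one above and dispatch on its "type" property. The following is only a sketch under stated assumptions: the `HANDLERS` table, the `on_catalog_add` function and the concrete "generator" and "originator" values are made up for illustration, and the "id" is written as a quoted string so that the payload is valid JSON.

```python
import json

# Sample payload modelled on the example above; generator and originator
# values are placeholders, not real identifiers.
RAW = """
{
    "id": "123e4567-e89b-12d3-a456-426655440000",
    "type": "CatalogAddEvent",
    "generator": "geoserver-1",
    "timestamp": 573828946728,
    "user": "admin",
    "originator": "10.0.0.5",
    "source": {
        "id": "FeatureTypeInfoImpl--570ae188:124761b8d78:-7fc1",
        "resource": "FeatureTypeInfo",
        "name": "states",
        "namespace": "topp"
    }
}
"""


def on_catalog_add(event):
    """Hypothetical handler: summarize what was added and where it came from."""
    source = event["source"]
    return "added %s:%s via %s (%s)" % (
        source["namespace"],
        source["name"],
        event["generator"],
        event["originator"],
    )


# Map event types to handlers; unknown types are ignored.
HANDLERS = {"CatalogAddEvent": on_catalog_add}


def dispatch(raw):
    event = json.loads(raw)
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None
    return handler(event)


summary = dispatch(RAW)
```

Because "generator" and "originator" travel with every message, a consumer can tell apart events coming from different GeoServer instances in a cluster.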

Adding a new vector layer to the Catalog

{
    "id":"123e4567-e89b-12d3-a456-426655440001",
    "type":"CatalogAddEvent",
    "generator":"GeoServer",
    "timestamp": 573828946729,
    "user": "admin",
    "originator": "localhost",
    "source": {
        "id":"LayerInfoImpl--570ae188:124761b8d78:-7fc0",
        "resource":"LayerInfo",
        "type": "VECTOR",
        "name":"states",
        "nativeName":"states",
        "namespace":"topp",
        "path":"/",
        "defaultStyle":"polygon",
        "styles": [
            {"style": "line"},
            {"style": "point"}
        ]
    }
}

Adding new features to a Resource

{
    "id":"123e4567-e89b-12d3-a456-426655440006",
    "type":"PostUpdateEvent",
    "generator":"GeoServer",
    "timestamp": 573828946756,
    "user": "admin",
    "originator": "localhost",
    "source": {
        "id":"FeatureTypeInfoImpl--570ae188:124761b8d78:-7fc1",
        "name":"states",
        "nativeName":"states",
        "namespace":"topp",
        "title":"USA Population",
        "abstract":"This is some census data on the states."
    },
    "totalInserted": 56    
}

3.4 GeoServerNotificationRabbitMQPublisher

The RabbitMQ publisher is an extension that connects to a RabbitMQ message broker and publishes messages to it using the RabbitMQ client APIs.

4. Potential problems

simod commented 7 years ago

Alessio, very interesting proposal. Designed for scaling, thanks. Looking forward to it.

ingenieroariel commented 7 years ago

If there are no initial objections we will start work and update the GNIP as coding progresses.

waybarrios commented 7 years ago

From what I can identify, we need to apply this implementation to the following signals and methods in GeoNode:

image

image

image

image

image

So, if you have any feedback or suggestions please let us know.

afabiani commented 7 years ago

+1

francbartoli commented 7 years ago

@waybarrios very useful change, but I would expect much more testing in this PR

davisc commented 7 years ago

@afabiani and @ingenieroariel nice work. We've been working on a kafka-geoserver plugin which would be an interesting test case. https://github.com/boundlessgeo/kafka-geoserver-plugin

ingenieroariel commented 7 years ago

We also explored kafka in our initial analysis but found it may be harder for downstream projects to adopt and preferred to just use the existing RabbitMQ. Very interested to hear more about your experience with Kafka as I think it is a technically superior alternative (message durability, scalability, etc). How hard was it to update GeoNode installers to use Kafka?

afabiani commented 7 years ago

Hi all, the GeoServer plugin we developed and currently use in this implementation has been published as a GeoServer community plugin, and its structure is general enough to make it easy to plug in other implementations. Notice that this plugin also handles catalog (read: configuration) changes, not only data.

Here are the common classes:

https://github.com/geoserver/geoserver/tree/master/src/community/notification

https://github.com/geoserver/geoserver/tree/master/src/community/notification-common

and a specific implementation using AMQ topics

https://github.com/geoserver/geoserver/tree/master/src/community/notification-geonode

It would be cool to reuse the code you did for a Kafka implementation too.

Best Regards, Alessio Fabiani.
