oscgonfer opened 4 months ago
Re. the forwarding field, and on the actual implementation of what's needed: the `redis` queue `data-received` (used for `push` notifications) should contain everything needed for making the forwarding available at platform level.
On a previous take, I was considering bridging the `mqtt` broker, but this approach is flawed, as we are missing a lot of information from the device (even the `id`). It would force us to either create another connection to the database, or keep all the information needed for the forwarding in a separate location (and keep it updated), which doesn't make any sense.
For that reason, I think it makes more sense to think of a small service that can subscribe to the `redis` queue and send whatever has `allow_forwarding = true` to a queue of workers that transform the payloads where necessary. Those workers can then forward the data to another queue (a results store), and a task can subscribe to that for the actual forwarding. The image below shows the new blocks in yellow.
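A minimal sketch of the filtering step of that small service. The message shape (`data_policy > allow_forwarding`) matches the field discussed in this issue, but the function names and the list standing in for the worker queue are illustrative assumptions:

```python
import json

def should_forward(message: dict) -> bool:
    """Only messages whose device data policy allows forwarding are
    passed on to the transformation workers."""
    return message.get("data_policy", {}).get("allow_forwarding", False) is True

def dispatch(raw: str, worker_queue: list) -> None:
    """Parse a payload from the data-received queue and, if the device
    opted in, enqueue it for payload transformation. In a real deployment
    worker_queue would be e.g. a Celery queue or a second Redis list."""
    message = json.loads(raw)
    if should_forward(message):
        worker_queue.append(message)
```

In production this would sit behind a Redis subscription loop, but keeping the filter as a pure function makes it easy to test without a broker.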
In `smartcitizen-flows` I have a couple of very rough prototypes of how those conversions would work for the projects we currently have. There is also a prototype of how those workers could be implemented with `celery`, given that all of `scdata` is written in `python` and is based on it. If we design this part well, we could maybe make it so that those workers also process the data of `incoming`.
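A sketch of what one of those transformation workers might look like. With Celery installed, this function would simply be decorated with `@app.task`; the payload fields and the `project` parameter are assumptions, not the actual prototype from `smartcitizen-flows`:

```python
import json

def transform_payload(raw: str, project: str) -> str:
    """Convert a data-received payload into the format a given project
    expects. As a Celery task this would run on the worker queue and
    push its result to the results store for the forwarding task."""
    message = json.loads(raw)
    converted = {
        "device": message["device_id"],
        "data": message["readings"],
        "target": project,  # hypothetical routing field for the results store
    }
    return json.dumps(converted)
```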
Just adding to the conversation. Notice that the suggested approach favours existing solutions and tries to reduce complexity. There might be other, more standard solutions, yet they will imply more significant architectural changes. See #265
See source: `telnet_queue` writes to `data_received` for other services to consume the data. Currently it is only used by the `push` service, which uses Socket.IO to openly stream (public) data over WebSocket at e.g. ws.smartcitizen.me. The `push` service already provides a general forwarding method when the forwarded platform doesn't expect a customized message. Replacing the `push` service, and even the `data_received` queue, with the `mqtt` service could increase reliability (trusted broker stability) and reduce maintenance (less code to maintain).
The broker supports MQTT (and MQTT over WebSocket).
The broker AAA features could manage access to each receiving platform. Credentials can be manually granted or programmatically over the broker REST API. New features introduced for device AAA will also benefit data consumers / forwarded platforms using this method.
Forwarded services unable to subscribe to the broker could use the Data Bridge feature in EMQX. However, each forwarded service must be manually configured or parametrized using the broker REST API.
The broker will also become a data provider, and the `push` service will be deprecated (ending ws.smartcitizen.me).
As @oscgonfer described in the previous task, Custom Forwarding involves some kind of message transformation to match each platform's expected standards.
**Multiple forwarding services:** `data_received` could remain an internal queue to which multiple forwarding services subscribe. The Redis configuration needed for `data_received` to support multiple subscribers has to be assessed. Services could be orchestrated as separate containers, like `push`. This accounts for the fact that forwarding implies not only customizing the message but also managing the connection to each platform.
**Built-in forwarding services:** Assuming all forwarded services connect to our platform the same way, we could take the General Forwarding approach of delivering messages through the `mqtt` service, but extend the `Storer` class with multiple methods to publish data to the broker based on the requirements of the various data receivers. Further on, we could add an initializer class that parametrizes the broker using the REST API to set receivers' AAA and/or Data Bridge options in the future.
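One way to sketch the "multiple publish methods" idea is a registry of per-receiver payload builders; the `Storer` equivalent would then publish each result to the broker on a per-receiver topic. Receiver names and payload shapes here are illustrative assumptions:

```python
# Registry mapping each receiver to its payload builder.
# "standard" stands for the General Forwarding pass-through case.
RECEIVERS = {
    "citiobs": lambda m: {"sensor": m["device_id"], "values": m["readings"]},
    "standard": lambda m: m,
}

def payloads_for(message: dict) -> dict:
    """Build one payload per configured receiver from a single reading."""
    return {name: build(message) for name, build in RECEIVERS.items()}
```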
**Combined:** Both options, including General Forwarding, could be combined or co-exist with minor adjustments.
After talking to @oscgonfer and sharing some ideas on a future `Forward` class: a `Forward` instance defines how data from each registered device can be forwarded to a specific platform after user consent.
An instance should perform the following functionalities:
Create a boolean property within each Device to enable users to allow data to be forwarded to the Forward platform
(Optionally) Run custom code whenever a Device is created or updated to communicate with third-party platforms, register the new Device, or obtain information.
(Optionally) Store custom information and other relevant metadata (last time forwarded) per Device (one-to-one)
(Optionally) Run custom code every time a Device publishes (see Storer) to create a custom payload containing the latest data in a form compliant with the receiving platform
(Optionally) Parametrize the MQTT broker with settings specific to how the receiving platform connects, such as authorization and authentication or broker bridges.
Publish the custom or standard payload to the MQTT topics created above, every time data is received.
(Maybe) Allow custom code to forward the payload to the platform instead of the MQTT option. i.e., the receiving platform expects data over the GraphQL interface.
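The functionalities listed above could be sketched as a base class with optional hooks. All names here are assumptions for illustration, not the actual implementation:

```python
class Forward:
    """Defines how data from registered devices is forwarded to one
    receiving platform, after user consent (allow_forwarding)."""

    platform_name = "example-platform"  # hypothetical receiver id

    def on_device_created(self, device: dict) -> None:
        """Optional hook: register the new device on the third-party
        platform, or fetch information from it."""

    def build_payload(self, device: dict, reading: dict) -> dict:
        """Optional hook: transform the latest reading into the
        receiver's expected format. Default: standard payload through."""
        return reading

    def topic_for(self, device: dict) -> str:
        """Topic the custom or standard payload is published to; the
        broker would be parametrized with AAA for this receiver."""
        return f"forward/{self.platform_name}/{device['id']}"
```

Subclasses would override only the hooks a given receiving platform needs, which keeps the "(Optionally)" items above genuinely optional.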
Notice: this implementation doesn't provide a mechanism to recover data that has not been forwarded in real time. However, the broker can ensure, via persistent sessions, that data is delivered once a forwarded platform reconnects. Nevertheless, that makes me think again that looking at Kafka is an option.
Some additional comments to complete with my notes:
> Create a boolean property within each Device to enable users to allow data to be forwarded to the Forward platform
This is the `data_policy > allow_forwarding (bool)` field at the top of this issue. The user should make this choice, although in some cases they may effectively be opting in by participating in the project in question (i.e. CitiObs).
> (Optionally) Run custom code whenever a Device is created or updated to communicate with third-party platforms, register the new Device, or obtain information. (Optionally) Run custom code every time a Device publishes (see Storer) to create a custom payload containing the latest data in a form compliant with the receiving platform
As said, both of these would be very interesting if we implement them with some sort of `schema` that can refer to this code (a sort of lazy callable) when needed. The code in these schemas can convert from our serialization of choice (the equivalent of the current `data-received`) into something else, both for the creation of entities elsewhere and for data forwarding.
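A sketch of the "schema referring to a lazy callable" idea: the schema stores a dotted path that is only imported when a conversion is actually needed. The registry contents are illustrative (`json.dumps` stands in for a real converter):

```python
from importlib import import_module

SCHEMAS = {
    # receiver -> dotted path of its converter, resolved lazily
    "citiobs": "json.dumps",
}

def resolve(dotted: str):
    """Import 'pkg.mod.attr' on demand and return the callable."""
    module, _, attr = dotted.rpartition(".")
    return getattr(import_module(module), attr)

def convert(receiver: str, payload: dict):
    """Look up the receiver's converter in the schema and apply it."""
    return resolve(SCHEMAS[receiver])(payload)
```

Keeping converters behind dotted paths means the schema itself stays declarative (and storable in a database or config file), while the code lives with the rest of the forwarding service.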
> (Optionally) Store custom information and other relevant metadata (last time forwarded) per Device (one-to-one)
We can use (and already do use) the `postprocessing > forwarding_params` field for that, BUT I would suggest rethinking whether those should be columns in `device` or separate tables (currently the latter) for performance. We can improve indexing on this too.
> Notice: That implementation doesn't provide a mechanism to recover data that has not been forwarded in real-time. However, the broker via session can ensure data is delivered once a forwarded platform reconnects. Nevertheless, that makes me think again that looking at Kafka is an option.
This is key. However, in our `kafka` sprint we hit the paywalls quite quickly. Maybe we should take it on again now. If we do not go for something like Kafka, I am not sure how much we should rely on the message broker for persistence on the "way out" of our platform (whereas I agree it is perfect for "the way in"). At the compromise of not having a "fully available" system, I would say that if data is not available in real time, there is always a way to request it via the REST API.
A diagram to show the updated design after our conversation (@pral2a please correct if something wrong):
Note: with all this, we would also remove the `push` service (and the `redis` channels for `data-received` and `token-received` too (TBC)).
Notes: Any user can have token/username secrets, but we only forward the devices that have forwarding authorized
@pral2a @timcowlishaw conclusions from today's session:

- `ws` Redis in the `storer.rb`
- `push` service: remove it from `compose.yml`
- `device_policy` as described below:

```
device:
  data_policy:
    private: bool (False by default)
    precise_location: bool (False by default)
    allow_forwarding: bool (False by default)
```

- `HTTP` posts as well for the processing of data
- `schema` `batch forward` messages upon `batch` processing
- `postgres` to a job?

More on location accuracy and number of digits: https://gis.stackexchange.com/questions/8650/measuring-accuracy-of-latitude-and-longitude
Notes on "precise_location" policy:
- New devices will be registered as "imprecise"; existing ones will all be migrated to "precise", and a note put in the forum about the new option
- We truncate all existing lat/longs to 5 dp.
- Rather than truncating from 5 dp to 3 dp for "imprecise location" devices, we should add gaussian noise in the last 2 dp
- This happens on write, so that we don't store the precise locations either (even though we know which are precise and which are not)
- The `data_policy` itself is only shown to the users themselves and admins
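A sketch of the "truncate then add gaussian noise in the last 2 dp" step above. The noise scale `sigma` is an assumed tuning value, not a decision from this thread:

```python
import math
import random

def fuzz_coordinate(value: float, sigma: float = 0.005) -> float:
    """Truncate a coordinate to 5 decimal places, then add gaussian
    noise that perturbs roughly the last two of those digits.
    sigma = 0.005 degrees (~0.5 km of latitude) is an illustrative
    choice; the real value would need tuning."""
    truncated = math.floor(value * 1e5) / 1e5
    return round(truncated + random.gauss(0.0, sigma), 5)
```

Applying this on write, as the notes suggest, means the stored value can never be de-fuzzed back to the precise location, unlike plain truncation to 3 dp which leaks that the device sits in a fixed grid cell.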
Addressed in #331
Our database is licensed under https://opendatacommons.org/licenses/odbl/
However, we currently have only one way for the user to control their data: the `is_private` field. We would like to improve that by adding some additional fields:

- `precise_location` (`bool`): whether or not the location is to be stored with X number of digits of precision
- `allow_forwarding` (`bool`): for users to allow their data to be forwarded (or not) to other platforms (data aggregators)

**allow_forwarding**
This is linked to https://github.com/fablabbcn/smartcitizen-api/issues/226, and the idea behind it is that there are various projects we are involved in that require having `devices` data in other servers (servers from other projects). Some of these projects have devices that Fab Lab Barcelona handles, but some others have devices that anyone (i.e. any user) handles, for whom we should not take the decision of actively forwarding their data. In this sense, I define "actively" as Fab Lab BCN sending device data directly elsewhere, as opposed to a 3rd party requesting data openly via the API.

For the particular case of "us forwarding data": the `allow_forwarding` field would potentially require us to list those 3rd parties and inform users what those 3rd parties would do with the data. To avoid issues, we could specify that we only actively forward data that is marked as `forwardeable` to specific data aggregators.

**precise_location**
Regarding location, we currently keep the high-resolution location, as defined by the user by selecting it on the map. We do not actively update that location, nor do we have any other means to determine it besides the user's selection on the map. However, it would be a good policy to avoid issues by not storing high-resolution location data at all and, by default, storing 3-4 digits only. This comes in a bit late, though, and we may affect how people are using it now. Although not ideal, we could have an opt-in field which would keep the high-resolution location. All existing devices could be opted in, and users could choose to lose that precision by opting out. New devices could be low-precision by default, and users could opt in.
The `device` serialisation could read:

This is a very interesting issue, which can be discussed at length. I would really appreciate some contributions and other thoughts, @pral2a and @timcowlishaw
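As a hedged sketch, the `device` serialisation with the proposed `data_policy` fields might look like the following (all values illustrative, based only on the fields listed above):

```json
{
  "id": 123,
  "data_policy": {
    "private": false,
    "precise_location": false,
    "allow_forwarding": false
  }
}
```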