fission / fission-workflows

Workflows for Fission: Fast, reliable and lightweight function composition for serverless functions
Apache License 2.0

Improve NATS event store implementation #4

Open · erwinvaneyk opened this issue 7 years ago

erwinvaneyk commented 7 years ago

The current event store implementation was written with limited knowledge of NATS Streaming and misuses some of its features.

Currently, the naive setup consists of the following subject structure:

Issues with this setup:

Possible solutions:

  1. Move workflows out of the event store. As these are considered immutable when generated/parsed, they can be kept out of the event store and left to be handled by Fission.
    • However, they may still need to be stored somewhere persistent, since workflow invocations are tied to these parsed workflows; that information could be lost if the parsed workflow is not stored.
  2. Use a single subject for everything. I am not sure how large the performance hit would be, as subscribers would need to go over all messages when recreating the state of any single entity.
  3. Use a subject per workflow. In this case, the workflow is stored together with its associated invocations. The problem here might be that you would not be able to delete any subject (once that option becomes available), as that would also
  4. Keep the current implementation and work with the NATS team to implement some of the missing functionality:
    • More advanced garbage collection; ability to mark subjects/messages as GC'able
    • Wildcard subscription support for NATS Streaming
    • Ability to delete or archive subjects (or even messages) manually
  5. Switch to a different database or message bus. There is no perfect solution on the market yet that offers all the required properties of the event store (fast, lightweight, persistent, reliable, scalable). A partial implementation exists for BoltDB, which might be an alternative; it was dropped after realizing that it would require implementing the entire pub/sub functionality on top of it.

For the prototype, this is currently a low-priority issue, as it works just fine for small usage (<1000 invocations). Nothing is persisted yet, because the fission-nats.yml deployment still uses in-memory storage, so the store can be cleared by simply restarting that deployment. Until the prototype is advanced enough that it becomes clear what is needed from the event store, the current implementation is okay.

saidimu commented 7 years ago

Curious if Kafka was ever considered. Seems like it would fit the bill.

5. Switch to a different database or message bus. There is no perfect solution on the market yet that contains all the required properties of the event store (fast, lightweight, persistent, reliable, scalable).

erwinvaneyk commented 7 years ago

@saidimu it is indeed one of the options we looked at. However, Kafka is on the other side of the spectrum, carrying a lot of overhead and features that are not needed for a simple, internal data store. Given the main requirements of being fast and lightweight, NATS seems preferable to Kafka. We are working with the NATS team to resolve some of the issues (in their codebase or this one).

That said, this is one of the parts that still needs a lot of improvement. The interface used to communicate with the data store is deliberately kept as simple as possible, so that another data store can easily be implemented behind it. So, if preferred, adding a Kafka backend as well could be an option.
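As an illustration of what a deliberately small backend interface could look like, here is a minimal Go sketch. The `EventStore` interface, its three methods, the `MemoryStore` type, and the subject name `invocation.abc` are all hypothetical and are not the actual fission-workflows API. The point is that a NATS, Kafka, or BoltDB backend would only have to provide these few operations.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical event record published on a subject.
type Event struct {
	Subject string
	Data    []byte
}

// EventStore is a hypothetical minimal backend interface; any backend
// (NATS, Kafka, BoltDB, ...) would implement these three methods.
type EventStore interface {
	Append(e Event) error
	Get(subject string) ([]Event, error)
	Subscribe(subject string) (<-chan Event, error)
}

// MemoryStore keeps everything in memory, so all state is lost on restart.
type MemoryStore struct {
	mu     sync.Mutex
	events map[string][]Event
	subs   map[string][]chan Event
}

// Compile-time check that MemoryStore satisfies EventStore.
var _ EventStore = (*MemoryStore)(nil)

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{
		events: make(map[string][]Event),
		subs:   make(map[string][]chan Event),
	}
}

// Append stores the event and forwards it to live subscribers of its subject.
// Sends happen while holding the lock, so a subscriber that stops reading will
// eventually block Append once its buffer fills; a real backend needs flow control.
func (m *MemoryStore) Append(e Event) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.events[e.Subject] = append(m.events[e.Subject], e)
	for _, ch := range m.subs[e.Subject] {
		ch <- e
	}
	return nil
}

// Get returns a copy of all events stored for a subject.
func (m *MemoryStore) Get(subject string) ([]Event, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]Event, len(m.events[subject]))
	copy(out, m.events[subject])
	return out, nil
}

// Subscribe returns a buffered channel that receives events appended after
// the call; it does not replay history (use Get for that).
func (m *MemoryStore) Subscribe(subject string) (<-chan Event, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch := make(chan Event, 16)
	m.subs[subject] = append(m.subs[subject], ch)
	return ch, nil
}

func main() {
	var store EventStore = NewMemoryStore()
	sub, _ := store.Subscribe("invocation.abc")
	_ = store.Append(Event{Subject: "invocation.abc", Data: []byte("InvocationCreated")})
	fmt.Println(string((<-sub).Data))
	evs, _ := store.Get("invocation.abc")
	fmt.Println(len(evs))
}
```

This prints `InvocationCreated` followed by `1`. Because callers depend only on the interface, swapping `NewMemoryStore()` for a hypothetical Kafka-backed constructor would not require changes elsewhere.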