WallarooLabs / wally

Distributed Stream Processing
https://www.wallaroolabs.com
Apache License 2.0

Constants and Wallaroo - How should constants be handled? #2254

Open nisanharamati opened 6 years ago

nisanharamati commented 6 years ago

Issue:

The discussion section below is kept for context. This issue is intended to discuss how static state should be handled in Wallaroo.

A canonical example is a constant, e.g. a lookup table, that you want to be able to access in a stateless computation. On the one hand, it is definitely "State", because it is data that is part of the application specification and that the event stream interacts with. But it is also not like any current Wallaroo state, because it is

You can currently do this in Python by using a globally scoped object:

import wallaroo
from job import read_db
# globally visible db
DB = {}

def application_setup(args):
    # macher state setup
    global DB
    DB = read_db('my_db_file')
    ...

@wallaroo.computation(name="computation")
def computation(data):
    if data.member in DB:
        return data
    return None

but this can lead users to treat the globally scoped object as mutable, which causes consistency issues (the object isn't managed by Wallaroo, isn't necessarily consistent across the application's workers, etc.) and data races.
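One way to discourage accidental mutation in plain Python is to expose the table through a read-only view. This is only a sketch, not a Wallaroo feature: `freeze`, `is_member`, and `try_mutate` are hypothetical names, and the literal dict stands in for whatever `read_db('my_db_file')` returns.

```python
from types import MappingProxyType


def freeze(table):
    """Return a read-only view over a private copy of `table`."""
    # Copy first so later changes to the caller's dict do not leak in.
    return MappingProxyType(dict(table))


# Hypothetical stand-in for DB = read_db('my_db_file')
DB = freeze({"alice": 1, "bob": 2})


def is_member(key):
    # Reads work exactly as they do on a normal dict.
    return key in DB


def try_mutate():
    # Writes through the view raise TypeError instead of silently succeeding.
    try:
        DB["carol"] = 3
    except TypeError:
        return False
    return True
```

The protection is shallow: mutable values stored inside the table can still be changed, and nothing stops code from rebinding the global name `DB` itself, so this reduces accidents rather than enforcing immutability.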

So this raises the question: how do we support a pattern in which user-defined computations interact with static state, perhaps behaving like a global constant in a regular language environment?

Are there specific properties it should definitely have, and not have?

aturley commented 6 years ago

For right now my preference would be to provide the user with a good model of when it would be safe to initialize a value like this and then let the user handle things from there using whatever mechanism they think is best. This should work fine for Python and Go. It starts to fall apart when you get to a language like Pony that doesn't have a concept of a global variable, but we don't officially support the Pony API for now so I think we can wait on this.

jtfmumm commented 6 years ago

I agree with Andy here since as far as I understand the Python API, even if we exposed a mechanism for using global constants, the user can still use global variables however they like. For this reason, it seems more important that we explicitly describe when this could work and when it could cause problems, even if we discourage it in general.
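To make the "when this could work and when it could cause problems" distinction concrete, here is a small simulation. It does not use Wallaroo at all: `SimulatedWorker` is a hypothetical class that models the assumption that each worker holds its own independent copy of the global table.

```python
import copy

# Hypothetical initial contents, loaded identically on every worker.
INITIAL_DB = {"alice": 1}


class SimulatedWorker:
    """Models one worker that holds its own copy of the 'global' table."""

    def __init__(self, name):
        self.name = name
        self.db = copy.deepcopy(INITIAL_DB)

    def read_only_computation(self, member):
        # Safe: every worker answers the same way for the same input.
        return member in self.db

    def mutating_computation(self, member):
        # Unsafe: the change is visible only on this worker.
        self.db[member] = True
        return member in self.db


w1 = SimulatedWorker("worker-1")
w2 = SimulatedWorker("worker-2")

# Read-only use: both workers agree.
agree_before = w1.read_only_computation("alice") == w2.read_only_computation("alice")

# Mutation on one worker: the copies now diverge.
w1.mutating_computation("bob")
diverged = w1.read_only_computation("bob") != w2.read_only_computation("bob")
```

Under this assumption, a value that is initialized once and only read behaves consistently, while any write makes the result depend on which worker happens to process a message.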

nisanharamati commented 6 years ago

Discussion result: