IBMStreams / streamsx.plumbing

Plumbing operators manipulate the flow of tuples in a Streams application, but are not part of the logic of the application
http://ibmstreams.github.io/streamsx.plumbing
Apache License 2.0

Add Dynamic WriteData, Y-Import and RedisJson Composites? #39

Closed Alex-Cook4 closed 4 years ago

Alex-Cook4 commented 7 years ago

I have several composites that we have found very useful in a large deployment at one of our large customers. Would this be the right place for them? If not, any idea where? They are:

  1. DynamicWriteData Filter - Ability to dynamically filter a stream based on the trace level of the operator.
  2. Y-Import - Ability to either import data stream, or read from a file for testing purposes. Includes throttling to control flow, as well as a threadedPort with a queue to prevent slowing down data flows of jobs being connected to.
  3. RedisJson Composite - A composite for exporting JSON to redis.
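
The Y-Import idea in item 2 — pick a live import or a file replay at submission time, with throttling — can be sketched generically. This is an illustrative Python sketch of the pattern, not the SPL composite itself; the function names, the rate, and the file format are all assumptions:

```python
import time
from typing import Iterable, Iterator, Optional

def throttled(tuples: Iterable[str], rate_per_sec: float) -> Iterator[str]:
    """Yield tuples no faster than rate_per_sec (the 'throttle' leg of the Y)."""
    delay = 1.0 / rate_per_sec
    for t in tuples:
        yield t
        time.sleep(delay)

def y_import(live_source: Iterable[str], test_file: Optional[str]) -> Iterator[str]:
    """Y-Import sketch: when a test file is supplied (e.g. via a
    submission-time parameter), replay it with throttling; otherwise
    pass the live imported stream straight through."""
    if test_file is not None:
        with open(test_file) as f:
            # Throttle file replay so it resembles a live data rate.
            yield from throttled((line.rstrip("\n") for line in f),
                                 rate_per_sec=1000.0)
    else:
        yield from live_source
```

The point of the pattern is that downstream logic consumes one stream and never knows which leg of the Y fed it.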
scotts commented 7 years ago

The first two - DynamicWriteData and Y-Import - sound appropriate. They sound like operators designed to manage the flow of tuples in an application.

But RedisJson sounds like something that is not "plumbing", as it's about pushing a particular kind of data to a particular kind of external storage. That may be more appropriate in streamsx.json.

Alex-Cook4 commented 7 years ago

That makes sense @scotts. Do the following namespaces sound good for the first two:

com.ibm.streamsx.plumbing.filters
com.ibm.streamsx.plumbing.imports

ddebrunner commented 7 years ago

@Alex-Cook4 Since we want to encourage use of Publish/Subscribe from the topology toolkit maybe the Y-Import should be in that toolkit?

Could you expand a little on DynamicWriteData, its name doesn't match the description you provided, so I'm not exactly sure what it does.

ddebrunner commented 7 years ago

@Alex-Cook4 FYI - I also hacked up two Redis operators using Jedis: one that wrote each tuple into Redis, and one that read from Redis.

Alex-Cook4 commented 7 years ago

@ddebrunner that's fair :-) It is basically a Dynamic Filter that is turned on and off based on a trace-level argument. We are only using it to filter right before a FileSink, but it could definitely be generalized further. "DynamicFilter" might be a better name, although I don't want to infringe on a more official DynamicFilter down the road.
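
The behaviour described here — a filter that is enabled or disabled by the effective trace level — can be sketched outside SPL roughly like this. The level names, their ordering, and the helper name are illustrative assumptions, not the actual operator:

```python
# Trace levels ordered from least to most verbose, mirroring the usual
# error < warn < info < debug < trace ordering (an assumption here).
LEVELS = {"error": 0, "warn": 1, "info": 2, "debug": 3, "trace": 4}

def make_trace_level_filter(threshold: str):
    """Return a predicate that is true only when the operator's current
    trace level is at least `threshold` (hypothetical helper)."""
    def passes(current_level: str) -> bool:
        return LEVELS[current_level] >= LEVELS[threshold]
    return passes

# Usage: only forward tuples to the sink when tracing at debug or above.
write_enabled = make_trace_level_filter("debug")
```

Changing the application's trace level at runtime then turns the data flow to the sink on or off without resubmitting the job, which is the "dynamic" part.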

That's cool that you used Jedis...what was the reason to do that over using the dps toolkit?

ddebrunner commented 7 years ago

I was using the Compose Redis service on Bluemix which DPS does not support.

ddebrunner commented 7 years ago

TraceLevelFilter? Assuming that the filtering is driven by the SPL trace level.

Alex-Cook4 commented 7 years ago

Yes, I like that. +1

ddebrunner commented 7 years ago

If RedisJson uses DPS then is it specific to Redis or could it be used with any key-value store that DPS supports?

Sounds like a candidate for the DPS toolkit though.

Alex-Cook4 commented 7 years ago

@ddebrunner can you elaborate more on why you think the Y-Import would go in the topology toolkit? Isn't the focus of that toolkit Streams in other languages?

ddebrunner commented 7 years ago

The topology toolkit also provides the publish-subscribe model, which is the easier-to-use approach to Import/Export. It seems that the optional import from a file should build upon Publish/Subscribe.

Though, having thought about this, why do it that way at all (Y-Import)? Why not just Import/Subscribe, and have a microservice application that reads from a file and Exports/Publishes that stream, so that the application path being tested is the one that will be used in production?
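
The microservice approach suggested here can be illustrated with a minimal in-memory publish/subscribe broker: a test "replay" service publishes file contents to a topic, and the application under test simply subscribes to that topic — the same code path it uses in production. All names here are hypothetical; this is a sketch of the pattern, not the streamsx.topology API:

```python
from collections import defaultdict

class Broker:
    """Toy in-memory pub/sub broker (stand-in for dynamic connections)."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, tup):
        # Deliver to every subscriber of the topic.
        for cb in self._subs[topic]:
            cb(tup)

def file_replay_service(broker, topic, records):
    """Test microservice: publish each record to the topic, so the
    subscribing application is exercised through its production path."""
    for rec in records:
        broker.publish(topic, rec)
```

The application never changes between test and production; only which publisher happens to be running does.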

Alex-Cook4 commented 7 years ago

That's a good point. The main reason comes from trying to develop a "standard import" at this customer, with the key focus on making sure novice Streams developers import and run their data through a threadedPort operator that drops tuples to prevent a backlog in upstream jobs. We have also run into the problem of developers changing their code in order to test.

Packaging it all into one piece that can be tested using a submission-time parameter serves our purpose from a simplicity perspective when trying to push adoption. I personally prefer the idea of the testing microservice and will think about it more for our customer situation.

dakshiagrawal commented 7 years ago

We have tried both approaches - (a) having a single Y composite operator; (b) having a microservice that reads from a file and exports which gets subsequently imported by the application.

In fact, (b) would be the preferred approach, since the production application does not contain an unnecessary FileSource, DirectoryScan, etc. For senior developers on the team, (b) worked perfectly well, but we had to abandon it due to the proliferation of dependencies/applications, which some developers just could not handle, resulting in unnecessary questions/headaches for the senior developers.

Suppose we have 30 applications running - approach (b) requires 30 of these testing services/applications, which have tuple-type dependencies across applications (between the file-source application and the rest of the application). Unless the packaging or naming of these pairs of applications is consistent and their location in the source repository is well understood, it tends to become a big mess.

Note that even if there is some way of passing a tuple type, not all applications can share a common test application, as a test application may include logic to clean data. Having this Y composite gives people a good head start, as they can include data-cleansing logic in the Y composite.

Alex-Cook4 commented 7 years ago

@ddebrunner Would you lean towards the Y-Import being a part of the streamsx.topology or here in the plumbing?

ddebrunner commented 7 years ago

If Y-Import were really Y-Subscribe then it should be in topology. If it's going to expose low-level Import primitives, then it probably makes sense in plumbing, though I'm concerned we will end up with two competing high-level schemes based around dynamic connections, potentially confusing developers about which approach to take.

I could probably provide better input if there was a more complete description (e.g. is it in a form yet where SPLDOC can be generated?). @dakshiagrawal adds that cleansing functionality can be included in it, but I'm not sure how that will be done. It seems like the composite is trying to solve several problems (back-pressure, testing, data cleansing, standard import, throttling), so it potentially crosses several toolkits (e.g. should it be part of a testing toolkit once that gets off the ground?) as well as solving some issues the product is addressing differently.

Alex-Cook4 commented 7 years ago

@ddebrunner, have you put your Jedis Redis operators anywhere on GitHub? I'm curious to take a look.