IBMStreams / administration

Umbrella project for the IBMStreams organization. This project will be used for the management of the individual projects within the IBMStreams organization.

New toolkit repository proposal: streamsx.protobuf #130

Closed bmwilli closed 6 years ago

bmwilli commented 6 years ago

I would like to request a new toolkit repository called streamsx.protobuf.

The purpose of this toolkit is to provide operators that read protobuf messages into tuples and write tuples as protobuf messages.

There is a utility, spl-schema-from-protobuf, that reads the protocol format file (.proto) and produces SPL types.

There are two primary operators:
ProtobufParse: given the .proto file, this operator takes in a blob and produces tuples that have a schema generated by the spl-schema-from-protobuf utility.
ProtobufBuild: given the .proto file, this operator takes in a tuple matching the schema produced by the spl-schema-from-protobuf utility and produces a blob containing the protocol buffer.

There are two utility operators:
ProtobufFileSource: reads a file of pseudo-standard protocol buffers ([4-byte length][protocolbuffer][4-byte length][protocolbuffer]...) and passes each protocol buffer out as a blob attribute on a tuple.
ProtobufTCPSource: accepts a TCP connection and reads data in the same format as ProtobufFileSource above.
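For readers unfamiliar with the length-prefixed layout described above, here is a minimal Python sketch of reading and writing such a stream. It is only an illustration, not the toolkit's code: the function names `read_frames` and `write_frame` are hypothetical, and the byte order of the 4-byte length is an assumption (big-endian here), since the thread does not specify it.

```python
import io
import struct

# Assumption: the 4-byte length prefix is an unsigned big-endian integer.
LENGTH_PREFIX = struct.Struct(">I")


def write_frame(stream, payload):
    """Write one [4-byte length][payload] frame to a binary stream."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Yield each payload from a stream of [4-byte length][payload] frames."""
    while True:
        header = stream.read(LENGTH_PREFIX.size)
        if not header:
            return  # clean end of stream between frames
        if len(header) < LENGTH_PREFIX.size:
            raise EOFError("truncated length prefix")
        (length,) = LENGTH_PREFIX.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated payload")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for message in (b"first", b"", b"third"):
        write_frame(buf, message)
    buf.seek(0)
    for payload in read_frames(buf):
        print(len(payload), payload)
```

Each yielded payload is the raw serialized message, which corresponds to the blob that would then be handed to a parsing step.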

If there are any questions, please let me know.

Brian Williams

ddebrunner commented 6 years ago

+1, with the caveat that we need to see why specific source operators are needed. The goal has been to separate parsing from sources.

bmwilli commented 6 years ago

They are not required, which is why I called them "utility" operators. Our customer needed them because the system they received data from followed the suggestion here: [https://developers.google.com/protocol-buffers/docs/techniques#streaming]

If there are existing operators that would provide input in this format, then we could remove them and give a sample with the new approach.

ddebrunner commented 6 years ago

Yeah - the need for them or not can be resolved once the project is created.

mikespicer commented 6 years ago

+1

leongor commented 6 years ago

+1

BruceGlassford commented 6 years ago

Reason for combining the operators: if you use TCPSource in binary mode, you won't get a block until it's full, since the operator doesn't know the data is ready. No existing parse format matches this structure. If the incoming data is not continuous, splitting the parsing from the source may add significant latency. Using a very small block size solves this, but it drastically increases network overhead and hurts throughput.
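To illustrate the latency point, here is a rough Python sketch of a framing-aware decoder that releases each message as soon as its last byte arrives, instead of waiting for a fixed-size block to fill. This is not how the proposed operators are implemented; the class name `FrameDecoder` is hypothetical, and the big-endian length prefix is an assumption.

```python
import struct


class FrameDecoder:
    """Incrementally splits a byte stream into [4-byte length][payload] frames."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add newly received bytes and return every frame that is now complete."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= 4:
            # Assumption: unsigned big-endian length prefix.
            (length,) = struct.unpack_from(">I", self._buf, 0)
            end = 4 + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            frames.append(bytes(self._buf[4:end]))
            del self._buf[:end]
        return frames
```

A TCP reader could pass whatever `recv` returns straight to `feed`, so a short message sent on a quiet connection is forwarded immediately rather than held until a large block fills.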

bmwilli commented 6 years ago

How many +1 votes does it take to get it approved and created?

Thanks, Brian

petenicholls commented 6 years ago

There is no fixed number required. The holdup is typically whether I notice that the voting has occurred. I will create the repo tomorrow morning.

petenicholls commented 6 years ago

Repo streamsx.protobuf set up with default license and README; Brian Williams added as initial committer. Closing.