IBMStreams / administration

Umbrella project for the IBMStreams organization. This project will be used for the management of the individual projects within the IBMStreams organization.

Proposal: Split Internet toolkit into base and extended toolkit #113

Closed joergboe closed 6 years ago

joergboe commented 7 years ago

Proposal

As discussed internally, the Internet toolkit should be split into a base toolkit and an extended toolkit. The goal is to separate all operators with an integrated web-server function from all other operators and functions.

The internet base toolkit

shall have all classical (sink and source) operators without the web-server functionality.

Name: streamsx.inet
Namespaces: com.ibm.streamsx.inet / com.ibm.streamsx.inet.http / com.ibm.streamsx.inet.ftp
Artifacts:

The extended internet toolkit

shall include all operators with the embedded web-server.

Name: streamsx.inetext
Namespaces: com.ibm.streamsx.inetext.rest / com.ibm.streamsx.inetext.wsserver
Operators:

mikespicer commented 7 years ago

Why? You have described the mechanics/result but I'd like to understand the reasoning for doing this.

ddebrunner commented 7 years ago

-1 on any changing of namespaces for the rest operators. These operators are widely used and we should not be forcing application changes.

ddebrunner commented 7 years ago

Why would HTTPBlobInjection not be with the rest operators, since it is a rest operator?

joergboe commented 7 years ago

Yes, HTTPBlobInjection should go into the extended internet toolkit.

joergboe commented 7 years ago

Not changing the namespace of the extended toolkit has the disadvantage that the toolkit's short name (streamsx.inetext) and its namespace do not match.

chanskw commented 7 years ago

-1 on splitting the toolkit this way. My concern is that the concept of "extended" toolkit is not intuitive. How would people know what operators are part of the extended toolkit and which are base and how do we make that differentiation?

ddebrunner commented 7 years ago

If we do split (and no reason has been given yet) then why even make the new toolkit an "extended" version of inet? Just use streamsx.rest and com.ibm.streamsx.rest.

joergboe commented 7 years ago

And what about the wsserver operators?

joergboe commented 7 years ago

The reasons are discussed here: <Samantha has removed internal IBM link from comment>

ddebrunner commented 7 years ago

We need the reasons here since this is an open source project.

chanskw commented 7 years ago

@joergboe Please do not post internal IBM links to an open-source project. I have removed the internal link from your comment.

I agree with Dan. This is an open-source project and we need a reason why this needs to be done for this open-source project so the community can vote on this proposal.

ddebrunner commented 7 years ago

And what about the wsserver operators?

Some suitable name for the toolkit then, com.ibm.streamsx.jetty ? Can decide on a name if/when any split is approved.

joergboe commented 7 years ago

The Internet toolkit has grown over a long time and now contains operators with many different kinds of functionality. In the past, maintenance became difficult. Something needs to be done to rationalize the toolkit. The strategy is to have smaller toolkits, which produce smaller bundles.

So I propose to split the toolkit.

joergboe commented 7 years ago

For this split I see 2 options:

  1. Split into 2 toolkits:
    • One (base) toolkit with classical (sink and source) operators based on internet client functions.
    • One (extended) toolkit with more sophisticated operators which may include a web-server function. The detailed split is explained in my initial post.
  2. A functional split into:
    • http - contains http functions and operators
    • ftp - contains operators for ftp functions
    • view and inject - contains http view and inject operators (including json and xml)
    • websocket - contains operators and functions for websocket support.

joergboe commented 7 years ago

I personally feel that option 2 has some downsides:

ddebrunner commented 7 years ago

In the past, maintenance became difficult.

I think this was actually down to a decision to partially split/subset the toolkit, rather than the fact the toolkit has a number of operators in it. This caused multiple branches and hence dual maintenance.

joergboe commented 7 years ago

Agreed. I want to overcome the subsetting of a toolkit.

ejpring commented 7 years ago

@joergboe I read your summary of the earlier discussion in the #streamstoolkits channel, but I really don't understand why you think it's necessary to split this toolkit at all. Why not just pull the server-side operators into the product version of the toolkit? They're all very useful.

joergboe commented 7 years ago

Proposal

The current internet toolkit contains client-side and server-side operators and functions. To have a clear distinction between these parts, we should split this toolkit into 2 toolkits. These are:

The internet client toolkit

shall have all classical (sink and source) operators based on client functionality and functions.

Repository name

streamsx.inet

namespaces

com.ibm.streamsx.inet / com.ibm.streamsx.inet.http / com.ibm.streamsx.inet.ftp

Artifacts

  • InetSource operator
  • HTTPGetStream operator
  • HTTPPost operator
  • HTTPGetJSONContent operator
  • HTTPGetXMLContent operator
  • new operator HTTPRequest, once it is ready
  • http functions (httpGet, httpPut, httpPost, httpDelete, urlEncode, urlDecode)
  • FTP operators (FTPReader, FTPCommand, FTPPutFile)
  • associated samples

The internet server toolkit

shall include all operators based on web-server functionality, that is, operators with an embedded web-server, and related artifacts. To keep changes to the existing code base to a minimum, the current namespaces are used.

Repository name

streamsx.inetserver

namespaces

com.ibm.streamsx.inet.rest / com.ibm.streamsx.inet.wsserver

Artifacts

  • WebContext operator
  • HTTPTupleInjection operator
  • HTTPTupleView operator
  • HTTPJSONInjection operator
  • HTTPXMLInjection operator
  • HTTPXMLView operator
  • new operator HTTPBlobInjection
  • WebSocketInject operator
  • WebSocketSend operator
  • function obfuscate
  • associated samples

Vote

Please +1 on this proposal if you agree. Please -1 if you have any concerns.

ejpring commented 7 years ago

-1

Most of the disruption this split causes will be outside of SPL source files, and using the same namespace in both toolkits won't minimize that, but it will break the naming convention for toolkit namespaces and repositories. That will increase the disruption now and cause more confusion going forward.

I think you should follow the naming convention and use 'com.ibm.streamsx.inetserver' as the namespace in the 'streamsx.inetserver' toolkit.

In your documentation for this split, please describe changes needed in SPL 'use' statements, application build procedures, and 'git' tooling.
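
To illustrate the kind of change such documentation would have to cover for SPL 'use' statements, here is a minimal Python sketch that rewrites 'use' directives from the current server-side namespaces to new ones. It assumes the convention-following name 'com.ibm.streamsx.inetserver' were adopted; the sub-namespaces '.rest' and '.wsserver' under it, the helper name `migrate_use_statements`, and the sample SPL text are hypothetical and not part of any toolkit.

```python
import re

# Hypothetical mapping from the current namespaces to convention-following ones.
NAMESPACE_MAP = {
    "com.ibm.streamsx.inet.rest": "com.ibm.streamsx.inetserver.rest",
    "com.ibm.streamsx.inet.wsserver": "com.ibm.streamsx.inetserver.wsserver",
}

# Matches an SPL use directive such as:
#   use com.ibm.streamsx.inet.rest::HTTPTupleView;
USE_RE = re.compile(r"^(\s*use\s+)([\w.]+)(::[\w*]+\s*;)", re.MULTILINE)


def migrate_use_statements(spl_source: str) -> str:
    """Rewrite 'use' directives whose namespace appears in NAMESPACE_MAP."""
    def replace(match):
        prefix, namespace, suffix = match.groups()
        # Namespaces that are not in the map are left untouched.
        return prefix + NAMESPACE_MAP.get(namespace, namespace) + suffix
    return USE_RE.sub(replace, spl_source)


# Made-up SPL fragment with one server-side and one client-side directive.
SAMPLE = """namespace sample;
use com.ibm.streamsx.inet.rest::HTTPTupleView;
use com.ibm.streamsx.inet.http::HTTPPost;
"""

migrated = migrate_use_statements(SAMPLE)
print(migrated)
```

Only the server-side directive changes; the client-side 'use' of com.ibm.streamsx.inet.http stays as it is. If the namespaces are kept unchanged instead, no source rewrite of this kind is needed, and only the toolkit path and dependencies are affected.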

chanskw commented 7 years ago

I am also not sure how this will work. Are you proposing that we simply move the operators from client toolkit to the server toolkit? Or would there be a deprecation path?

Usually, when we move operators around, we have to deprecate existing operators and put the new operators in the new toolkit. If that is the case, then your client toolkit will end up with the following:

namespaces

com.ibm.streamsx.inet / com.ibm.streamsx.inet.http / com.ibm.streamsx.inet.ftp

Artifacts

  • InetSource operator
  • HTTPGetStream operator
  • HTTPPost operator
  • HTTPGetJSONContent operator
  • HTTPGetXMLContent operator
  • new operator HTTPRequest, once it is ready
  • http functions (httpGet, httpPut, httpPost, httpDelete, urlEncode, urlDecode)
  • FTP operators (FTPReader, FTPCommand, FTPPutFile)
  • associated samples

PLUS Server side of things, all deprecated:

  • WebContext operator
  • HTTPTupleInjection operator
  • HTTPTupleView operator
  • HTTPJSONInjection operator
  • HTTPXMLInjection operator
  • HTTPXMLView operator
  • new operator HTTPBlobInjection
  • WebSocketInject operator
  • WebSocketSend operator
  • function obfuscate
  • associated samples

Then in the new server toolkit, you will have the server operators in the following namespaces: com.ibm.streamsx.inet.rest / com.ibm.streamsx.inet.wsserver

The impact of this is as follows:

Your other option is to simply move the operators without deprecating them. However, this breaks API compatibility, without a nice migration path. We should not break API compatibility without a really good reason.

ddebrunner commented 7 years ago

When you ship the client side of the toolkit into the Streams product, you will ship the client operators and a set of deprecated server operators (as the first release of this toolkit into the product)

I don't think that would be done. It would make no sense for the product to include deprecated operators it had never shipped.

if customers want to use both client and server toolkits at the same time, it will not be possible. The operators will have namespace conflict.

I think that's only the case if they want to use the open source version of the com.ibm.streamsx.inet (client) toolkit that has the deprecated operators. If they want no conflicts they can use either:

I think what would make sense is that the open source client toolkit:

An interesting idea was to keep the server toolkit in the streamsx.inet repository - no need to create a new repo. It's just a repo that produces two related toolkits (+ sample toolkits).

ddebrunner commented 7 years ago

I think we are still missing how we avoid this situation in the future.

How do we add new operators to a toolkit that might not be viable long term, but are useful in the short term, and make them easily available to customers, i.e. through a release of the toolkit?

chanskw commented 7 years ago

@ddebrunner +1 on your release plan.

I also prefer that we do NOT create a new streamsx.inetserver repository. Instead, both server and client toolkits stay in the streamsx.inet repository. Any release of streamsx.inet will contain both client and server toolkits. By separating out the server toolkit, it gives us a bit more flexibility in packaging. But because both toolkits are in streamsx.inet, I believe it will create less disruption and confusion.

With this approach, I suggest the following two toolkits: com.ibm.streamsx.inet and com.ibm.streamsx.inet.server

This may make it less confusing for the server operators to have com.ibm.streamsx.inet.rest and com.ibm.streamsx.inet.wsserver as their namespaces.

But if we are already breaking compatibility, then it may make more sense to just make the namespace match the toolkit since customers have to migrate anyway.

As for a more long term solution about how to avoid this problem in the future, I believe there are a couple of options:

1) We provide guidelines and a mechanism to mark toolkits, functions, and operators as experimental in place. This lets customers know that the functions are experimental, may have limitations, and that their APIs can change in the future.

2) I am starting to wonder about our single master branch development process, and whether that is causing some of these problems. In the health toolkit, we have employed GitFlow branching model: http://nvie.com/posts/a-successful-git-branching-model/

This cheatsheet explains the process: https://datasift.github.io/gitflow/GitFlowForGitHub.html

In this model, there are two branches: develop and master. Develop is a branch where we work on our features, allow for feature integration, and things may not be too stable at the time. Master is always kept stable, and only things that are ready will be merged into Master. With this model, we are able to integrate new features into the develop branch without disturbing master. People can download the develop branch and try things out. Develop branch is periodically merged into the master and releases are done off the master branch. If we have functions that are not ready for a release from master branch yet, we can cherry pick them off when making a release.

3) Another alternative is to put new functions into a new toolkit in the same repository. When the functions have matured, they get merged back into the client / server toolkit. I am not sure I am a fan of this, but that's an option.

ejpring commented 7 years ago

@chanskw @ddebrunner I like your suggestions. Keeping both toolkits in the same repository avoids some of the disruption to customer build procedures. Putting the server-side operators in the 'com.ibm.streamsx.inet.server' and 'com.ibm.streamsx.inet.wsserver' namespaces is less confusing than the 'com.ibm.streamsx.inetserver' namespace.

This will, however, make the build procedure for the product a bit more complicated, especially for the sample applications, since it will need another control file to exclude the non-product files and directories. I assume that's not a big deal.

As I understand it, the multi-branch development process is what 'git' was designed for. It would be fine with me, but I'm not sure whether customers are ready for that level of complexity. The illustrations in the articles you linked will help customers up that learning curve, so please do include those links in your documentation, if you decide to go this way.

joergboe commented 7 years ago

Putting 2 toolkits into one repository is technically possible (we need 2 different toolkit directories), but there is no advantage in doing so. It produces more confusion and raises a couple of questions:

  • How do we proceed with the common build script?
  • Do the toolkits use a common branch or different branches?
  • How do versioning and the tagging of the branches work?
  • Commit hashes are no longer biunique for a toolkit.

ejpring commented 7 years ago

@chanskw About your alternative 3 ... If I remember correctly, the original idea for the "x" in "streamsx.whatever" and "com.ibm.streamsx.whatever" was exactly what you suggest -- to indicate experimental stuff, with the possibility of moving into the product as "streams.whatever" and "com.ibm.streams.whatever" in the future. Unfortunately, I think it's too late to go that way now.

ejpring commented 7 years ago

@joergboe Not sure I understand your objection. The 'streamsx.inet' repository has eleven toolkits in it now, including those in the 'samples' and 'tests' directories. Adding more toolkits to the repository does not seem more confusing to me. I think your questions are all good ones, and I think we're working towards answering all of them in this thread of discussion, aren't we?

joergboe commented 7 years ago

@ddebrunner If the namespace and the names of the server operators are not changed (and this was my latest proposal) we need no deprecation path. What should be the message in that deprecation warning? 'The operator com.ibm.streamsx.inet.rest.HTTPTupleView is deprecated use operator com.ibm.streamsx.inet.rest.HTTPTupleView instead'

In this case the user has to change the toolkit path and that's all.

joergboe commented 7 years ago

@ejpring I can see only one toolkit directory with a toolkit information model file, plus a samples directory and a test directory.

ejpring commented 7 years ago

@joergboe Yes. As you know from our work on the network toolkit, I think samples and test applications are just as important for customers as operators and functions, so I urge you to consider them all together.

chanskw commented 7 years ago

@joergboe The streamsx.health repository has over 10 toolkits in it. Each toolkit has its own build script and can be built independently. In addition, there is a top-level build script that builds the entire repository.

As for versioning, each toolkit has its own version, which is updated independently based on what has changed in that toolkit. The streamsx.health repository has a top-level version that encompasses all of the toolkits. Basically, we treat streamsx.health as a project that has many toolkits in it, each of which can evolve independently.

When we consider this, we need to consider what's best for our customers. The reason I proposed having two toolkits in streamsx.inet is that customers can find both sets of operators in a single repository. If we split them into separate repositories, customers have to find the server toolkit in a new repository, which can be confusing.
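
As a rough illustration of per-toolkit versioning inside one repository, the following Python sketch reads the name and version from each toolkit's info.xml. The two XML snippets are made up, as are the version numbers; the element names and the namespace URI reflect my understanding of the toolkit information model and should be treated as assumptions. To stay robust against that, the code matches elements by local name only.

```python
import xml.etree.ElementTree as ET

# Made-up info.xml contents for two toolkits living in one repository.
# Element names, namespace URI, and versions are assumptions for illustration.
INFO_XML = {
    "com.ibm.streamsx.inet/info.xml": """<info:toolkitInfoModel
        xmlns:info="http://www.ibm.com/xmlns/prod/streams/spl/toolkitInfo">
      <info:identity>
        <info:name>com.ibm.streamsx.inet</info:name>
        <info:version>3.0.0</info:version>
      </info:identity>
    </info:toolkitInfoModel>""",
    "com.ibm.streamsx.inet.server/info.xml": """<info:toolkitInfoModel
        xmlns:info="http://www.ibm.com/xmlns/prod/streams/spl/toolkitInfo">
      <info:identity>
        <info:name>com.ibm.streamsx.inet.server</info:name>
        <info:version>1.0.0</info:version>
      </info:identity>
    </info:toolkitInfoModel>""",
}


def local_name(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def toolkit_identity(xml_text: str):
    """Return (name, version) found in one info.xml document."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name in ("name", "version"):
            fields[name] = (element.text or "").strip()
    return fields["name"], fields["version"]


# Each toolkit carries its own version, independent of the others.
versions = dict(toolkit_identity(text) for text in INFO_XML.values())
for name, version in sorted(versions.items()):
    print(name, version)
```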

chanskw commented 7 years ago

@ejpring as for the product build, I think we can easily exclude the server toolkit if that's what we choose. Furthermore, we can put the server samples in a separate samples directory. We have done this with other toolkits. Some of the samples that are shipped in the product are in the samples directory. Others are only available on Github and are put in a different directory.

ejpring commented 7 years ago

@chanskw Oh! I had not looked at the streamsx.health repository before -- I like the way you have structured both the operators and sample applications, and I appreciate the READMEs at each level. Nice work!

I think we should go this way with the inet toolkit.

ddebrunner commented 7 years ago

What should be the message in that deprecation warning? 'The operator com.ibm.streamsx.inet.rest.HTTPTupleView is deprecated use operator com.ibm.streamsx.inet.rest.HTTPTupleView instead'

It's probably more at the level of the namespace, e.g:

"Use of com.ibm.streamsx.inet.rest functionality is deprecated in this toolkit. The namespace has been moved into the com.ibm.streamsx.inet.server toolkit and will be removed from this toolkit in version 3.0"

ddebrunner commented 7 years ago

In this case the user has to change the toolkit path and that's all.

They have to change the dependencies in their applications/toolkits as well.

joergboe commented 7 years ago

If the product build has to exclude the server toolkit from the common repository, the split makes no sense. The product build can exclude parts of the toolkit code anyway.

ejpring commented 7 years ago

@joergboe Good point.

chanskw commented 7 years ago

@joergboe Not exactly. If you exclude some of the operators from a toolkit, it's a different version for the toolkit and makes things more complicated. If you just exclude the entire toolkit, the client toolkit that is shipped in the product has exactly the same content as what's released on Github... and therefore has the same version.

Furthermore, it's easier to exclude an entire toolkit from packaging than to try to cherry-pick some operators out of a toolkit.
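
A minimal Python sketch of this packaging argument: the product build selects whole toolkit directories instead of filtering operators inside one toolkit. The layout, version numbers, operator subsets, and the `select_for_product` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical repository layout: toolkit name -> (version, some of its operators).
REPOSITORY = {
    "com.ibm.streamsx.inet": (
        "3.0.0",
        ["InetSource", "HTTPGetStream", "HTTPPost", "FTPReader"],
    ),
    "com.ibm.streamsx.inet.server": (
        "1.0.0",
        ["HTTPTupleView", "HTTPTupleInjection", "WebSocketSend"],
    ),
}


def select_for_product(repository, excluded_toolkits):
    """Keep every toolkit that is not excluded, without touching its content."""
    return {
        name: content
        for name, content in repository.items()
        if name not in excluded_toolkits
    }


product = select_for_product(REPOSITORY, {"com.ibm.streamsx.inet.server"})

# The shipped client toolkit is identical to the GitHub one,
# so it can carry the same version number.
same_content = product["com.ibm.streamsx.inet"] == REPOSITORY["com.ibm.streamsx.inet"]
print(sorted(product), same_content)
```

Because nothing inside the client toolkit is modified, there is no second variant of it to version or maintain.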

ddebrunner commented 7 years ago

@chanskw @joergboe

Right, picking operators/functions out of a toolkit may not be easy because items like the SPLDOC may contain references to those operators in toolkit/namespace overview pages.

joergboe commented 7 years ago

@chanskw These are exactly my arguments for a split. This was my motivation for starting this initiative with the internal discussion 4 weeks ago. Parts of the team kept raising objections against the split, or specifically against my proposed split. I cannot discuss the split for another 4 weeks. So cherry-picking is the only way to go.

joergboe commented 7 years ago

SPLDOC is generated from the code. If the operators and the namespaces are not included in the code, the SPLDOC will not contain references to nonexistent objects.

ejpring commented 7 years ago

@joergboe Not sure what you mean by cherry-picking. If I understand @chanskw correctly, she suggests splitting the operators and functions between two toolkits named 'com.ibm.streamsx.inet' and 'com.ibm.streamsx.inet.server', continuing the practice of putting each sample application in a separate toolkit, and storing all of the toolkits together in the existing GitHub 'streamsx.inet' repository. The product build would then select the 'com.ibm.streamsx.inet' toolkit and client-side sample application toolkits from the GitHub repository. Is that what you mean?

In any case, @ddebrunner is right -- SPLDOC for operators and functions in the 'com.ibm.streamsx.inet' toolkit should not reference anything in the 'com.ibm.streamsx.inet.server' toolkit, regardless of what GitHub repository it's stored in.

Similarly, the client-side sample applications should not use any of the server-side operators or functions.

chanskw commented 7 years ago

Thanks @ejpring That's exactly what I mean.

chanskw commented 7 years ago

To summarize our discussions and try to move things forward, I think there are three things we need to vote on:

1) Split inet toolkit to client toolkit and server toolkit. (Please +1 if you agree or -1 if you disagree with split)

Please answer the following if you give +1 to (1).

2) Server toolkit name: (Please pick a or b)
   (a) com.ibm.streamsx.inetserver
   (b) com.ibm.streamsx.inet.server

3) Where is the server toolkit going to reside? (Please pick a or b)
   (a) existing streamsx.inet repository
   (b) new streamsx.inetserver repository

I think these are the main issues to agree on. If we can agree to this, then we can discuss the logistics like deprecation and migration paths and figure out how to get there.

I hope this helps move things along.

Please vote, so we can see where we are, and work on issues that we cannot agree on.

ddebrunner commented 7 years ago

@joergboe

If the operators and the namespaces are not included in the code, the SPLDOC will not contain references to nonexistent objects.

Not true. An overview page (e.g. from info.xml) can have manually entered references to items that have been removed. Thus excluding operators may require manually modifying SPLDOC. A well documented toolkit is likely to have such items.

You can see this in inet's info.xml, it has a manually entered list of namespaces:

https://github.com/IBMStreams/streamsx.inet/blob/master/com.ibm.streamsx.inet/info.xml#L14
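
A check for such stale references could look roughly like the following Python sketch, which scans hand-written overview text for namespaces that a build has excluded. The description text is invented (it only stands in for a manually maintained list like the one in info.xml), and the `find_stale_references` helper is hypothetical; this is not how SPLDOC itself works.

```python
import re

# Invented overview text standing in for a hand-written info.xml description.
DESCRIPTION = """
The toolkit provides these namespaces:
 * com.ibm.streamsx.inet
 * com.ibm.streamsx.inet.http
 * com.ibm.streamsx.inet.ftp
 * com.ibm.streamsx.inet.rest
 * com.ibm.streamsx.inet.wsserver
"""

# Namespaces assumed to be left out of a product build.
EXCLUDED_NAMESPACES = [
    "com.ibm.streamsx.inet.rest",
    "com.ibm.streamsx.inet.wsserver",
]


def find_stale_references(text, excluded):
    """Return (line_number, namespace) for each mention of an excluded namespace."""
    stale = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for namespace in excluded:
            # The lookahead rejects longer identifiers that merely start
            # with the excluded namespace (e.g. '...inet.restful').
            if re.search(re.escape(namespace) + r"(?!\.?\w)", line):
                stale.append((line_number, namespace))
    return stale


stale = find_stale_references(DESCRIPTION, EXCLUDED_NAMESPACES)
for line_number, namespace in stale:
    print(f"line {line_number}: reference to excluded namespace {namespace}")
```

A check like this only finds literal mentions; prose that refers to an operator by its short name, for example HTTPTupleInjection, would need its own list of names to search for.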

ddebrunner commented 7 years ago

Actually any page may have a reference to a removed item, it's probably more likely that an overview page will, rather than say an operator referencing another operator in a different namespace. Still possible, e.g. HTTPPost might discuss it could be used with HTTPTupleInjection.

ejpring commented 7 years ago

@chanskw I vote:

  1. +1 to split the operators&functions into separate client-side and server-side toolkits,

  2. for naming the server-side toolkit 'com.ibm.streamsx.inet.server', and

  3. for storing all of the toolkits in the existing 'streamsx.inet' repository.

ejpring commented 7 years ago

@ddebrunner Fair enough. I think the overview documentation included with the product will have to explain that the product contains a subset of what's available from GitHub, and should link to the GitHub repository.

Similarly, if the documentation for client-side operators, functions, and samples includes references to server-side stuff, it should link to the GitHub repository, too.

joergboe commented 7 years ago

Then info.xml is a general overview and has to be replaced along with the version number, but this is a triviality.