dibbhatt / kafka-spark-consumer

High Performance Kafka Connector for Spark Streaming. Supports Multi Topic Fetch, Kafka Security. Reliable offset management in Zookeeper. No Data-loss. No dependency on HDFS and WAL. In-built PID rate controller. Supports Message Handler. Offset Lag checker.
Apache License 2.0

Licensing concerns with current kafka-spark-consumer source code #2

Closed miguno closed 10 years ago

miguno commented 10 years ago

Dibyendu,

first, thanks for your work on providing an improved Kafka consumer for Spark Streaming. Much appreciated!

I have been playing around with Kafka and Spark Streaming myself, and stumbled upon your project in the spark-user thread where you announced it last month. Since there are apparently still a couple of issues (including Spark issues) to be ironed out, I began reading your source code for further details on the current status of Kafka support in Spark Streaming -- actually because I thought that "Hey, the Apache Storm project has a reasonable Kafka spout/connector, maybe that code would help the Spark project to improve their own variant."

While reading your source code I noticed that apparently most of the code is a verbatim copy of the Kafka spout of the Apache Storm project, which was originally created by wurstmeister. In both cases the code is licensed under the Apache License v2.0, which means you can't just copy the code -- there are some rules you must follow. (And both Apache Spark and Apache Storm, as ASF projects, are using the very same license, which also means it's easy to share code amongst the projects.) Notably, "you must give any other recipients of derivative work a copy of that license, you must cause any modified files to carry prominent notices stating that you changed the files, and you must retain, in the source form of any derivative works that you distribute, all copyright, patent, trademark, and attribution notices from the source form of the work, excluding those notices that do not pertain to any part of the derivative works". See Apache License v2.0 for details of what you would have to do/change/add/etc. to be license compliant.

I am sure you have done this in good faith, and I am making you aware of this issue primarily to help you.

Best wishes, Michael

dibbhatt commented 10 years ago

Hi Michael,

Thanks a lot for your email. I have been following your blogs and posts on Storm and Kafka very regularly, and they have been an excellent learning resource for me. It is great to see your email, and I appreciate your help and guidance on this license issue.

Yes, you are correct: the original code is similar to the Storm Kafka spout. The original Storm spout code has two parts. The first is the fault-tolerant Kafka connector (ZKCoordinator, DynamicBrokersReader, DynamicPartitionConnection, etc.), whose logic is almost the same in my Spark connector (with a few minor modifications to detect Spark driver/executor failures, plus replay logic). But a good amount of modification has been done in the Storm-specific part of the code: in particular there is no KafkaSpout (I used KafkaConsumer instead), the PartitionManager logic has been changed to fit Spark, etc. There is no Storm-specific ack-related code, no metrics are kept, etc.

Can you please guide me on how to proceed in this case? I am new to this process, sorry about this.

I can see the following points for redistribution...

a. You must give any other recipients of the Work or Derivative Works a copy of this License; and

If I include the Apache v2 license, will this point be covered?

b. You must cause any modified files to carry prominent notices stating that You changed the files; and

Shall I mention in the .java / README files that this is modified from the Storm Kafka spout? Will that help?

c. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

What do I need to do here?

d. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

This work does not include any NOTICE text. So do I need to do anything here?

Regards, Dibyendu

dibbhatt commented 10 years ago

Hi Mike, I have added the License File and also included comments in every Java file that code has been taken from Storm Kafka Spout and Modified for Spark Streaming. Let me know if this is fine now. Do I need to add anything in LICENSE file for copyright section ?

Dibyendu

miguno commented 10 years ago

(Disclaimer: I'm not a licensing expert either.)

Again, I'm not an expert either. :-)

Best, Michael

PS: I noticed that the commit in which you added the license headers also includes functional changes, see this example.

- ssc.checkpoint(checkpointDirectory);
+ // ssc.checkpoint(checkpointDirectory);

Was this intentional? There were a couple of such functional changes, which seem to have been conflated with the licensing changes.

miguno commented 10 years ago

Do I need to add anything in LICENSE file for copyright section ?

This is up to you.

See section 4 in the ALv2:

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

dibbhatt commented 10 years ago

Thanks again. I added a NOTICE file and modified the newly added license header to what you mentioned. The functional changes that went in along with the license changes are okay. I have not added any copyright section to the LICENSE file.

dibbhatt commented 10 years ago

Hi Michael,

Let me know if I can close this licensing issue, if everything looks okay.

Regards, Dibyendu

miguno commented 10 years ago

I think you can close it, Dibyendu.

--Michael

dibbhatt commented 10 years ago

Thanks Michael.