IBM / kafka-streaming-click-analysis

Use Kafka and Apache Spark streaming to perform click stream analytics
https://developer.ibm.com/patterns/determine-trending-topics-with-clickstream-analysis/
Apache License 2.0

Notebook issues #15

Closed rhagarty closed 6 years ago

rhagarty commented 6 years ago
xwu0226 commented 6 years ago

@ScrapCodes Any update on this? Thanks!

ScrapCodes commented 6 years ago

This issue seems to be transient, sometimes appears and then auto disappears.

ScrapCodes commented 6 years ago

@rhagarty Do you have an update, do you still see this?

rhagarty commented 6 years ago

@ScrapCodes I did not see the disappearing cells problem, but I am still unable to stream data successfully. I am getting this error: [2017-12-10 23:31:14,405] ERROR Error when sending message to topic clicks with key: null, value: 40 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
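A "Failed to update metadata" timeout generally means the producer could not fetch topic metadata from any broker within its blocking limit. An unreachable broker address is one possible cause among several (security settings are another). As a first check, a minimal Python sketch along these lines could confirm whether each broker address accepts a TCP connection. The helper names and the sample addresses are hypothetical and not part of this repository, and a successful connection does not prove that authentication is configured correctly.

```python
import socket


def parse_bootstrap_servers(servers):
    """Split a 'host1:port1,host2:port2' string into (host, port) pairs."""
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition splits on the last colon, so the port is always the tail.
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example usage (hypothetical address; substitute the broker list
# from your own service credentials):
#
#   for host, port in parse_bootstrap_servers("broker.example.com:9093"):
#       print(host, port, is_reachable(host, port))
```

If every broker is reachable and the timeout persists, the cause is more likely in the client configuration than in the network path.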

ScrapCodes commented 6 years ago

This has something to do with Message Hub. I do get these errors at times, but the messages still get pushed. Fixing these problems requires access to the upstream Kafka service (i.e. Message Hub). It is unclear why you are getting timeouts with a correct address and setup, and if you are getting them, I am sure many others will too. Do you have a way forward?

stevemar commented 6 years ago

@ScrapCodes I think we need to identify the root cause of the errors or remove the messagehub components. We can't knowingly publish, ask advocates to promote, and ask external users to use this content if we're running into errors ourselves. We focus on consistency and quality for our IBM Code assets, and this one is no exception. So, I'll ask @xwu0226 and yourself at this point to work out a plan for what to do next. I mentioned a few options above, but I'm open to suggestions.

xwu0226 commented 6 years ago

@ScrapCodes I see only one existing pattern that uses MessageHub with Kafka, https://github.com/IBM/openwhisk-data-processing-message-hub

Can you see if you can find any reference or similarity here?

rhagarty commented 6 years ago

@ScrapCodes - it works for me now, but we need to add a couple of instructions... 1) Since the 'tail' command requires the clickstream tsv file to be in '/data', we need to add the copying of the file to the list of commands. 2) I only got it to work by using the '/config/messagehub.properties' file, so we should probably make that a necessary step instead of an optional one. 3) When importing the credentials into the notebook, the password value has 2 extra sets of double quotes around it. This doesn't affect the running of the notebook, but when you copy this value to the messagehub.properties file, these must be removed. We either need to fix the import, or explicitly state to remove them when updating the properties file.
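For the third point, one way to avoid the manual cleanup is to strip the surrounding quotes before the value is copied into the properties file. The following is a minimal Python sketch under that assumption. The function name is hypothetical, and it assumes a real password never legitimately starts and ends with a double quote.

```python
def strip_wrapping_quotes(value):
    """Remove every layer of matching double quotes wrapped around value."""
    value = value.strip()
    # Peel one pair of quotes per iteration until none remain.
    while len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return value


# Example usage with a made-up value that carries extra quotes:
#
#   strip_wrapping_quotes('""secret""')
```

A value without wrapping quotes passes through unchanged, so the helper is safe to apply even if the import is fixed later.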

ScrapCodes commented 6 years ago

Thank you for the comments. The tail command does not have such a requirement: the data can exist anywhere, and users just need to provide the correct path.

I will address the rest of the comments in my pull request.
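To illustrate that the file location is only a matter of passing the right path, here is a minimal Python sketch of tail-like behaviour that reads the last few lines from any file. The function name and the example path are hypothetical, and it is not the command used in this pattern's instructions.

```python
from collections import deque


def last_lines(path, count=10):
    """Return the last `count` lines of the file at `path`, without newlines."""
    with open(path, "r", encoding="utf-8") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Example usage (hypothetical path; any readable location works):
#
#   for line in last_lines("/some/other/dir/clickstream.tsv", count=200):
#       print(line)
```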

ScrapCodes commented 6 years ago

@rhagarty: Oops, I accidentally committed directly to master. But now that it is done, feel free to send your corrections (if any).

rhagarty commented 6 years ago

@ScrapCodes - looks good