Open deemaalomair1 opened 5 years ago
Found the solution. It's two things.
First, the on_data function in the tweet listener is missing a newline delimiter, so I think everything arrives as one string. Add "\n" and it'll work fine if you pprint lines in your notebook.
Second is the "desc" when the DataFrame is sorted. This doesn't work. Remove it and that solves the problem. PS: it throws an exception if you try it in the pyspark shell, but somehow it doesn't break in streaming.
    def on_data(self, data):
        try:
            msg = json.loads(data)  # Parse the JSON payload into a dict
            print(msg['text'].encode('utf-8'))  # Print the tweet text as UTF-8 bytes
            # Wrong: self.client_socket.send(msg['text'].encode('utf-8'))
            # Fixed: append "\n" so each tweet is sent as its own line
            self.client_socket.send((str(msg['text']) + "\n").encode('utf-8'))
            return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True
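To see why the delimiter matters, here is a minimal sketch using only the standard library, not Spark or the Twitter client. It assumes the receiver splits the incoming byte stream into lines, which is how a newline-delimited text reader behaves. The helper names (`send_tweets`, `read_lines`, `demo`) and the sample tweets are made up for illustration.

```python
import socket


def send_tweets(sock, tweets, delimiter="\n"):
    """Send each tweet text followed by `delimiter` (hypothetical helper)."""
    for text in tweets:
        sock.sendall((text + delimiter).encode("utf-8"))


def read_lines(sock):
    """Read until the peer closes, then split the stream into lines."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8").splitlines()


def demo(delimiter):
    """Send two sample tweets over a local socket pair and return the lines received."""
    sender, receiver = socket.socketpair()
    send_tweets(sender, ["first #spark", "second #python"], delimiter)
    sender.close()  # signals end of stream to the receiver
    lines = read_lines(receiver)
    receiver.close()
    return lines


if __name__ == "__main__":
    print(demo(""))    # no delimiter: the two tweets are fused into one line
    print(demo("\n"))  # with "\n": one line per tweet
```

Without the delimiter the second tweet is glued to the end of the first, so a hashtag at the end of a tweet would merge with the next word (here `#sparksecond`).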
    ( lines.flatMap( lambda text: text.split( " " ) )
        .filter( lambda word: word.lower().startswith("#") )
        .map( lambda word: ( word.lower(), 1 ) )
        .reduceByKey( lambda a, b: a + b )
        .map( lambda rec: Tweet( rec[0], rec[1] ) )
        ## .foreachRDD( lambda rdd: rdd.toDF().sort( desc("count") )  # the desc here doesn't work
        .foreachRDD( lambda rdd: rdd.toDF().sort("count")
                                    .limit(10).registerTempTable("tweets") ) )
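One thing to watch with this workaround: `sort("count")` sorts ascending by default, so `limit(10)` keeps the ten least frequent hashtags rather than the top ten. If you want descending order, `sort("count", ascending=False)` should do it without `desc`, and `desc` itself has to be imported from `pyspark.sql.functions` before it can be used. The sketch below is plain Python, not Spark, and only illustrates the difference in ordering; the function names and sample data are made up.

```python
from collections import Counter


def hashtag_counts(lines):
    """Count lower-cased hashtags, mirroring the flatMap/filter/map/reduceByKey steps."""
    counts = Counter()
    for text in lines:
        for word in text.split(" "):
            if word.lower().startswith("#"):
                counts[word.lower()] += 1
    return counts


def first_n(counts, n, descending):
    """Sort (tag, count) pairs by count and keep the first n, like sort(...).limit(n)."""
    ordered = sorted(counts.items(), key=lambda rec: rec[1], reverse=descending)
    return ordered[:n]


if __name__ == "__main__":
    sample = ["#spark is fast", "I like #Spark and #python", "#spark #python #kafka"]
    counts = hashtag_counts(sample)
    print(first_n(counts, 2, descending=False))  # least frequent tags first
    print(first_n(counts, 2, descending=True))   # most frequent tags first
```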
The issue could be :
Solution: when launching Jupyter, start it with the command below (make sure pyspark2 opens in Jupyter by default).
[cloudera@quickstart ~]$ pyspark2 --master local[2]
This should solve your problem.
Was the problem solved? I seem to have the same problem.
Hello, have you solved this problem? I have the same one.
Hello,
I did exactly as the steps show, but when I run `
I got this error: `
    __call__(self, *args)
       1255 answer = self.gateway_client.send_command(command)
       1256 return_value = get_return_value(
    -> 1257     answer, self.gateway_client, self.target_id, self.name)
       1258
       1259 for temp_arg in temp_args:
` Any idea about that?