emiljdd / Tweepy-SparkTwitterI

Data Practicum II

AnalysisException: 'Table or view not found: tweets; line 1 pos 23' #5

Open deemaalomair1 opened 5 years ago

deemaalomair1 commented 5 years ago

Hello,

I followed the steps exactly, but when I run

    count = 0
    while count < 10:
        time.sleep( 3 )
        top_10_tweets = sqlContext.sql( 'Select tag, count from tweets' )
        top_10_df = top_10_tweets.toPandas()
        display.clear_output(wait=True) # Clears the output, if a plot exists.
        sns.plt.figure( figsize = ( 10, 8 ) )
        sns.barplot( x="count", y="tag", data=top_10_df )
        sns.plt.show()
        count = count + 1

I got this error:

    Py4JJavaError                             Traceback (most recent call last)
    /usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/sql/utils.py in deco(*a, **kw)
         62         try:
    ---> 63             return f(*a, **kw)
         64         except py4j.protocol.Py4JJavaError as e:

    /usr/local/Cellar/apache-spark/2.4.0/libexec/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        327                     "An error occurred while calling {0}{1}{2}.\n".
    --> 328                     format(target_id, ".", name), value)
        329             else:

    Py4JJavaError: An error occurred while calling o25.sql.
    : org.apache.spark.sql.AnalysisException: Table or view not found: tweets; line 1 pos 23
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:733)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:685)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:715)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:708)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:708)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:654)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'tweets' not found in database 'default';
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:83)
        at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:700)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:700)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:699)
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730)
        ... 53 more

    During handling of the above exception, another exception occurred:

    AnalysisException                         Traceback (most recent call last)
    <ipython-input-10-e456dbb17c7c> in <module>
          3 
          4         time.sleep( 3 )
    ----> 5         top_10_tweets = sqlContext.sql( 'Select tag, count from tweets' )
          6         top_10_df = top_10_tweets.toPandas()
          7         display.clear_output(wait=True) #Clears the output, if a plot exists.

    /usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/sql/context.py in sql(self, sqlQuery)
        356         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
        357         """
    --> 358         return self.sparkSession.sql(sqlQuery)
        359 
        360     @since(1.0)

    /usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/sql/session.py in sql(self, sqlQuery)
        765         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
        766         """
    --> 767         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
        768 
        769     @since(2.0)

    /usr/local/Cellar/apache-spark/2.4.0/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1255         answer = self.gateway_client.send_command(command)
       1256         return_value = get_return_value(
    -> 1257             answer, self.gateway_client, self.target_id, self.name)
       1258 
       1259         for temp_arg in temp_args:

    /usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/sql/utils.py in deco(*a, **kw)
         67                                              e.java_exception.getStackTrace()))
         68             if s.startswith('org.apache.spark.sql.AnalysisException: '):
    ---> 69                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
         70             if s.startswith('org.apache.spark.sql.catalyst.analysis'):
         71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)

    AnalysisException: 'Table or view not found: tweets; line 1 pos 23'

Any idea about that?
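For context, this error means the query ran before the streaming job had registered a temporary table named `tweets` in the same session, so at that moment the table genuinely did not exist. One quick way to check, as a minimal sketch assuming the notebook's existing `sqlContext`:

    # 'tweets' should appear in this list once ssc.start() has run and at
    # least one streaming batch has been processed
    print( sqlContext.tableNames() )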

CainDelta commented 4 years ago

Found the solution; it's two things.

First, the on_data function in the tweet listener is missing a newline delimiter, so everything arrives as one string, I think. Add "\n" and it'll work fine if you pprint lines in your notebook.

Second is the "desc" used when the DataFrame is sorted; this doesn't work. Remove it and that solves the problem. PS: it throws an exception if you try it in the pyspark shell, but somehow doesn't break in streaming.

    def on_data(self, data):
        try:
            msg = json.loads( data ) # Parse the incoming tweet JSON into a dict
            print( msg['text'].encode('utf-8') ) # Print the message; UTF-8 encoding eliminates emojis
            ## self.client_socket.send( msg['text'].encode('utf-8') ) # this line is wrong; add the "\n"
            self.client_socket.send( (str(msg['text']) + "\n").encode('utf-8') )
            return True
        except BaseException as e:
            print( "Error on_data: %s" % str(e) )
        return True
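The "\n" matters because the receiving side, Spark's socketTextStream, splits the incoming byte stream into records at newlines; without the delimiter every tweet runs together as one record. A minimal sketch of the receiving end, with the host and port assumed (use whatever the sender binds to):

    from pyspark.streaming import StreamingContext

    ssc = StreamingContext( sc, 10 )                   # 10-second batch interval
    lines = ssc.socketTextStream( "127.0.0.1", 5555 )  # host/port assumed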
    ( lines.flatMap( lambda text: text.split( " " ) )
           .filter( lambda word: word.lower().startswith("#") )
           .map( lambda word: ( word.lower(), 1 ) )
           .reduceByKey( lambda a, b: a + b )
           .map( lambda rec: Tweet( rec[0], rec[1] ) )
           ## .foreachRDD( lambda rdd: rdd.toDF().sort( desc("count") ) ... ) # the desc here doesn't work with streaming
           .foreachRDD( lambda rdd: rdd.toDF().sort("count")
                                       .limit(10).registerTempTable("tweets") ) )
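For what it's worth, a descending sort does work in a plain (non-streaming) pyspark session once desc is imported from pyspark.sql.functions; the shell exception is likely just a missing import (an assumption, not confirmed from the repo). A minimal sketch, assuming a DataFrame df with a count column:

    from pyspark.sql.functions import desc

    top_10 = df.sort( desc("count") ).limit(10)  # highest counts first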
rpalaani30 commented 4 years ago

The issue could be:

  1. Your Spark default master is YARN; change it to local mode.
  2. There is only one executor working, which is receiving messages, but no executor is processing them.

Solution: open Jupyter with the command below (ensure pyspark2 opens in Jupyter by default).

    [cloudera@quickstart ~]$ pyspark2 --master local[2]

This should solve your problem.
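If you create the context from a script instead of the pyspark2 shell, the equivalent is to set the master when the SparkContext is built; a minimal sketch (the app name is assumed):

    from pyspark import SparkContext

    # local[2]: one thread can receive the socket stream while the other processes it
    sc = SparkContext( "local[2]", "TwitterStreamApp" )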

sumukhbhat2701 commented 2 years ago

Was the problem solved? I seem to have the same problem.

duc-haminh commented 2 years ago

> Was the problem solved? I seem to have the same problem.

Hello, have you solved this problem? I have the same one.