amplab / training

Training materials for Strata, AMP Camp, etc

"Data exploration using Spark SQL" uses deprecated function `parquetFile()` #200

Closed by gostevehoward 8 years ago

gostevehoward commented 8 years ago

The tutorial at http://www.cs.berkeley.edu/~jey/ampcamp6/training/data-exploration-using-spark-sql.html says to use

>>> wikiData = sqlCtx.parquetFile("data/wiki_parquet")

`parquetFile()` is deprecated. The replacement suggested by the warning, `read.parquet()`, seems to work and returns the same row count:

>>> wikiData = sqlCtx.parquetFile('data/wiki_parquet')
/home/steve/work/ampcamp6/ampcamp6/spark/python/pyspark/sql/context.py:434: UserWarning: parquetFile is deprecated. Use read.parquet() instead.
  warnings.warn("parquetFile is deprecated. Use read.parquet() instead.")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>>> wikiData.count()
39365
>>> wikiData2 = sqlCtx.read.parquet('data/wiki_parquet')
>>> wikiData2.count()
39365
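
If the tutorial has to keep working on older Spark builds as well, one option is a small wrapper that prefers `read.parquet()` and falls back to `parquetFile()` only when the context has no `read` attribute. This is only a sketch: `load_parquet` is a hypothetical helper name, not part of PySpark, and it assumes the object passed in behaves like the `sqlCtx` in the session above.

```python
def load_parquet(sql_ctx, path):
    """Load a Parquet dataset, preferring the non-deprecated reader API.

    `load_parquet` is a hypothetical helper, not a PySpark function.
    `sql_ctx` is assumed to be a SQLContext-like object such as `sqlCtx`.
    """
    # Newer contexts expose a reader object as `sql_ctx.read`.
    reader = getattr(sql_ctx, "read", None)
    if reader is not None:
        return reader.parquet(path)
    # Older contexts only offer the deprecated method.
    return sql_ctx.parquetFile(path)


# Usage inside the pyspark shell (assumes `sqlCtx` already exists):
#   wikiData = load_parquet(sqlCtx, "data/wiki_parquet")
#   wikiData.count()
```

On the Spark build used above the wrapper would take the `read.parquet()` branch, so the deprecation warning should no longer be printed.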