aylabs / bigdata-practical-intro

A practical intro to Big Data based on Apache Spark
Apache License 2.0

Add samples on how to create DataFrames #1

Closed acs closed 5 years ago

acs commented 5 years ago

Some nice references:

https://github.com/hortonworks/data-tutorials/wiki
https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-scala.html

acs commented 5 years ago

A DataFrame is an RDD with a schema. The key is how to manage the schema that describes the RDD contents. Scala case classes are a great way to do it, but there are others, such as specifying the schema manually and using it to create the DataFrame from the RDD.
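The two approaches mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the `Person` case class, the sample rows, the app name and the `local[*]` master are all assumptions made up for the example.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical record type, defined only for this illustration.
// It must live outside the method so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object DataFrameCreationSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for trying this out.
    val spark = SparkSession.builder()
      .appName("dataframe-creation-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Approach 1: the schema is derived from the case class fields.
    val peopleRdd = spark.sparkContext.parallelize(
      Seq(Person("Ana", 34), Person("Luis", 28)))
    val fromCaseClass = peopleRdd.toDF()
    fromCaseClass.printSchema()

    // Approach 2: the schema is declared by hand and applied to an RDD of Row.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)
    ))
    val rowRdd = spark.sparkContext.parallelize(
      Seq(Row("Ana", 34), Row("Luis", 28)))
    val fromSchema = spark.createDataFrame(rowRdd, schema)
    fromSchema.printSchema()

    spark.stop()
  }
}
```

The case class route is shorter when the record shape is known at compile time. The manual `StructType` route is useful when the columns are only known at runtime, for example when they are read from a configuration file.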

acs commented 5 years ago

Examples are already available at https://github.com/acs/spark-performance-testing/blob/master/src/main/scala/personal/acs/spark/App.scala#L49

acs commented 5 years ago

Too specific for the intro. Closing.