RD-Connect / vcfLoader

From gVCFS to ElasticSearch through SparkSQL-Hive-Parquet

statistics #26

Open dpiscia opened 8 years ago

dpiscia commented 8 years ago

It would be nice to get some insight from the ETL data. DataFrames come with some built-in statistics, but MLlib provides more.

It would be good to access these statistics through an ELK stack.

dpiscia commented 8 years ago

dataframe.describe().show() --> https://databricks.com/blog/2015/06/02/statistical-and-mathematical-functions-with-dataframes-in-spark.html