Open TheSpaceCuber opened 2 years ago
Tested the PySpark setup on EMR: SSH into the master node, then run spark-submit main.py.
The job successfully wrote its output as a Parquet file. Next step: test Athena on the Parquet files.
For Java, there is an issue with reading files from S3. Will look into it later if we end up using Java.
In Grafana's defaults.ini, change the following settings:
[security]
allow_embedding = true
cookie_samesite = none
[auth.anonymous]
enabled = true
hide_version = false
Test EMR with S3 & Athena with S3
Set up a proper data flow first, with a working example
Look into how to avoid re-reading old raw data
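
One possible way to approach the last item is to keep a small manifest of raw-file keys that earlier runs have already processed, and to hand only the unseen keys to the Spark job. The sketch below is just an illustration under that assumption: the function names, the JSON manifest, and the example keys are all hypothetical, and it uses a local file where a real setup would more likely store the manifest in S3.

```python
import json
from pathlib import Path


def load_processed(manifest_path):
    """Return the set of raw-file keys recorded by earlier runs."""
    path = Path(manifest_path)
    if not path.exists():
        # First run: nothing has been processed yet.
        return set()
    return set(json.loads(path.read_text()))


def select_new_keys(all_keys, processed):
    """Keep only the keys that are not in the processed set, sorted."""
    return sorted(k for k in all_keys if k not in processed)


def mark_processed(manifest_path, processed, new_keys):
    """Write the union of old and newly processed keys to the manifest."""
    updated = set(processed) | set(new_keys)
    Path(manifest_path).write_text(json.dumps(sorted(updated)))
    return updated
```

A run would list the raw keys, call select_new_keys to get the files to read, and call mark_processed only after the Parquet output has been written, so a failed run gets retried on the same files.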