TheSpaceCuber / datalake

CS4225 AY2021/22

AWS Setup (EMR) #4

Open TheSpaceCuber opened 2 years ago

TheSpaceCuber commented 2 years ago

Test EMR with S3 & Athena with S3

Set up a proper flow of data first, with a working example

Look into how to avoid re-reading old raw data
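One possible way to avoid re-reading old raw data is to keep a small manifest of raw object keys that have already been processed and only hand new keys to the job. The sketch below is an assumption, not something decided in this issue: the manifest file, the function names, and the example keys are all hypothetical, and it uses a local JSON file for simplicity (on AWS the manifest could live in S3 instead).

```python
import json
from pathlib import Path


def load_processed(manifest_path):
    """Return the set of raw object keys recorded as already processed."""
    path = Path(manifest_path)
    if not path.exists():
        # First run: nothing has been processed yet.
        return set()
    return set(json.loads(path.read_text()))


def select_new_keys(all_keys, processed):
    """Keep only keys that have not been processed yet, in sorted order."""
    return sorted(k for k in all_keys if k not in processed)


def mark_processed(manifest_path, processed, new_keys):
    """Persist the union of previously and newly processed keys."""
    updated = set(processed) | set(new_keys)
    Path(manifest_path).write_text(json.dumps(sorted(updated)))
    return updated
```

A run would list the raw keys, call `select_new_keys` against the loaded manifest, process only those, and then call `mark_processed` once the output has been written successfully.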

TheSpaceCuber commented 2 years ago

Tested the PySpark setup on EMR: SSH into the master node, then run spark-submit main.py.

The job successfully wrote its output as a Parquet file. Next step: test Athena on the Parquet files.

For Java, there was an issue reading files from S3. Will test again later if we end up using Java.
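For reference, a main.py for the PySpark test could look roughly like the sketch below. This is not the actual script used in the test: the bucket argument, the `raw` and `processed` prefixes, and the assumption that the raw data is JSON are all hypothetical placeholders. The original run passed no arguments to spark-submit; the bucket argument is added here only to avoid hard-coding a name.

```python
import sys


def build_paths(bucket, raw_prefix="raw", out_prefix="processed"):
    """Build the S3 input and output locations (prefixes are placeholders)."""
    return f"s3://{bucket}/{raw_prefix}/", f"s3://{bucket}/{out_prefix}/"


def main(bucket):
    # Imported here so build_paths can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("datalake-example").getOrCreate()
    raw_path, out_path = build_paths(bucket)

    # Assumption: the raw data is JSON; swap in spark.read.csv etc. as needed.
    df = spark.read.json(raw_path)

    # Write the result as Parquet so Athena can query it later.
    df.write.mode("overwrite").parquet(out_path)
    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit main.py <bucket>")
    else:
        main(sys.argv[1])
```

On the master node this would be run as `spark-submit main.py <bucket>`, with the bucket name supplied by whoever runs it.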

TheSpaceCuber commented 2 years ago

In defaults.ini for Grafana, change the following settings:

[security]
allow_embedding = true
cookie_samesite = none

[auth.anonymous]
enabled = true
hide_version = false