D-Mielewczyk / euro-temperature-trend-stats

MIT License

Prepare AWS EMR Infrastructure and Execute Ready Code in the Cloud #10

Open D-Mielewczyk opened 3 weeks ago

D-Mielewczyk commented 3 weeks ago

Set up the necessary AWS EMR infrastructure and execute the prepared code in the cloud. This involves creating and configuring the EMR cluster, uploading the necessary scripts and data, and running the PySpark jobs to perform the data analysis and generate the visualizations.

Requirements:

  1. Set Up AWS EMR Cluster:

    • Create an EMR cluster with the required configurations and software (Apache Spark, Hadoop, etc.).
    • Ensure the cluster is appropriately sized to handle the data processing tasks.
  2. Upload Data and Scripts:

    • Upload the cleaned data and prepared PySpark scripts to the S3 bucket.
    • Ensure all necessary dependencies and configurations are in place.
  3. Execute PySpark Jobs:

    • Run the PySpark scripts on the EMR cluster.
    • Monitor the execution to ensure successful completion.
  4. Retrieve and Save Results:

    • Retrieve the results from the EMR cluster.
    • Save the generated visualizations and any other output to the designated S3 bucket.
  5. Document the Process:

    • Provide documentation on the setup and execution process.
    • Include any necessary commands or configurations used.
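
One possible shape for requirements 1 to 4, sketched with boto3. This is a rough illustration under assumptions, not the project's actual setup: the bucket name, S3 key layout, script name, release label, instance types and counts are hypothetical placeholders, the role names are the AWS default EMR roles and may not exist in the target account, and the PySpark script is assumed to take an input and an output location as positional arguments.

```python
"""Sketch: launch a transient EMR cluster that runs one PySpark script."""

# Hypothetical placeholders; none of these values come from the repository.
BUCKET = "example-euro-temperature-bucket"
SCRIPT_KEY = "scripts/analysis.py"
DATA_PREFIX = "data/cleaned/"
OUTPUT_PREFIX = "output/"


def s3_uri(bucket, key):
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key}"


def build_spark_step(name, script_uri, script_args=()):
    """Return an EMR step definition that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def build_cluster_request(bucket, steps, core_count=2):
    """Return keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": "euro-temperature-trend-stats",
        "LogUri": s3_uri(bucket, "logs/"),
        "ReleaseLabel": "emr-6.15.0",  # assumption: pick the release you need
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
            ],
            # The cluster shuts down once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": list(steps),
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(local_script, bucket=BUCKET):
    """Upload the script, start the cluster, and block until it has shut down."""
    import boto3  # imported lazily so the builders above work without boto3

    boto3.client("s3").upload_file(local_script, bucket, SCRIPT_KEY)
    step = build_spark_step(
        "analysis",
        s3_uri(bucket, SCRIPT_KEY),
        [s3_uri(bucket, DATA_PREFIX), s3_uri(bucket, OUTPUT_PREFIX)],
    )
    emr = boto3.client("emr")
    cluster_id = emr.run_job_flow(**build_cluster_request(bucket, [step]))["JobFlowId"]
    emr.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)
    return cluster_id
```

Calling `launch("analysis.py")` with valid AWS credentials would start a billable cluster, so the two `build_*` helpers are kept free of AWS calls and can be inspected or logged first. The returned dictionaries also double as a record of the configuration for requirement 5.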

Details:

Acceptance Criteria:

Additional Notes:

D-Mielewczyk commented 3 weeks ago

You can start setting up the infrastructure, but executing all the queries is blocked until #7 is finished.
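
If the infrastructure is set up before the queries are ready, the steps could be added to an already running cluster later. A minimal sketch, assuming a cluster that was created with `KeepJobFlowAliveWhenNoSteps` set to true (which is billed while idle) and assuming each query lives in its own script on S3; the bucket name and script keys in the usage note are hypothetical:

```python
"""Sketch: submit PySpark scripts as steps to an existing EMR cluster."""
from pathlib import PurePosixPath


def build_query_steps(bucket, script_keys):
    """Return one spark-submit step per script key, named after the file."""
    return [
        {
            "Name": PurePosixPath(key).stem,
            # CONTINUE lets the remaining steps run even if one of them fails.
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         f"s3://{bucket}/{key}"],
            },
        }
        for key in script_keys
    ]


def submit_queries(cluster_id, bucket, script_keys):
    """Add the steps to the cluster and wait for each one in turn."""
    import boto3  # imported lazily so build_query_steps works without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=build_query_steps(bucket, script_keys),
    )
    waiter = emr.get_waiter("step_complete")
    for step_id in response["StepIds"]:
        # Raises a WaiterError if the step fails or is cancelled.
        waiter.wait(ClusterId=cluster_id, StepId=step_id)
    return response["StepIds"]
```

For example, `submit_queries(cluster_id, "example-bucket", ["scripts/q1.py", "scripts/q2.py"])` would queue two steps once the scripts from #7 exist.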