aws-samples / pyflink-getting-started

MIT No Attribution

Pyflink - The Python Apache Flink Interpreter

🚨 “August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink.”


This repository will include code examples and walkthroughs for the following common tasks:

Table of Contents

  1. 💻     Local Development using Pyflink
  2. 📦     Packaging your Pyflink Application for use with Amazon Managed Service for Apache Flink
  3. 🚀     Deploying and running your Pyflink Application to Amazon Managed Service for Apache Flink
  4. 📄     Logging in a Pyflink Application, and where to see those logs in Amazon Managed Service for Apache Flink
  5. 🔧     Basic Troubleshooting and Monitoring



Thank you to @kremrik for the helpful miniconda instructions below.

Prerequisites

  1. Install Miniconda with Dependencies

    1. Follow the instructions here to download the Miniconda installer to your machine, then run it:

      bash Miniconda3-latest-MacOSX-x86_64.sh

      This installer name is for macOS on x86_64, which is my case. Use the installer that matches your own operating system and architecture.

    2. Ensure that you prepend miniconda to your PATH, in your .bashrc or elsewhere:

      export PATH=~/miniconda3/bin:$PATH

      Then type:

      source ~/.bashrc
    3. Verify that your PATH has been set up correctly after sourcing your .bashrc by typing:

      which python
      > /home/$USER/miniconda3/bin/python
    4. Once installed, create a virtual environment to use for your flink environment:

      conda create -n my-new-environment pip python=3.8

      This creates a new conda environment with pip installed. Including pip in the command ensures that packages installed with pip install go into this environment rather than some other location.

      I've found that Python 3.9 and later don't play nicely with some of the Apache Flink dependencies, so just specify 3.8.

    5. After creating your new environment, activate it by typing:

      conda activate my-new-environment

      Then verify that the correct pip is being used:

      which pip
      > /home/$USER/miniconda3/envs/my-new-environment/bin/pip

      Once this is set up, installing modules like apache-flink is as simple as typing pip install apache-flink, which will install it into your miniconda environment.

      Go ahead and install apache-flink since we'll need it for the rest of this exercise.

      (my-new-environment) $ pip install apache-flink==1.15.2
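
Once the steps above are done, a short script can confirm the environment in one pass. The following is a minimal sketch, not part of the original instructions: it assumes the environment name, Python version, and apache-flink version used in this walkthrough, and the helper names (python_matches, prefix_in_env, installed_flink_version, report) are made up for illustration.

```python
import sys
from importlib import metadata
from pathlib import Path

# Values taken from the steps above; change them if you picked different ones.
EXPECTED_PYTHON = (3, 8)
EXPECTED_ENV = "my-new-environment"
EXPECTED_FLINK = "1.15.2"


def python_matches(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """Return True when the interpreter's major.minor equals the pinned version."""
    return tuple(version_info[:2]) == tuple(expected)


def prefix_in_env(prefix=sys.prefix, env_name=EXPECTED_ENV):
    """Return True when the interpreter prefix directory is the named conda env."""
    return Path(prefix).name == env_name


def installed_flink_version():
    """Return the installed apache-flink version, or None if it is missing."""
    try:
        return metadata.version("apache-flink")
    except metadata.PackageNotFoundError:
        return None


def report():
    """Collect one human-readable line per check."""
    flink = installed_flink_version()
    return [
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        f"(want {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}): "
        f"{'ok' if python_matches() else 'mismatch'}",
        f"Prefix {sys.prefix} (want env {EXPECTED_ENV}): "
        f"{'ok' if prefix_in_env() else 'mismatch'}",
        f"apache-flink {flink or 'not installed'} (want {EXPECTED_FLINK}): "
        f"{'ok' if flink == EXPECTED_FLINK else 'mismatch'}",
    ]


if __name__ == "__main__":
    for line in report():
        print(line)
```

Run it with the environment activated, for example as python check_env.py (the file name is arbitrary). A mismatch on the prefix line usually means the script ran under a different interpreter than the one in the conda environment.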

Additional Note: Please validate that you are using either Java 8 or Java 11 when running the examples. Later versions of Java have compatibility issues because the Py4J library calls out to the Kinesis Connector.

(my-new-environment) jdber@147dda1bd4b4 ~ % java -version
openjdk version "11.0.9.1" 2020-11-04 LTS
OpenJDK Runtime Environment Corretto-11.0.9.12.1 (build 11.0.9.1+12-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.9.12.1 (build 11.0.9.1+12-LTS, mixed mode)
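
The same check can be scripted. This is a rough sketch under a few assumptions: java is on your PATH, it prints a line containing version "..." (as in the output above), and the function names are hypothetical. It treats the legacy 1.8 numbering as Java 8.

```python
import re
import subprocess

# Java major versions this walkthrough says to use.
SUPPORTED_MAJORS = {8, 11}


def java_major(version_output):
    """Extract the Java major version from `java -version` output text."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_292).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return the major version it reports."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    # `java -version` writes to stderr; fall back to stdout just in case.
    return java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    try:
        major = installed_java_major()
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"Could not determine the Java version: {exc}")
    else:
        verdict = "supported" if major in SUPPORTED_MAJORS else "not supported by these examples"
        print(f"Java {major}: {verdict}")
```

With the Corretto build shown above, java_major returns 11, which is in the supported set.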

Continue on to Getting Started!