Open lsiwh37249 opened 3 weeks ago
Point to emphasize 1
Point to emphasize 2
.
└── api
    └── v1
        └── product
            ├── _SUCCESS
            ├── specific_path=16
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=17
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=22
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=23
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=24
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=28
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=3
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=4
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
            ├── specific_path=50
            │   └── part-00000-522282c4-cfb9-480f-83c6-e745d35cd7c5.c000.snappy.parquet
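The tree above lists the `specific_path=N` partition directories in lexicographic order (16, 17, ..., 3, 4, 50). As a minimal sketch, assuming the partition values are always integers and the layout matches the tree, the partitions could be enumerated in numeric order with the standard library alone. The helper name `list_partitions` is hypothetical and is not taken from the linked repositories.

```python
from pathlib import Path


def list_partitions(root, key="specific_path"):
    """Map each `key=value` partition directory under root to its parquet file names."""
    partitions = {}
    for child in Path(root).iterdir():
        # Skip marker files such as _SUCCESS and anything that is not `key=value`.
        if not child.is_dir() or not child.name.startswith(f"{key}="):
            continue
        # Assumption: partition values are integers, as in the tree above.
        value = int(child.name.split("=", 1)[1])
        partitions[value] = sorted(p.name for p in child.glob("*.parquet"))
    # Sort numerically so that 3 and 4 come before 16.
    return dict(sorted(partitions.items()))
```

Calling `list_partitions(".../api/v1/product")` on a directory shaped like the one above would return a dictionary keyed by the integer partition value, with `_SUCCESS` ignored.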
save_dir = f"/home/ubuntu/data/tmp/{ds_nodash}/{common_part}" # directory path to save to
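The `save_dir` line above can be wrapped in a small helper so the path is validated before anything is written. This is only a sketch: it assumes `ds_nodash` is Airflow's `YYYYMMDD` template value and that `common_part` is a relative sub-path. The function name `build_save_dir` and the sample values in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def build_save_dir(ds_nodash, common_part, base="/home/ubuntu/data/tmp"):
    """Build the save directory as base/ds_nodash/common_part."""
    # Assumption: ds_nodash is an 8-digit date string such as Airflow's {{ ds_nodash }}.
    if len(ds_nodash) != 8 or not ds_nodash.isdigit():
        raise ValueError(f"ds_nodash must look like YYYYMMDD, got {ds_nodash!r}")
    # Strip slashes so a leading "/" in common_part cannot replace the base path.
    return str(PurePosixPath(base) / ds_nodash / common_part.strip("/"))
```

For example, `build_save_dir("20240101", "api/v1/product")` yields `/home/ubuntu/data/tmp/20240101/api/v1/product`. Both the date and the sub-path here are illustrative values only.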
kafka-clients-3.5.2.jar spark-sql-kafka-0-10_2.12-3.5.1.jar spark-sql-kafka-0-10_2.12-3.5.2.jar spark-streaming-kafka-0-10_2.12-3.3.4.jar
Airflow + Spark code
Spark code: https://github.com/encore-PR4/chat-etl/blob/main/src/chat_etl/spark_stream.py
Airflow code: https://github.com/encore-PR4/dags/blob/main/preprocessing.py
Streamlit code: https://github.com/encore-PR4/chat-etl/blob/main/src/chat_etl/sl/ad_page.py