causify-ai / kaizenflow

KaizenFlow is a framework for Bayesian reasoning and AI/ML stream computing
GNU General Public License v3.0

Spring2024_Streaming_Word_Count_with_Apache_Spark_Streaming #788

Open meghanakolanu opened 7 months ago

meghanakolanu commented 7 months ago

Description: Develop a real-time streaming word count application using Apache Spark Streaming's DStream API. Use Python to ingest streaming text data from a chosen source, tokenize it into words, and aggregate word counts within each micro-batch interval. Extend the application with additional functionality such as filtering stop words, applying windowed operations for time-based analysis, and adding visualizations for real-time insights. Run the Spark Streaming application in a notebook environment to process and analyze the live stream. Further enhancements can include integrating with external systems for data ingestion or storage.
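Below is a minimal sketch of the pipeline described above. It assumes text arrives on a TCP socket, since the issue leaves the source open, and it uses a small hypothetical stop-word list. `build_word_count_stream` (a hypothetical name) wires the steps onto a `StreamingContext` using the DStream methods `socketTextStream`, `flatMap`, `filter`, `map`, `reduceByKey`, and `reduceByKeyAndWindow`. `count_batch` and `sliding_window_counts` are hypothetical plain-Python helpers that reproduce the per-batch and windowed counts, so the logic can be checked without a Spark cluster.

```python
from collections import Counter, deque
import re

# Hypothetical stop-word list for illustration; a fuller list could be swapped in.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "is", "in"})

# Lower-case word tokens; apostrophes are kept so "don't" stays one token.
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(line):
    """Split one line of text into lower-case word tokens."""
    return _TOKEN_RE.findall(line.lower())


def build_word_count_stream(ssc, host="localhost", port=9999,
                            window_sec=30, slide_sec=10):
    """Attach the word-count pipeline to a pyspark StreamingContext `ssc`.

    Assumptions: text arrives on a TCP socket at host:port, and window_sec and
    slide_sec are multiples of the context's batch interval.
    """
    lines = ssc.socketTextStream(host, port)
    pairs = (lines.flatMap(tokenize)                      # words
                  .filter(lambda w: w not in STOP_WORDS)  # drop stop words
                  .map(lambda w: (w, 1)))                 # (word, 1) pairs
    per_batch = pairs.reduceByKey(lambda a, b: a + b)     # counts per micro-batch
    # invFunc=None: each window is recomputed from its batches, so no checkpoint is needed.
    windowed = pairs.reduceByKeyAndWindow(lambda a, b: a + b, None,
                                          window_sec, slide_sec)
    per_batch.pprint()
    windowed.pprint()
    return per_batch, windowed


def count_batch(lines, stop_words=STOP_WORDS):
    """Plain-Python equivalent of one micro-batch: tokenize, filter, count."""
    return Counter(w for line in lines for w in tokenize(line)
                   if w not in stop_words)


def sliding_window_counts(batches, window_batches, stop_words=STOP_WORDS):
    """Yield word counts over the last `window_batches` batches after each batch.

    Gives the same counts as a window sliding by one batch, updated
    incrementally: add the newest batch, subtract the one that falls out.
    """
    window = deque()
    total = Counter()
    for batch in batches:
        counts = count_batch(batch, stop_words)
        window.append(counts)
        total.update(counts)
        if len(window) > window_batches:
            total.subtract(window.popleft())
            total = +total  # unary + drops words whose count fell to zero
        yield dict(total)
```

To run the Spark part, one option is to create `ssc = StreamingContext(SparkContext("local[2]", "StreamingWordCount"), 10)` for 10-second micro-batches. Then call `build_word_count_stream(ssc)`, `ssc.start()`, and `ssc.awaitTermination()`, with text fed by something like `nc -lk 9999`. `local[2]` leaves one core free for the socket receiver. Passing `None` as the inverse function recomputes each window from scratch. This avoids the checkpointing that the incremental inverse-function form of `reduceByKeyAndWindow` requires.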

Google Doc Link: https://docs.google.com/document/d/1GEOmfpBUXiCua18wR1Hx1OMUVlku-1of/edit#heading=h.p256pqrk0gm4

meghanakolanu commented 7 months ago

I am getting an error when trying to build the Docker image from the terminal. I have attached screenshots of the error below.

[Screenshot: 2024-04-25 at 10:16:18 PM] [Screenshot: 2024-04-25 at 10:16:29 PM]

meghanakolanu commented 7 months ago

I am having an issue with the pull request on git, as it is showing the message below.

[Screenshot: 2024-04-26 at 12:39:32 AM]