dinussen27 / AI_Workshop-RAG


Bug in Data Preprocessing Script #3

Open lava0812 opened 1 month ago

lava0812 commented 1 month ago

Description: The data preprocessing script is critical for preparing input data for further analysis or modeling. However, it sometimes crashes with out-of-memory errors when processing large datasets. This limits our ability to handle big data, a common requirement in AI projects.

Detailed Analysis:

Memory Management: The current implementation might load entire datasets into memory, leading to excessive memory consumption and crashes.
Efficiency: The processing time could be extended due to inefficient data handling techniques.

Proposed Solution:

Batch Processing: Modify the script to process data in chunks or batches, reducing the memory footprint.
Streaming Data: Implement a streaming approach to handle large datasets incrementally.
Profile Memory Usage: Use tools to identify memory hotspots and optimize those areas.
Optimize Data Structures: Consider using more efficient data structures and libraries that are optimized for large datasets (e.g., Pandas, NumPy).
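
To make the batch and streaming proposals concrete, here is a minimal sketch using only the Python standard library. It assumes the input is a CSV file, which this issue does not confirm, and the names `iter_batches`, `clean_row`, and `preprocess_stream` are hypothetical and not taken from the repository. The idea is that only one batch of rows is held in memory at a time.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size items, reading the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def clean_row(row: dict) -> dict:
    """Hypothetical per-row step: strip whitespace from string fields."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }


def preprocess_stream(source, batch_size: int = 1000) -> Iterator[list]:
    """Stream a CSV file object and yield cleaned batches one at a time."""
    reader = csv.DictReader(source)  # reads the file row by row
    for batch in iter_batches(reader, batch_size):
        yield [clean_row(row) for row in batch]


# Small in-memory stand-in for a large file (assumed columns: id, text).
sample = io.StringIO("id,text\n1, alpha \n2,beta\n3, gamma\n")
batch_sizes = [len(batch) for batch in preprocess_stream(sample, batch_size=2)]
```

With a real file, `source` would be an open file handle, and each yielded batch could be written out or passed downstream before the next one is read. If the script already relies on pandas, `pandas.read_csv` with the `chunksize` argument gives a similar iterator of DataFrames.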

dinussen27 commented 1 month ago

@ivholmlu