gabrielanyosa1 / ETLNewsletters

Apache License 2.0

Issue 3: Progress Notifications Not Triggered at Thresholds and Delayed Data Persistence #3

Open gabrielanyosa1 opened 2 weeks ago

gabrielanyosa1 commented 2 weeks ago

Description: During execution, the expected progress notifications did not fire when the thresholds (1000 emails processed, or 30 minutes elapsed) were reached; only the final notification was sent. In addition, data was not persisted to MongoDB and the JSON output was not updated until the end of the run, and periodic memory cleanup did not occur. The progress bar also never appeared, which suggests that the task configuration does not support concurrency or timely execution of intermediate steps.

Observed Behaviors:

    • No progress notifications sent at set thresholds.
    • JSON updates, MongoDB persistence, and memory cleanup occurred only at the end.
    • Progress bar did not appear.

Possible Causes:

  1. Single-Threaded Execution: If the program is running in a single-threaded manner while trying to handle multiple tasks (fetching, persisting, notifying), it may queue tasks sequentially, delaying intermediate steps (like JSON and MongoDB updates).
  2. Task Configuration Issue: If the program is improperly configured to send notifications only once at completion or lacks a specific task scheduler for intermediate steps, the thresholds (e.g., 1000 emails or 30 minutes) might not trigger.
  3. Blocking Operations or Resource Contention: If large data processing or network operations (like saving to MongoDB) block the main thread, this can delay or prevent intermediate steps, such as notifications or JSON updates, from executing until completion.
  4. Progress Bar and Logging Conflicts: The absence of a progress bar could indicate that logging and notification steps are not being triggered in a way that aligns with the email processing rate. This could also mean that there’s no check for notification conditions in the main processing loop.
  5. Variable Synchronization Issues: If variables that track email count or time are not updated synchronously with other tasks (e.g., due to timing delays or order of operations), the notifications might fail to reflect accurate counts or elapsed time.
  6. Concurrency and Memory Management: Without parallel processing, all tasks rely on a single queue, meaning that memory management (clearing processed data) might not occur until after all emails are processed.
  7. Code Execution Order: If JSON/MongoDB updates and memory cleanup are placed at the end of the processing loop rather than within it, these steps will only occur once all emails are fetched and processed.
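Causes 4 and 7 can be illustrated with a minimal sketch in which the threshold check, persistence, and cleanup all sit inside the processing loop. This is not the project's actual code: `process_emails`, `persist`, `notify`, and `NOTIFICATION_INTERVAL` are hypothetical names, and only the 1000-email and 30-minute values come from this issue.

```python
import time

NOTIFICATION_THRESHOLD = 1000    # emails per flush (value from this issue)
NOTIFICATION_INTERVAL = 30 * 60  # seconds; hypothetical name for the 30-minute rule


def process_emails(emails, persist, notify, clock=time.monotonic):
    """Process emails, flushing inside the loop instead of only at the end.

    `persist` and `notify` are hypothetical callables standing in for the
    JSON/MongoDB writes and the progress notification.
    """
    batch = []
    processed = 0
    last_flush = clock()

    for email in emails:
        batch.append(email)  # stand-in for the real parsing work
        processed += 1

        due_by_count = len(batch) >= NOTIFICATION_THRESHOLD
        due_by_time = clock() - last_flush >= NOTIFICATION_INTERVAL
        if due_by_count or due_by_time:
            persist(batch)         # intermediate persistence
            notify(processed)      # intermediate notification
            batch = []             # drop the reference so memory can be reclaimed
            last_flush = clock()

    if batch:
        persist(batch)             # flush whatever is left over
    notify(processed)              # final notification
    return processed
```

If the real loop only calls the equivalent of `persist` and `notify` after the `for` statement, the behavior described in this issue (a single final notification and a single write) is exactly what one would expect.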

Recommended Changes:

    • Eliminate Progress Bar: Unless parallel processing or multithreading can be implemented, removing the progress bar might simplify the program and help isolate where tasks are not being triggered.
    • Optimize Task Scheduler: Consider implementing a task scheduler or callback function that periodically checks for threshold conditions (like email count or elapsed time) and triggers notifications and persistence tasks as soon as they are met.
    • Separate Persistence and Notification: Refactor the code so that JSON updates, MongoDB persistence, and memory cleanup occur independently from email processing, based on thresholds rather than the end of processing.
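One possible shape for the callback approach is a small trigger object that owns the threshold logic and calls whatever persistence and notification functions are registered with it. The class name `ThresholdTrigger` and its methods are assumptions made for illustration, not existing project code.

```python
import time


class ThresholdTrigger:
    """Fire registered callbacks when a count or time threshold is reached.

    Hypothetical helper: the processing loop only calls `record()`, while
    persistence and notification live in separately registered callbacks.
    """

    def __init__(self, every_n, every_seconds, clock=time.monotonic):
        self.every_n = every_n
        self.every_seconds = every_seconds
        self.total = 0
        self._clock = clock
        self._callbacks = []
        self._count_since_fire = 0
        self._last_fire = clock()

    def subscribe(self, callback):
        """Register a callable that receives the running total on each fire."""
        self._callbacks.append(callback)

    def record(self, n=1):
        """Count n processed items; return True if the callbacks fired."""
        self.total += n
        self._count_since_fire += n
        elapsed = self._clock() - self._last_fire
        if self._count_since_fire >= self.every_n or elapsed >= self.every_seconds:
            self.fire()
            return True
        return False

    def fire(self):
        """Run every callback, then reset both the counter and the timer."""
        for callback in self._callbacks:
            callback(self.total)
        self._count_since_fire = 0
        self._last_fire = self._clock()
```

With `ThresholdTrigger(every_n=1000, every_seconds=1800)`, the loop would call `trigger.record()` once per email. Note that the time threshold is only evaluated when `record()` is called, so a long blocking operation would still delay it; covering that case would need a separate timer thread or scheduler.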

gabrielanyosa1 commented 1 week ago

Status Update After Commit 54c1747

Progress Notifications and Display Enhancement Needed

Current Status

After commit 54c1747, several related issues were fixed:

Remaining Issues

  1. Progress Bar Display

    • Progress bar implementation is not functioning
    • No visual feedback during processing
    • Need to investigate tqdm integration
  2. Notification Thresholds

    • Email count threshold (1000 emails) notifications not implemented
    • Need to combine with existing time-based notifications
  3. Data Persistence Timing

    • Need to verify intermediate data persistence
    • Potential optimization of MongoDB update frequency
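For the progress bar item, a minimal tqdm sketch to compare against: when the bar is created with `total=` and the loop is driven manually, the bar only advances if `update()` is called inside the loop, and tqdm writes to stderr by default. The function name `process_with_progress` and the `handle` callable are hypothetical, and the import fallback is only there so the sketch still runs where tqdm is not installed.

```python
import io

try:
    from tqdm import tqdm  # third-party package: pip install tqdm
except ImportError:        # fall back to running without a bar
    tqdm = None


def process_with_progress(emails, handle, stream=None):
    """Process emails and advance a tqdm bar once per email.

    `handle` is a hypothetical per-email callable. `stream` is passed to
    tqdm's `file` argument; None means tqdm's default (stderr).
    """
    bar = None
    if tqdm is not None:
        bar = tqdm(total=len(emails), desc="Processing emails",
                   unit="email", file=stream)
    done = 0
    try:
        for email in emails:
            handle(email)
            done += 1
            if bar is not None:
                bar.update(1)  # without this call the bar stays at 0
    finally:
        if bar is not None:
            bar.close()        # always release the bar, even on errors
    return done
```

If the bar still does not appear with a loop shaped like this, the next things to check would be whether stderr is being redirected or swallowed and whether other output is overwriting the bar's line.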

Action Items

  1. Fix progress bar display

    • Debug tqdm implementation
    • Ensure proper updates during processing
  2. Implement email count threshold notifications

    • Add NOTIFICATION_THRESHOLD checks
    • Combine with time-based notifications
  3. Verify intermediate data persistence

    • Add logging for MongoDB update timing
    • Optimize persistence frequency if needed
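For the third action item, one low-effort way to log MongoDB update timing is a small context manager wrapped around each persistence call. The logger name `etl.persistence` and the helper `log_duration` are assumptions; the real write call would go inside the `with` block.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("etl.persistence")  # hypothetical logger name


@contextmanager
def log_duration(label, log=logger):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s finished in %.3f s", label, elapsed)
```

Usage would look like `with log_duration("MongoDB batch update"): save_batch(batch)`, where `save_batch` stands in for the project's actual persistence function. The timestamps on these log lines show whether writes happen during the run or only at the end, and the durations give a basis for tuning the persistence frequency.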

Notes