Ozxahmed / dec-proj1-chicago-crime


[Feature] Scheduling logging and metadata logging #16

Closed Ozxahmed closed 8 months ago

Ozxahmed commented 8 months ago

This issue tracks the work to figure out how to schedule our pipeline, along with writing metadata logs to a database table.
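
For the metadata-logging half, here's a minimal sketch of writing one run-log row per pipeline run. The table name, columns, and SQLite backend are placeholders just to show the shape; we'd swap in our real warehouse connection:

import sqlite3
from datetime import datetime, timezone

# Hypothetical schema -- adjust to match our actual warehouse
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS pipeline_metadata_log (
    run_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    run_time    TEXT NOT NULL,
    status      TEXT NOT NULL,
    rows_loaded INTEGER,
    message     TEXT
)
"""

def log_pipeline_run(conn, status, rows_loaded=None, message=None):
    """Insert one metadata row describing a pipeline run."""
    conn.execute(
        "INSERT INTO pipeline_metadata_log (run_time, status, rows_loaded, message) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), status, rows_loaded, message),
    )
    conn.commit()

conn = sqlite3.connect("metadata.db")
conn.execute(CREATE_SQL)
log_pipeline_run(conn, status="success", rows_loaded=100, message="example run")  # example values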

mk-estrada commented 8 months ago

Here's the base code for scheduling from the lectures:

import schedule
import time

# Example job to schedule
def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)              # every 10 minutes
schedule.every().hour.do(job)                   # every hour
schedule.every().day.at("10:30").do(job)        # every day at 10:30
schedule.every().monday.do(job)                 # every Monday
schedule.every().wednesday.at("13:15").do(job)  # every Wednesday at 13:15
schedule.every().day.at("12:42", "Europe/Amsterdam").do(job)  # timezone strings need the optional pytz dependency
schedule.every().minute.at(":17").do(job)       # every minute, at the 17th second

while True:
    schedule.run_pending()
    time.sleep(1)

A schedule is set by calling `schedule.every(X).minutes.do(job)`. When the scheduled time arrives, it runs the `job()` function.

The loop above polls every second using `time.sleep(1)` to check whether any job is due to run.

mk-estrada commented 8 months ago

I thought we could do something like schedule the ETL to run every day, checking each day to see whether records have been updated. I think this will work, but it could be a good case for a unit test:

import schedule
import time

def job():
    print("I'm working...")

schedule.every().day.do(job)

while True:
    schedule.run_pending()
    time.sleep(86400)  # polling only once a day can drift past the scheduled time; a shorter sleep is safer
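
A sketch of what that unit test could look like, using the library's `run_all()` helper to fire every registered job immediately instead of waiting a day (the counter-based `fake_etl` is just a stand-in for the real ETL call):

import schedule

def test_daily_job_runs():
    schedule.clear()  # start from an empty scheduler

    calls = []

    def fake_etl():
        calls.append("ran")

    schedule.every().day.do(fake_etl)

    # run_all() executes every registered job immediately,
    # so the test doesn't have to wait for the actual schedule
    schedule.run_all()

    assert len(schedule.jobs) == 1
    assert calls == ["ran"]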
mk-estrada commented 8 months ago

This works for running every minute, but is very basic. I would like to test it some more with the entire pipeline.

schedule.every(1).minutes.do(get_max_date_crime_data, APP_TOKEN=APP_TOKEN)

while True:
    schedule.run_pending()
    print("I'm working...")
    time.sleep(60)
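
One way to connect this to the metadata-logging half of the issue might be to wrap the pipeline call so every scheduled run writes a log row, success or failure. This is only a sketch, assuming the hypothetical `log_pipeline_run` helper and `conn` from the earlier comment:

import schedule
import time

def scheduled_run():
    """Run the pipeline and record the outcome in the metadata table."""
    try:
        get_max_date_crime_data(APP_TOKEN=APP_TOKEN)  # our existing pipeline entry point
        log_pipeline_run(conn, status="success")
    except Exception as exc:
        log_pipeline_run(conn, status="failure", message=str(exc))

schedule.every(1).minutes.do(scheduled_run)

while True:
    schedule.run_pending()
    time.sleep(60)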

mk-estrada commented 8 months ago

I pushed my mke-pipeline branch with the pipeline scheduling. Since it's not completely working yet, the changes were implemented in a copy of the api_connection.py file. I'll add more comments to the Slack conversation.