NOAA-GSL / VxIngest

Refactor job scheduling to better support metrics #150

Closed randytpierce closed 2 years ago

randytpierce commented 2 years ago

Because the current job runner, scripts/VXingest_utilities/run-cron.sh, is very basic, all the scheduled jobs in a given run write their output to the same log file. This isn't ideal. Jeff and I decided it would be better to go ahead and create a better job scheduler that does a few things:

1) Moves the load_spec data and some of the run parameters into a Couchbase document of type 'JOB'. This way an ingest runner can just retrieve the job spec from wherever it is running.
2) Allows scheduling on a per-job basis. Currently run-cron is just a script that cron runs, and it simply runs hard-coded jobs in order. A scheduler would allow for better synchronization, including synchronization across run platforms.
3) Makes a more logical separation between what is a run parameter and what is a job spec field. Run parameters should be specific to the run environment.
4) Allows for better control of the log files and the metrics files, so that they are easier to handle on different platforms.
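To make the split between job spec and run parameters concrete, here is a minimal sketch of what a 'JOB' document and per-job output paths might look like. Every field name, the document id, the `RunParameters` class, and the `.metrics` file suffix are assumptions made up for illustration; this is not the schema or code used in VxIngest.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical JOB document. All field names and values are illustrative
# assumptions, not the actual VxIngest schema.
JOB_DOC = {
    "id": "JOB:example:metar:obs",  # made-up document id
    "type": "JOB",
    "schedule": "0 * * * *",  # assumed cron-style string: once an hour
    "load_spec": {"ingest_document_ids": ["MD:example:ingest"]},
}


@dataclass
class RunParameters:
    """Settings that belong to the run environment, not to the job spec."""

    credentials_file: str
    log_dir: Path
    metrics_dir: Path
    threads: int = 1


def output_paths(job_doc, params, run_stamp):
    """Return a separate log path and metrics path for one run of one job."""
    # Colons are awkward in file names, so replace them.
    safe_id = job_doc["id"].replace(":", "_")
    log = params.log_dir / f"{safe_id}-{run_stamp}.log"
    metrics = params.metrics_dir / f"{safe_id}-{run_stamp}.metrics"
    return log, metrics
```

With something like this, the same job document could be fetched on any platform, while each platform supplies its own `RunParameters` and therefore its own log and metrics locations.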

This is involved enough that I made a feature branch from development named "refactor_for_job_documents".

randytpierce commented 2 years ago

This will also be documented in the metrics document for review. https://docs.google.com/document/d/1AGOBN5YkgFGKU7MfnS9dJwpJ1eo5sh4XjZamcu8TC6E/edit?usp=sharing

randytpierce commented 2 years ago

This is nearly complete. All the Python code is finished but has not yet been merged into the development branch. Once this is deployed we will need to modify the cron script.

randytpierce commented 2 years ago

This is all working now and deployed on 'adb-cb1'. There is a cron job that runs every fifteen minutes and checks to see if the schedule for any jobs shows them as ready. All the current jobs are scheduled to run once an hour.
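As a rough illustration of the fifteen-minute readiness check, the sketch below assumes each job carries a five-field cron-style `schedule` string and treats a job as ready if its schedule fired within the last fifteen minutes. The schedule format, the function names, and the job ids are assumptions for the example; the actual scheduler may work differently.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """Match one cron field. Supports '*', '*/n', single integers, and
    comma-separated lists of those. Ranges such as '1-5' are not supported."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def minute_matches(schedule, when):
    """True if the schedule fires at the given minute.

    Simplification: all five fields must match, whereas real cron treats
    day-of-month and day-of-week specially when both are restricted."""
    minute, hour, dom, month, dow = schedule.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, (when.weekday() + 1) % 7)  # cron: 0 = Sunday
    )


def is_ready(schedule, now, window_minutes=15):
    """True if the schedule fired during the window_minutes ending at now."""
    now = now.replace(second=0, microsecond=0)
    return any(
        minute_matches(schedule, now - timedelta(minutes=offset))
        for offset in range(window_minutes)
    )


def ready_jobs(jobs, now, window_minutes=15):
    """Return the ids of the jobs whose schedule shows them as ready."""
    return [job["id"] for job in jobs if is_ready(job["schedule"], now, window_minutes)]
```

Because the window length equals the check interval, an hourly job is picked up by exactly one of the four checks each hour, provided the checks themselves run on time.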

We should be able to deploy this anywhere. One step remains, but it belongs in a separate issue: the import and scraper currently run in the same script as the ingest and should be separated from it. That should not be difficult, and it would let the import/scraper run on a much shorter interval, such as every two minutes. See issue https://github.com/NOAA-GSL/VxIngest/issues/155

bonnystrong commented 2 years ago

Randy, I have one question. Does this mean it takes as much as an hour for us to receive alerts when there's a problem?

randytpierce commented 2 years ago

No, the metrics and the alerts happen very soon after the import finishes, probably within a minute or two. Currently the ingest runs are scheduled on a one-hour interval. We could run them more often, but the data wouldn't be available any faster.

bonnystrong commented 2 years ago

Would it not make more sense to schedule the monitoring crontab to run every 15 mins so that we don't have to remember later how this correlates with ingest scheduling?

randytpierce commented 2 years ago

Probably. I made another issue in VxIngest (#155, I think) to separate the ingest and the import (which we have to do for HPC anyway). They would then run at different intervals, and the import and scraping interval would be much shorter, like two minutes. I think that would do what you are thinking. There isn't much work in doing that; I just wanted to have it as a separate issue. I sort of figured that it goes along with deploying on the new VM.
