EarthScope / rover

ROVER: robust data access tool for FDSN data centers
https://earthscope.github.io/rover/

[Daemon Mode] Subscriptions do not reprocess request following "rover.config:recheck-period" #42

Closed: timronan closed this issue 5 years ago.

timronan commented 6 years ago

The Rover daemon mode does not seem to reprocess requests according to "rover.config:recheck-period".

The Rover subscribe mode seems like it should be very similar to nohup rover retrieve request.txt &, but it does not work like this. It seems like the daemon mode needs to be more robust, so that it keeps executing in the background and "regularly compare[s] available data with the repository".

After running a test since 2018-10-09, the subscription request was processed only once. I suspect that "checked" is the last time the request was processed.

rover list-subscribe

  1 created 2018-10-09  checked 2018-10-09T22:57:36
    /Users/tronan/subscribe_rover/subscriptions/rover_subscribe_9d09b6_45379
    http://service.iris.edu/irisws/availability/1/query
    http://service.iris.edu/fdsnws/dataselect/1/query

  1 subscription

The configuration file for this test includes:

 # time between availability checks in hours 
 recheck-period=12 
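To make the expected behaviour concrete: if "checked" really is the time of the last availability check, then with recheck-period=12 the subscription listed above should have been rechecked by 2018-10-10T10:57:36. The minimal Python sketch below shows that comparison. The helpers `next_check_due` and `is_overdue` are hypothetical and are not rover's actual scheduling code.

```python
from datetime import datetime, timedelta


def next_check_due(checked: str, recheck_period_hours: float) -> datetime:
    """Return when a subscription should next be checked.

    Hypothetical helper: assumes `checked` is the ISO timestamp printed by
    `rover list-subscribe` and that a recheck is due once
    `recheck_period_hours` have elapsed since then.
    """
    last = datetime.fromisoformat(checked)
    return last + timedelta(hours=recheck_period_hours)


def is_overdue(checked: str, recheck_period_hours: float, now: datetime) -> bool:
    # Overdue means the recheck deadline has already passed at `now`.
    return now > next_check_due(checked, recheck_period_hours)


due = next_check_due("2018-10-09T22:57:36", 12)
print(due.isoformat())  # 2018-10-10T10:57:36
```

If `is_overdue(...)` is still true long after that deadline, with `list-subscribe` showing an unchanged "checked" time, then the daemon has stopped rechecking.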
timronan commented 5 years ago

This is addressed by the rover.config:recheck-period setting, and rover trigger can be used to run the command manually.

chad-earthscope commented 5 years ago

@timronan I cannot duplicate this problem. We need to look closer at your test setup.

I set up a rover subscribe with this request file:

IU COLA 00 LHZ 2018-10-10 2019-1-1
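Each line of a request file like this one holds network, station, location, channel, start time and end time. The sketch below parses such lines, assuming the usual FDSN-style layout of six whitespace-separated fields, with wildcards (`*`, `?`) kept as plain strings. `RequestLine` and `parse_request_line` are hypothetical names for illustration. They are not part of rover.

```python
from datetime import datetime
from typing import NamedTuple


class RequestLine(NamedTuple):
    network: str
    station: str
    location: str
    channel: str
    start: datetime
    end: datetime


def parse_time(text: str) -> datetime:
    # Accept a bare date (e.g. "2019-1-1") or a full timestamp with
    # optional fractional seconds (e.g. "2017-12-01T00:00:00.000000").
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised time: {text!r}")


def parse_request_line(line: str) -> RequestLine:
    # Assumed layout: NET STA LOC CHA START END, whitespace-separated.
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    net, sta, loc, cha, start, end = fields
    return RequestLine(net, sta, loc, cha, parse_time(start), parse_time(end))


req = parse_request_line("IU COLA 00 LHZ 2018-10-10 2019-1-1")
print(req.station, req.end.date())  # COLA 2019-01-01
```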

I used the default config (for version 0.0.6) and only changed:

I ran a subscribe with these conditions on both macOS and linux.

I got an email once an hour for the checks. After yesterday's data was archived and added to the availability service, both instances retrieved it. In short, it looks like it's working as expected.

timronan commented 5 years ago

I ran another subscription request through the weekend and it worked as expected.

TA * * BH? 2018-08-01 2025-03-11
tronan:subscribe_rover tronan$ rover list-subscribe

  1 created 2018-10-12  checked 2018-10-15T15:45:46
    /Users/tronan/subscribe_rover/subscriptions/rover_subscribe_9d09b6_13229
    http://service.iris.edu/irisws/availability/1/query
    http://service.iris.edu/fdsnws/dataselect/1/query

  1 subscription

tronan:subscribe_rover tronan$ date +%y-%m-%dT%H:%M:%D
18-10-15T09:17:10/15/18

During my original subscription I ran out of memory, and it seems like this led to the error described. The original request was:

P* * * * 2017-12-01T00:00:00.000000 2018-01-01T00:01:00.000000

I checked the log files to verify that I did not accidentally run rover stop while the subscription was running: a stop.log was never created.

Rover seems to have died after this string of error messages:

 DEFAULT  2018-10-09 19:41:13,543: Downloading TA_J16K 2018-109 (N_S 102/231; day 109/281)
 ERROR    2018-10-09 19:41:15,865: Download subscription 1 failed (return code 1)
 ERROR    2018-10-09 19:41:15,910: Download subscription 1 failed (return code 1)
 ERROR    2018-10-09 19:41:15,951: Download subscription 1 failed (return code 1)
 ERROR    2018-10-09 19:41:15,996: Download subscription 1 failed (return code 1)
 CRITICAL 2018-10-09 19:41:16,166: cannot commit - no transaction is active
 DEFAULT  2018-10-09 19:41:16,171: See "rover help daemon"
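When going through log files for a stop or a crash, a short filter can pull out just the ERROR and CRITICAL lines. The minimal Python sketch below assumes every log line has the `LEVEL  YYYY-MM-DD HH:MM:SS,mmm: message` shape seen in the excerpts in this thread. `find_failures` is a hypothetical helper and not part of rover.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   ERROR    2018-10-09 19:41:15,865: Download subscription 1 failed (return code 1)
LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+"
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):\s"
    r"(?P<message>.*)$"
)


def find_failures(log_text: str, levels=("ERROR", "CRITICAL")) -> List[Tuple[str, str, str]]:
    """Return (level, time, message) for every line logged at one of `levels`."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            hits.append((match.group("level"), match.group("time"), match.group("message")))
    return hits


sample = """\
 DEFAULT  2018-10-09 19:41:13,543: Downloading TA_J16K 2018-109 (N_S 102/231; day 109/281)
 ERROR    2018-10-09 19:41:15,865: Download subscription 1 failed (return code 1)
 CRITICAL 2018-10-09 19:41:16,166: cannot commit - no transaction is active
"""
for level, time, message in find_failures(sample):
    print(level, time, message)
```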

I will run the test one more time, but this issue seems to be resolved.

timronan commented 5 years ago

I replicated this error by filling up my disk during a subscription.

 DEFAULT  2018-10-15 15:42:44,387: Downloading PB_B010 2017-006 (N_S 10/280; day 372/1018)
 ERROR    2018-10-15 15:42:45,444: Download subscription 1 failed (return code 1)
 DEFAULT  2018-10-15 15:42:45,668: Sending completion email to tronan@iris.washington.edu (subject Rover Failure)
 CRITICAL 2018-10-15 15:42:45,746: cannot commit - no transaction is active
 DEFAULT  2018-10-15 15:42:45,753: See "rover help daemon"
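The CRITICAL line in both logs is SQLite's own wording for a COMMIT issued when no transaction is open. One plausible mechanism, not confirmed in this thread, is that the failed write rolled back the open transaction. SQLite documents that errors such as SQLITE_FULL may trigger an automatic rollback, which would leave the later commit with nothing to commit. The sketch below uses Python's standard `sqlite3` module and an in-memory database to reproduce only the message itself. It does not model rover's database.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# sqlite3 module does not open transactions implicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")

message = None
try:
    # No BEGIN has been issued, so there is no transaction to commit.
    conn.execute("COMMIT")
except sqlite3.OperationalError as exc:
    message = str(exc)
finally:
    conn.close()

print(message)  # cannot commit - no transaction is active
```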

An email notifies the user when the disk is full. I had not previously set up email notifications.

The rover daemon (PID 61910) task on Tims-4.local has failed with the error:

  database or disk is full
  (OperationalError)
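"database or disk is full" is SQLite's standard message for SQLITE_FULL, so the root cause here was disk space. A long-running subscription could check free space before each download pass. The sketch below shows one such pre-flight check using Python's `shutil.disk_usage`. The function name and the `min_free_gb` threshold are hypothetical and are not rover settings.

```python
import shutil


def enough_free_space(path: str, min_free_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free_gb` GiB free.

    Hypothetical pre-flight check; rover itself may handle this differently.
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.free >= min_free_gb * 1024**3


if not enough_free_space(".", min_free_gb=5):
    print("Not enough free disk space; skipping this download pass.")
```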
andrewcooke-isti commented 5 years ago

heh. that's neater than i would have expected tbh.