MetPX / sarracenia

https://MetPX.github.io/sarracenia
GNU General Public License v2.0

v3 sarra_get_cis_rcm crashes after watch_ice path or post_baseDir change #1216

Open robjarawan opened 1 month ago

robjarawan commented 1 month ago

I changed the watch config to use the user sarra and point to its home directory in order to test for issues. During that time, it seems that whatever I did caused the sarra to crash 7+ times; see acdc 22877.

These are the watch settings I used. I think I changed post_baseDir a couple of times to see what it would do to the messages, and I was playing around with the slashes because I noticed the sarra was showing chdir local/home... without a leading slash (relative?). I did not know it would eventually crash, so I left it until the next day, which caused some pager ruckus.

post_baseUrl sftp://sarra@${HOSTNAME}
post_baseDir /local/home/sarra/ice/
path /local/home/sarra/ice/rcm/

[ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file

Log dump (/local/home/sarra/.cache/sr3/log/sarra_get_cis_rcm_01.log):
raise TimeoutException("signal alarm timed out")
sarracenia.transfer.TimeoutException: signal alarm timed out
2024-09-10 01:55:07,468 [INFO] sarracenia.flow metricsFlowReset looking for old metrics for /local/home/sarra/.cache/sr3/metrics/sarra_get_cis_rcm_01.json
2024-09-10 01:55:07,485 [INFO] sarracenia.moth.amqp putSetup exchange declared: xpublic (as: amqp://feeder@localhost/)
2024-09-10 01:55:07,508 [INFO] sarracenia.moth.amqp _queueDeclare queue declared q_feeder.sarra.get_cis_rcm.ddsr-shared (as: amqp://feeder@ddsr.cmc.ec.gc.ca/), (messages waiting: 0)
2024-09-10 01:55:07,508 [INFO] sarracenia.moth.amqp getSetup binding q_feeder.sarra.get_cis_rcm.ddsr-shared with v02.post.# to xs_MSC-ICE (as: amqp://feeder@ddsr.cmc.ec.gc.ca/)
2024-09-10 02:00:10,496 [INFO] sarracenia.flow _runHousekeeping on_housekeeping pid: 38385 sarra/get_cis_rcm instance: 1
2024-09-10 02:00:10,496 [INFO] sarracenia.flowcb.gather.message on_housekeeping messages: good: 0 bad: 0 bytes: 0 Bytes average: 0 Bytes
2024-09-10 02:00:10,497 [INFO] sarracenia.diskqueue on_housekeeping work_retry_01 Number of messages in retry list 1
2024-09-10 02:00:10,498 [INFO] sarracenia.flowcb.housekeeping.resources on_housekeeping Current cpu_times: user=0.64 system=0.04
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.housekeeping.resources on_housekeeping Current mem usage: 136.1 MiB, accumulating count (0 or 0/100 so far) before self-setting threshold
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log stats version: 3.00.54p1, started: 5 minutes ago, last_housekeeping: 303.0 seconds ago 
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log stats messages received: 0, accepted: 0, rejected: 0 rate accepted: 0.0% or 0.0 m/s
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log stats files transferred: 0 bytes: 0 Bytes rate: 0 Bytes/sec
2024-09-10 02:00:10,499 [INFO] sarracenia.flow metricsFlowReset looking for old metrics for /local/home/sarra/.cache/sr3/metrics/sarra_get_cis_rcm_01.json
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 2360.08 ) sftp://sarra@ddsr-cmc-ops01.cmc.ec.gc.ca /local/home/sarra/ice/rcm/RCM_test.zip
2024-09-10 02:00:11,951 [ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file
2024-09-10 02:00:11,951 [INFO] sarracenia.flow do_download attempt 1 failed to download sftp://sarra@ddsr-cmc-ops01.cmc.ec.gc.ca/local/home/sarra/ice/rcm/RCM_test.zip to /apps/sarra/public_data/20240910/MSC-ICE/MSC-PRODUCTS/RCM/01/RCM_test.zip
2024-09-10 02:00:11,951 [WARNING] sarracenia.flow do_download downloading again, attempt 2
2024-09-10 02:00:11,952 [ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file
2024-09-10 02:00:11,952 [INFO] sarracenia.flow do_download attempt 2 failed to download sftp://sarra@ddsr-cmc-ops01.cmc.ec.gc.ca/local/home/sarra/ice/rcm/RCM_test.zip to /apps/sarra/public_data/20240910/MSC-ICE/MSC-PRODUCTS/RCM/01/RCM_test.zip
2024-09-10 02:00:11,952 [WARNING] sarracenia.flow do_download downloading again, attempt 3
2024-09-10 02:00:11,954 [ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file
2024-09-10 02:00:11,954 [INFO] sarracenia.flow do_download attempt 3 failed to download sftp://sarra@ddsr-cmc-ops01.cmc.ec.gc.ca/local/home/sarra/ice/rcm/RCM_test.zip to /apps/sarra/public_data/20240910/MSC-ICE/MSC-PRODUCTS/RCM/01/RCM_test.zip
2024-09-10 02:00:11,954 [ERROR] sarracenia.flow do_download gave up downloading for now, appending to retry queue
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sarracenia/instance.py", line 249, in 
    i.start()
  File "/usr/lib/python3/dist-packages/sarracenia/instance.py", line 240, in start
    self.running_instance.run()
  File "/usr/lib/python3/dist-packages/sarracenia/flow/__init__.py", line 672, in run
    time.sleep(increment)
  File "/usr/lib/python3/dist-packages/sarracenia/transfer/__init__.py", line 62, in alarm_raise
    raise TimeoutException("signal alarm timed out")
sarracenia.transfer.TimeoutException: signal alarm timed out

petersilva commented 1 month ago

That's this: https://github.com/MetPX/sarracenia/pull/1208 It should be already fixed on dev.
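For anyone reading the traceback: the exception is raised by the alarm handler (alarm_raise) while the main loop is in time.sleep, not during a transfer. One way that can happen is an alarm armed around a transfer step that is still pending after the step fails early. The sketch below only illustrates that general mechanism with Python's signal module. It is not sarracenia code, the names guarded, failing_chdir and idle_loop_survives are made up for the example, and it does not claim to show what the PR changes. POSIX only, and it must run in the main thread.

```python
import signal
import time


class TimeoutException(Exception):
    """Stand-in for sarracenia.transfer.TimeoutException (illustration only)."""


def alarm_raise(signum, frame):
    # Same shape as the handler named in the traceback: raise on SIGALRM.
    raise TimeoutException("signal alarm timed out")


def guarded(op, timeout, cancel=True):
    """Hypothetical helper: run op() with an alarm armed for `timeout` seconds.

    With cancel=False the timer is deliberately left armed after op() returns,
    which models the suspected problem.
    """
    signal.signal(signal.SIGALRM, alarm_raise)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return op()
    except OSError:
        # The step failed quickly (e.g. chdir to a missing directory).
        return None
    finally:
        if cancel:
            signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer


def failing_chdir():
    # Models the "[Errno 2] No such file" failure from the log.
    raise FileNotFoundError(2, "No such file")


def idle_loop_survives(cancel):
    """Return True if a later sleep completes, False if the stale alarm hits it."""
    guarded(failing_chdir, 0.05, cancel=cancel)
    try:
        time.sleep(0.2)  # stands in for the sleep in the flow's run loop
        return True
    except TimeoutException:
        return False
```

Calling idle_loop_survives(True) lets the sleep finish, while idle_loop_survives(False) ends with the timeout surfacing inside the sleep, which is the same shape as the traceback above.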

petersilva commented 1 month ago

You need a / at the end of post_baseUrl. Then the chdir will be /local/home... and should succeed.
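To show why the trailing slash matters, here is a small model of how a remote directory could be derived from a base URL plus a relative path. This is an assumption made for illustration, not the actual sarracenia implementation; remote_dir is a hypothetical helper and the host name is a placeholder. It only uses the standard library's urlsplit, which returns an empty path when the URL has no trailing slash and "/" when it does.

```python
import posixpath
from urllib.parse import urlsplit


def remote_dir(base_url: str, rel_path: str) -> str:
    """Hypothetical model: directory a downloader would chdir into.

    The path part of base_url is '' without a trailing slash and '/' with one,
    so the result is relative in the first case and absolute in the second.
    """
    base_path = urlsplit(base_url).path
    full_path = base_path + rel_path.lstrip("/")
    return posixpath.dirname(full_path)


rel = "local/home/sarra/ice/rcm/RCM_test.zip"

# No trailing slash: relative directory, as in the failing chdir in the log.
print(remote_dir("sftp://sarra@example-host", rel))
# -> local/home/sarra/ice/rcm

# Trailing slash: absolute directory.
print(remote_dir("sftp://sarra@example-host/", rel))
# -> /local/home/sarra/ice/rcm
```

A relative chdir on an sftp session is resolved against wherever the session currently is, so it only works by accident; the absolute form does not depend on that.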

petersilva commented 1 month ago

Fixed in release 3.00.55 (and all release candidates).

petersilva commented 1 month ago

@robjarawan can you try v3.00.55 and see if it fixes it?