Closed petersilva closed 2 years ago
as a sanity check, verify the sender delivered all the files the poll should have seen:
fractal% grep 'after_post posted' sender_tsource2send_f50_01.log | sed "s/.*'new_relPath': '//" | sed "s/'.*//" | sort >sender_f50_posted_sorted.list
fractal% wc -l sender_f50_posted_sorted.list
1111 sender_f50_posted_sorted.list
the combine the two polls to get a number:
sort poll_f62_posted_sorted.list poll_f63_posted_sorted.list >poll_f6x_posted_sorted.list
fractal% wc -l poll_f6x_posted_sorted.list
733 poll_f6x_posted_sorted.list
so it validates that nobody posted close to 400 messages. which ones?
fractal% comm -3 poll_f6x_posted_sorted.list sender_f50_posted_sorted.list | wc -l
378
confirming the number of differences...
fractal% comm -3 poll_f6x_posted_sorted.list sender_f50_posted_sorted.list | head
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050200__PDO_36334
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050200__PFF_64558
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__aos_64675
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__AWR_24738
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__NES_28588
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PAF_19691
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PAH_55647
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PAV_41575
/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PAZ_39910
fractal%
the first ten files that were sent but not posted.
fractal% grep -l CACN00_CWAO_042000__WXL_07612 poll*.log
poll_sftp_f62_01.log
fractal%
the first file only shows up in f62, so look at that log:
fractal% grep -n CACN00_CWAO_042000__WXL_07612 poll*.log | head
poll_sftp_f62_01.log:9858:2022-10-08 12:03:12,582 [DEBUG] sarracenia.flowcb.poll poll_file_post desc: type: <class 'paramiko.sftp_attr.SFTPAttributes'>, value: -rw-rw-r-- 1 1000 1000 269 11 Apr 21:30 CACN00_CWAO_042000__WXL_07612
poll_sftp_f62_01.log:10918:2022-10-08 12:03:12,762 [DEBUG] sarracenia.flowcb.nodupe check_message NoDupe calling check( cod,20220412T013017, home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612 )
poll_sftp_f62_01.log:10919:2022-10-08 12:03:12,762 [DEBUG] sarracenia.flowcb.nodupe in_cache added relpath=home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612
poll_sftp_f62_01.log:12045:2022-10-08 12:03:12,794 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 0.21 ) sftp://peter@localhost/ home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612
poll_sftp_f62_01.log:12426:2022-10-08 12:03:12,806 [INFO] sarracenia.flowcb.log after_work noticed ok: /home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612
poll_sftp_f62_01.log:13815:2022-10-08 12:04:29,204 [DEBUG] sarracenia.flowcb.poll poll_file_post desc: type: <class 'paramiko.sftp_attr.SFTPAttributes'>, value: -rw-rw-r-- 1 1000 1000 269 11 Apr 21:30 CACN00_CWAO_042000__WXL_07612
poll_sftp_f62_01.log:16222:2022-10-08 12:04:29,811 [DEBUG] sarracenia.flowcb.nodupe check_message NoDupe calling check( cod,20220412T013017, home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612 )
poll_sftp_f62_01.log:17215:2022-10-08 12:04:29,855 [INFO] sarracenia.flowcb.log after_accept home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612 rejected: 304 Not modified 1 (cache check)
poll_sftp_f62_01.log:20751:2022-10-08 12:04:39,326 [DEBUG] sarracenia.flowcb.poll poll_file_post desc: type: <class 'paramiko.sftp_attr.SFTPAttributes'>, value: -rw-rw-r-- 1 1000 1000 269 11 Apr 21:30 CACN00_CWAO_042000__WXL_07612
poll_sftp_f62_01.log:23062:2022-10-08 12:04:39,962 [DEBUG] sarracenia.flowcb.nodupe check_message NoDupe calling check( cod,20220412T013017, home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_042000__WXL_07612 )
fractal%
It is added to the nodupe cache, but it does not get posted... why? ok it should be polled, then go through nodupe , and the after_accept log, then it should be noticed (which means a post should be generated.) ... there is no post message after that... which is on line 12426.... go in the log there, and start scanning forward...
022-10-08 12:03:12,812 [INFO] sarracenia.flowcb.log after_work noticed ok: /home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/UA/CWAO/03/UANT01_CWAO_050301___28133
2022-10-08 12:03:12,812 [INFO] sarracenia.flowcb.log after_work noticed ok: /home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/UA/CWAO/03/UANT01_CWAO_050300___51397
2022-10-08 12:03:12,812 [INFO] sarracenia.flowcb.log after_work noticed ok: /home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/UA/CWAO/03/UANT01_CWAO_050300___42877
2022-10-08 12:03:12,812 [INFO] sarracenia.flowcb.log after_work noticed ok: /home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/WS/NZKL/02/WSNZ21_NZKL_050259___30907
2022-10-08 12:03:12,813 [WARNING] sarracenia.moth.amqp putNewMessage failed xs_tsource_poll: [Errno 104] Connection reset by peer
2022-10-08 12:03:12,813 [DEBUG] sarracenia.moth.amqp putNewMessage Exception details:
Traceback (most recent call last):
File "/home/peter/Sarracenia/sr3/sarracenia/moth/amqp.py", line 541, in putNewMessage
self.channel.basic_publish(AMQP_Message, exchange, topic)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/channel.py", line 1791, in _basic_publish
self.connection.drain_events(timeout=0)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/connection.py", line 525, in drain_events
while not self.blocking_read(timeout):
File "/home/peter/.local/lib/python3.10/site-packages/amqp/connection.py", line 531, in blocking_read
return self.on_inbound_frame(frame)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/method_framing.py", line 53, in on_frame
callback(channel, method_sig, buf, None)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/connection.py", line 537, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/home/peter/.local/lib/python3.10/site-packages/amqp/abstract_channel.py", line 156, in dispatch_method
listener(*args)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/connection.py", line 666, in _on_close
self._x_close_ok()
File "/home/peter/.local/lib/python3.10/site-packages/amqp/connection.py", line 681, in _x_close_ok
self.send_method(spec.Connection.CloseOk, callback=self._on_close_ok)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/home/peter/.local/lib/python3.10/site-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/home/peter/.local/lib/python3.10/site-packages/amqp/transport.py", line 379, in write
self._write(s)
ConnectionResetError: [Errno 104] Connection reset by peer
2022-10-08 12:03:12,813 [DEBUG] amqp collect Closed channel #1
2022-
OK so problem #2 is that files are being noticed, but then the broker dies before their messages get posted. I need to look into what @reidsunderland did about this for post ( #407) .... I would have thought it was the same logic, but we might be hitting it differently.
third problem:
poll sftp_f62 posted 1086 sftp_f63 posted 38
test 20 success: poll sftp_f62+3 (1124) should have the same number of items of sender (1111)
there are extra posts... hmm...
grep 'after_post posted' poll_sftp_f63_01.log | sed "s/.*'path': '//" | sed "s/'.*//" | sort >poll_f63_posted_sorted.list
grep 'after_post posted' poll_sftp_f62_01.log | sed "s/.*'path': '//" | sed "s/'.*//" | sort >poll_f62_posted_sorted.list
fractal% comm -12 poll_f63_posted_sorted.list poll_f62_posted_sorted.list
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PJT_55362
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PTR_16137
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__PXB_05010
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050300__WTN_15938
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WBO_58636
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WDR_55811
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WIZ_25283
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WOJ_59055
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WQC_19364
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WXG_34030
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__WZG_52642
home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN45_CWAO_050300__XHA_10507
fractal%
those are the files that they both posted... now look throught the logs at why... confirming:
fractal% grep 'after_post posted' poll*.log | grep CACN00_CWAO_050100__PFF_45672
poll_sftp_f62_01.log:2022-10-08 20:21:04,194 [INFO] sarracenia.flowcb.log after_post posted {'version': 'v03', 'exchange': ['xs_tsource_poll'], 'local_offset': 0, '_deleteOnPost': {'new_file', 'subtopic', 'post_version', 'noDupe', 'exchange', 'new_subtopic', 'local_offset', 'new_relPath', 'new_dir', 'version', 'new_baseUrl'}, 'pubTime': '20221009T002103.963416338', 'new_dir': '/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03', 'new_file': 'CACN00_CWAO_050100__PFF_45672', 'post_version': 'v03', 'new_baseUrl': 'sftp://peter@localhost/', 'new_relPath': 'home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672', 'new_subtopic': ['home', 'peter', 'sarra_devdocroot', 'sent_by_tsource2send', '20200105', 'WXO-DD', 'bulletins', 'alphanumeric', '20200105', 'CA', 'CWAO', '03'], 'relPath': 'home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672', 'subtopic': ['home', 'peter', 'sarra_devdocroot', 'sent_by_tsource2send', '20200105', 'WXO-DD', 'bulletins', 'alphanumeric', '20200105', 'CA', 'CWAO', '03'], 'baseUrl': 'sftp://peter@localhost/', 'integrity': {'method': 'cod', 'value': 'sha512'}, 'size': 376, 'mtime': '20220412T013017', 'atime': '20221009T002103', 'mode': '664', 'noDupe': {'key': 'cod,20220412T013017', 'path': 'home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672'}}
poll_sftp_f63_01.log:2022-10-08 20:22:25,811 [INFO] sarracenia.flowcb.log after_post posted {'version': 'v03', 'exchange': ['xs_tsource_poll'], 'local_offset': 0, '_deleteOnPost': {'local_offset', 'new_relPath', 'new_dir', 'version', 'post_version', 'subtopic', 'new_file', 'noDupe', 'new_baseUrl', 'new_subtopic', 'exchange'}, 'pubTime': '20221009T002225.662383556', 'new_dir': '/home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03', 'new_file': 'CACN00_CWAO_050100__PFF_45672', 'post_version': 'v03', 'new_baseUrl': 'sftp://peter@localhost/', 'new_relPath': 'home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672', 'new_subtopic': ['home', 'peter', 'sarra_devdocroot', 'sent_by_tsource2send', '20200105', 'WXO-DD', 'bulletins', 'alphanumeric', '20200105', 'CA', 'CWAO', '03'], 'relPath': 'home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672', 'subtopic': ['home', 'peter', 'sarra_devdocroot', 'sent_by_tsource2send', '20200105', 'WXO-DD', 'bulletins', 'alphanumeric', '20200105', 'CA', 'CWAO', '03'], 'baseUrl': 'sftp://peter@localhost/', 'integrity': {'method': 'cod', 'value': 'sha512'}, 'size': 376, 'mtime': '20220412T013017', 'atime': '20221009T002103', 'mode': '664', 'noDupe': {'key': 'cod,20220412T013017', 'path': 'home/peter/sarra_devdocroot/sent_by_tsource2send/20200105/WXO-DD/bulletins/alphanumeric/20200105/CA/CWAO/03/CACN00_CWAO_050100__PFF_45672'}}
fractal%
have a look at the logs for a bit...
I think it is good now, after the last commit. I get exact matches, no missing polls, no extra ones either.
ci/cd clean now also... so all good.
as part of working on https://github.com/MetPX/sr_insects/issues/16 In the flakey_broker test, we simulate the failure of two poll configurations that differ by their vip setting. when the broker is shut down as part of the flakey_broker test, the vip settings of the two polls are exchanged, so that the active and passive roles of the two configurations are switched. This role switch is repeated for each of the the broker shutdowns.
It has become clear that poll isn't dealing with broker failures quite right....
this is the initial problem statement from that issue:
part1: noticed redundant posting (both participants in the poll posting):
went into the log directory (~/.cache/sr3/log) after a run, and the above snippet prints the paths of files posted by each polling configuration. Since the list is not zero... there is some redundant posting going on.