MetPX / sarrac

C implementation of (a subset of) Sarracenia (large scale file transfer utility)
GNU General Public License v2.0

cpump publishing with "pubTime":"", when receiving a message with a pubTime with a T in it. #143

Closed by petersilva 1 month ago

petersilva commented 9 months ago

OK... so I turned debug on and found this:


2024-02-17 23:02:15,276 [DEBUG] successfully parsed message body: { "pubTime" : "20240218T040210.398046494", "baseUrl" : "https://hpfx.collab.science.gc.ca", "relPath" : "/20240218/WXO-DD/model_ohps/slfe/100m/00/005/20240218T00Z_MSC_OHPS-SLFE_RiverVelocityY_DBS-Avg_PS100m_PT005H.nc", "source" : "MSC-SCI-CMC-OPS", "size" : "116690126", "atime" : "20240218T040153.0711607933", "mtime" : "20240218T040153.211161852", "mode" : "644", "from_cluster" : "DDSR.CMC", "to_clusters" : "DDSR.SCIENCE,DDSR.CMC,DDI.CMC,DDI.SCIENCE,DDSR.STAGE" }
2024-02-17 23:02:15,276 [INFO] received: { "pubTime":"", "baseUrl":"https://hpfx.collab.science.gc.ca", "relPath":"/20240218/WXO-DD/model_ohps/slfe/100m/00/005/20240218T00Z_MSC_OHPS-SLFE_RiverVelocityY_DBS-Avg_PS100m_PT005H.nc", "topic":"v03.post.20240218.WXO-DD.model_ohps.slfe.100m.00.005", "identity":{  "method" : "md5", "value" : "mcomIbaMClukYsDy4QYsew=="  } , "mtime":"20240218040153.211161852", "atime":"20240218040153.0711607933", "mode":"0644", "size":"116690126", "to_clusters":"DDSR.SCIENCE,DDSR.CMC,DDI.CMC,DDI.SCIENCE,DDSR.STAGE", "from_cluster":"DDSR.CMC"}

So if the pubTime has a "T" in it, the C code fails to decode it properly: the parsed message body has the full pubTime, but the received message shows "pubTime":"".

petersilva commented 9 months ago

No... that's not it... there is a decoding problem somewhere, but it's not clear what. EDIT: It isn't just having a T... there is something more subtle going on, because it is intermittent... it occurs rarely, while other nearly identical messages are processed correctly.

It seems to happen when the C implementation ingests a v2 message and sends it out as v3, and then a second instance gets the v3 message and sends it out as v2... something like that... but only sometimes... most messages are correct.

petersilva commented 2 months ago

I think I found it... some data sources produce timestamps with 9 digits after the decimal point (claiming nanosecond precision... I call b.s., but that's not the issue here). Anyway, the space allocated for timestamps was 25 chars... and these timestamps were 25 chars long, with no room left for a terminating NUL. I increased it to 64, and the problem goes away.
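
For anyone wanting to see the arithmetic, here is a minimal sketch of the sizing problem. It is not the sarrac code: the `copy_timestamp` helper and the `OLD_TS_LEN` / `NEW_TS_LEN` names are hypothetical, and how the real code behaves when a timestamp does not fit is an assumption. The only things taken from this issue are the sample pubTime from the debug log and the two buffer sizes (25 and 64).

```c
#include <stddef.h>
#include <string.h>

#define OLD_TS_LEN 25 /* buffer size described in this issue (hypothetical name) */
#define NEW_TS_LEN 64 /* buffer size after the fix (hypothetical name) */

/*
 * Hypothetical helper, not a sarrac function: copy src into dst only if the
 * whole string fits together with its terminating NUL. If it does not fit,
 * dst is left as an empty string.
 * Returns 1 when the copy succeeded, 0 otherwise.
 */
int copy_timestamp(char *dst, size_t dstsize, const char *src)
{
    size_t need = strlen(src) + 1; /* characters plus the NUL terminator */

    if (dstsize == 0) {
        return 0; /* nowhere to write, not even an empty string */
    }
    if (need > dstsize) {
        dst[0] = '\0'; /* refuse to truncate: leave an empty string */
        return 0;
    }
    memcpy(dst, src, need);
    return 1;
}
```

The pubTime from the log, "20240218T040210.398046494", is 8 + 1 + 6 + 1 + 9 = 25 characters, so as a C string it needs 26 bytes. With a destination of `OLD_TS_LEN` bytes the helper above returns 0 and leaves an empty string; with `NEW_TS_LEN` bytes it returns 1 and the full timestamp is kept. A timestamp with fewer fractional digits fits in either buffer, which would be consistent with the problem only showing up for some messages.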