distributed-system-analysis / pbench

A benchmarking and performance analysis framework
http://distributed-system-analysis.github.io/pbench/
GNU General Public License v3.0

Avoid Sync message overflow #3389

Closed dbutenhof closed 1 year ago

dbutenhof commented 1 year ago

PBENCH-1120

A SQL error was observed in deployment where pbench-index logged an error on the INDEX sync object because a tarball was somehow not present. The message string generated by indexing_tarballs.py exceeded the VARCHAR(255) column specification.

This isn't an attempt to address the root problem, only the symptom: it keeps the operation table's message column from overflowing in the future, so that errors are at least recorded properly.

This reworks some of the indexing_tarballs.py messages to avoid redundancy (e.g., naming the dataset or tarball isn't necessary as the records are linked to the Dataset), but also removes the limit on the message column as a precaution.

(NOTE: it also adds some unit test cases, although these are more documentation than "real tests" as sqlite3, unlike PostgreSQL, doesn't enforce VARCHAR length limits.)
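To illustrate why such tests can't catch the overflow under sqlite3, here is a minimal sketch using only the Python standard library. The table layout and the `stored_length` helper are hypothetical, invented for this illustration, and are not taken from the pbench schema or test suite.

```python
import sqlite3


def stored_length(message: str) -> int:
    """Insert a message into a VARCHAR(255) column and return the length read back."""
    db = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the operations table; only the column type matters here.
    db.execute(
        "CREATE TABLE operations (id INTEGER PRIMARY KEY, message VARCHAR(255))"
    )
    # SQLite accepts the declared length but does not enforce it on insert.
    db.execute("INSERT INTO operations (message) VALUES (?)", (message,))
    (stored,) = db.execute("SELECT message FROM operations").fetchone()
    db.close()
    return len(stored)


# A 1000-character message is stored whole, with no error and no truncation.
print(stored_length("x" * 1000))  # 1000
```

The same insert against a PostgreSQL `VARCHAR(255)` column is rejected with an error, which is the failure seen in deployment.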

Resolves #3366

riya-17 commented 1 year ago

Hey @dbutenhof, just to understand this fix: we have created a separate column for the messages which doesn't have any limit, right? Unrelated to this change, where were the messages stored earlier?

dbutenhof commented 1 year ago

> Hey @dbutenhof, just to understand this fix: we have created a separate column for the messages which doesn't have any limit, right? Unrelated to this change, where were the messages stored earlier?

All this PR does is expand the existing column, and "tweak" some of the messages to avoid unnecessarily long values. The column isn't added, or moved, and none of the Sync logic has changed.

When I added the Sync mechanism to fix the race conditions I introduced by getting rid of the filesystem state links, I ended up with the operations table, which atomically tracks the status of the various operations we can perform on datasets, to ensure that the universe happens once and only once, in order. More or less on a whim, and partially to help with debugging, I added a message column that could be set to record unusual status. I didn't really think at that time that we might end up generating long messages, which are now causing problems in the production environment. This PR tries to bring that back under control.
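For readers unfamiliar with the pattern, the "once and only once" idea can be sketched roughly as follows. This is not the pbench Sync implementation. The table layout, the state names (`READY`, `WORKING`, `FAILED`) and the helpers `make_db`, `claim` and `fail` are assumptions made up for illustration, using the standard-library `sqlite3` module.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical operations table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE operations ("
        " dataset TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " state TEXT NOT NULL,"
        " message TEXT,"  # unbounded text, so long error messages fit
        " PRIMARY KEY (dataset, name))"
    )
    return db


def claim(db: sqlite3.Connection, dataset: str, name: str) -> bool:
    """Move an operation from READY to WORKING; True only for the caller that wins."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "UPDATE operations SET state = 'WORKING', message = NULL"
            " WHERE dataset = ? AND name = ? AND state = 'READY'",
            (dataset, name),
        )
    # The conditional UPDATE matches at most one row, and only while it is READY.
    return cur.rowcount == 1


def fail(db: sqlite3.Connection, dataset: str, name: str, message: str) -> None:
    """Record a failure together with a diagnostic message."""
    with db:
        db.execute(
            "UPDATE operations SET state = 'FAILED', message = ?"
            " WHERE dataset = ? AND name = ?",
            (message, dataset, name),
        )


db = make_db()
db.execute("INSERT INTO operations VALUES ('ds1', 'INDEX', 'READY', NULL)")
print(claim(db, "ds1", "INDEX"))  # True: first claim wins
print(claim(db, "ds1", "INDEX"))  # False: already WORKING
fail(db, "ds1", "INDEX", "tarball is missing: " + "x" * 500)
```

Because the state check and the update happen in a single statement, a second worker cannot claim the same operation, and the message column only carries diagnostics alongside the state.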