Open goekesmi opened 1 month ago
For those that may come after me, relevant reading includes: https://github.com/TritonDataCenter/rfd/tree/626218753435ae1a5468c1de3bb59b1027057e0e/rfd/0143
which lays out how manta_fastdelete_queue works in principle.
While working on my MantaV2, I noted an unexpected behavior in the garbage-collector. When replacing an object, no garbage was being collected. I have traced this to a set of functions that appear to have never been completed in this new context. Functional, but rough, patches have been created.
Test case:
If you watch any part of the garbage-collection system (moray, postgres, the garbage-collection jobs), you should, and do, see an object get enqueued for deletion and then deleted from the storage nodes.
Broken case:
If you watch any part of the garbage-collection system (moray, postgres, the garbage-collection jobs), you should, but do not, see an object get enqueued for deletion and then deleted from the storage nodes. That object is the first upload of testfile, now replaced by the second upload of testfile.
Trace of the bug
Both of these paths rely on the `post` function in Manta's Moray, as documented at https://github.com/TritonDataCenter/moray/blob/master/docs/index.md#triggers . What I find in my Manta's Moray has a `post` function, which is much easier to read here:
https://github.com/TritonDataCenter/node-libmanta/blob/a6e8094eed543afb5feddcf15d620d3e8dc06f78/lib/moray.js#L295
Relevant here is the line
https://github.com/TritonDataCenter/node-libmanta/blob/a6e8094eed543afb5feddcf15d620d3e8dc06f78/lib/moray.js#L333
which looks for the header `x-muskie-snaplinks-disabled` and, if it is present, chooses to use the `manta_fastdelete_queue`, which is the expected way garbage collection works in MantaV2.
So, implicitly, this header is getting set during the `mrm` operation but not the `mput` operation. Why is that? What even sets it?
Again, node-libmanta.
https://github.com/TritonDataCenter/node-libmanta/blob/a6e8094eed543afb5feddcf15d620d3e8dc06f78/lib/moray.js#L991
but notably, only in the `delMetadata` function. delMetadata is called by muskie at `deletePointer`:
https://github.com/TritonDataCenter/manta-muskie/blob/c9eec89d33d3aa86b1d372a6817c7288f4d46f75/lib/obj.js#L1036
This explains why `mrm` works.
So, why doesn't this work for the replacement operation?
Because
https://github.com/TritonDataCenter/manta-muskie/blob/c9eec89d33d3aa86b1d372a6817c7288f4d46f75/lib/obj.js#L640
saveMetadata doesn't call delMetadata. saveMetadata calls moray.putMetadata
https://github.com/TritonDataCenter/manta-muskie/blob/c9eec89d33d3aa86b1d372a6817c7288f4d46f75/lib/obj.js#L656
https://github.com/TritonDataCenter/node-libmanta/blob/a6e8094eed543afb5feddcf15d620d3e8dc06f78/lib/moray.js#L754
node-libmanta's putMetadata has no handling for adding the header `x-muskie-snaplinks-disabled` at all.
So, in the case where an object is replaced with another object, the header never gets attached, the post code takes the other path, and the delete gets written to the log rather than the fastdelete table.
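The trace above can be condensed into a small model. This is only a sketch of the control flow as described, not the code in node-libmanta or muskie: the function names `choosePostTarget`, `deleteRequestHeaders`, and `putRequestHeaders` are hypothetical, the header value `true` is an assumption, and the string `'delete log'` is a placeholder for wherever the other path writes.

```javascript
// Simplified model of the behaviour traced above. All function names are
// hypothetical; only the header name and manta_fastdelete_queue come from
// the issue text.
const SNAPLINKS_DISABLED = 'x-muskie-snaplinks-disabled';

// Stand-in for the post trigger's branch: with the header present, the
// removed object goes to the fastdelete queue; otherwise it falls through
// to the other (log) path.
function choosePostTarget(headers) {
    if (headers && headers[SNAPLINKS_DISABLED]) {
        return 'manta_fastdelete_queue';
    }
    return 'delete log';
}

// Stand-in for the delete path (mrm): the header is attached when
// snaplinks are detected as disabled.
function deleteRequestHeaders(snaplinksDisabled) {
    const headers = {};
    if (snaplinksDisabled) {
        headers[SNAPLINKS_DISABLED] = true; // value is an assumption
    }
    return headers;
}

// Stand-in for the replacement path (second mput): the argument is
// ignored and the header is never attached.
function putRequestHeaders(snaplinksDisabled) {
    return {};
}

console.log(choosePostTarget(deleteRequestHeaders(true))); // manta_fastdelete_queue
console.log(choosePostTarget(putRequestHeaders(true)));    // delete log
```

With snaplinks disabled in both cases, only the delete path reaches the fastdelete queue, which matches the observed difference between the test case and the broken case.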
Proposed solution:
Pass options on the replacement path's calls that are similar, if not identical, to those passed on the delete path.
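As an illustration of that idea only, and not the patch itself, one shape this could take is a shared helper that both paths use to build their headers. Everything named here is hypothetical: `buildMetadataHeaders`, `putMetadataSketch`, `delMetadataSketch`, the `snaplinksDisabled` option, and the `send` callback that stands in for the actual Moray call.

```javascript
// Sketch of the proposed direction under stated assumptions; none of
// these names exist in node-libmanta or muskie.
const SNAPLINKS_HEADER = 'x-muskie-snaplinks-disabled';

// Build the headers for a metadata write, attaching the snaplinks header
// whenever the caller reports that snaplinks are disabled.
function buildMetadataHeaders(opts) {
    const headers = Object.assign({}, opts.headers);
    if (opts.snaplinksDisabled) {
        headers[SNAPLINKS_HEADER] = true; // value is an assumption
    }
    return headers;
}

// Replacement path: forwards the same option the delete path uses.
function putMetadataSketch(opts, send) {
    return send('put', opts.key, buildMetadataHeaders(opts));
}

// Delete path: unchanged in spirit, but sharing the helper.
function delMetadataSketch(opts, send) {
    return send('del', opts.key, buildMetadataHeaders(opts));
}

// Usage with a stub in place of the real Moray client.
const log = [];
const stubSend = (op, key, headers) => log.push({ op, key, headers });
putMetadataSketch({ key: 'testfile', snaplinksDisabled: true }, stubSend);
delMetadataSketch({ key: 'testfile', snaplinksDisabled: true }, stubSend);
console.log(log);
```

Routing both paths through one helper keeps the put and delete behaviour from drifting apart again, at the cost of touching the existing delete path.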
I have two branches on the two projects that have a rough patch, which copies in the logic for snaplink detection and relaying that information to moray. They are
I am running this on one of my webapi instances via in-place editing, and it does not appear to be malfunctioning. I have not set up a more extensive test environment.