doulikecookiedough closed this issue 1 month ago.
Update:
This issue appears to be resolved by checking whether a cid refs file exists at the point of use, rather than determining its existence in advance and then branching on that precomputed boolean in an elif statement. Testing has gone well with 100000 objects from knbvm/test.arcticdata.io: both the pid and cid refs files match those generated by the Java library.
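A minimal sketch of "check at the point of use", assuming a cid refs file holds one pid per line. `update_cid_refs` is a hypothetical helper, not the HashStore API, and for brevity it writes its tmp file next to the target instead of in `refs/tmp`. On its own this only narrows the race window, since two callers can still both see the file as missing, which is why a lock is still needed around it.

```python
import os
import tempfile


def update_cid_refs(cid_refs_path: str, pid: str) -> None:
    """Hypothetical helper: record `pid` in the cid refs file at `cid_refs_path`.

    Existence is checked right before writing instead of being stored in a
    boolean earlier and branched on later, so a file that appeared in the
    meantime is appended to rather than overwritten.
    """
    if os.path.exists(cid_refs_path):
        # The cid refs file already exists: append the pid if it is not listed yet.
        with open(cid_refs_path, "r", encoding="utf-8") as f:
            pids = f.read().splitlines()
        if pid not in pids:
            with open(cid_refs_path, "a", encoding="utf-8") as f:
                f.write(pid + "\n")
    else:
        # No cid refs file yet: write it to a tmp file, then move it into place.
        refs_dir = os.path.dirname(cid_refs_path)
        os.makedirs(refs_dir, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=refs_dir)
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(pid + "\n")
        os.replace(tmp_path, cid_refs_path)
```

Calling it twice with different pids for the same path leaves both pids in the file; calling it again with an existing pid is a no-op.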
While the current fix has resolved the problems described in this issue, it does not make the methods completely thread/multiprocessing safe. As the debug-level logs below show, two processes can still pass the check against the shared list of locked cids at the same time, which causes one cid refs file to replace another. This explains why some pids were initially missing from their cid refs file.
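The failure mode can be reproduced deterministically with a small sketch. Threads stand in for the processes, a plain list plays the role of the shared locked-cid list, and a `threading.Barrier` forces both threads to finish their checks before either acts. `unsafe_tag` and `demo` are hypothetical names, not HashStore code. Both threads enter, each writes a cid refs file containing only its own pid, and the last `os.replace` wins, so one pid is dropped.

```python
import os
import tempfile
import threading


def unsafe_tag(cid: str, pid: str, cid_refs_path: str,
               locked_cids: list, entered: list,
               checks_done: threading.Barrier) -> None:
    # Check-then-act on a shared list with no lock around it.
    if cid not in locked_cids:
        cid_refs_exists = os.path.exists(cid_refs_path)
        # Wait until both threads have run their checks; this forces the
        # unlucky interleaving instead of waiting for it to happen by chance.
        checks_done.wait()
        locked_cids.append(cid)
        entered.append(pid)
        if not cid_refs_exists:
            # Each thread believes it is creating the cid refs file, so each
            # writes a tmp file holding only its own pid and moves it into place.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cid_refs_path))
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(pid + "\n")
            os.replace(tmp_path, cid_refs_path)  # the last replace wins


def demo() -> tuple:
    locked_cids: list = []
    entered: list = []
    checks_done = threading.Barrier(2)
    with tempfile.TemporaryDirectory() as root:
        cid_refs_path = os.path.join(root, "cid_refs")
        threads = [
            threading.Thread(
                target=unsafe_tag,
                args=("a82eb067", pid, cid_refs_path, locked_cids, entered, checks_done),
            )
            for pid in ("arctic-data.6186.1", "arctic-data.6188.1")
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(cid_refs_path, encoding="utf-8") as f:
            on_disk = f.read().splitlines()
    return entered, on_disk


if __name__ == "__main__":
    entered, on_disk = demo()
    print(len(entered), "threads entered; cid refs file lists", on_disk)
```

Making the membership check and the append a single atomic step under one lock removes this window.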
Investigation is ongoing:
- tag_object calls happen consecutively, which minimizes the chances of two threads/processes showing the behaviour below).
- condition, which seems more appropriate.
# Three processes attempt to tag the cid
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object: Tagging object cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f with pid: arctic-data.6186.1.
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object: Tagging object cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f with pid: arctic-data.6188.1.
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object: Tagging object cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f with pid: arctic-data.6190.1.
# Two processes hit the synchronization block at the same time, and both pass
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object (mp): Locking cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f to to tag pid: arctic-data.6186.1.
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object (mp): Locking cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f to to tag pid: arctic-data.6188.1.
# The third process wasn't as fast and is waiting
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object (mp): (cid) a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f is currently locked. Waiting.
# Temp files are created
2024-05-27 13:04:33 - DEBUG - FileHashStore - _write_refs_file: Writing id (a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f) into a tmp file in: /home/mok/testing/knbvm_python/refs/tmp
2024-05-27 13:04:33 - DEBUG - FileHashStore - _write_refs_file: Writing id (a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f) into a tmp file in: /home/mok/testing/knbvm_python/refs/tmp
# These two cid refs files overwrite one another when moved to their permanent location
# The first pid is tagged
2024-05-27 13:04:33 - DEBUG - FileHashStore - _verify_hashstore_references: verifying pid (arctic-data.6186.1) and cid (a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f) refs files. Additional Note: Reference files have been moved to their permanent location.
2024-05-27 13:04:33 - INFO - FileHashStore - tag_object: Successfully tagged cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f with pid arctic-data.6186.1
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object (mp): Removing cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f from reference_locked_cids.
# The second pid is tagged
2024-05-27 13:04:33 - DEBUG - FileHashStore - _verify_hashstore_references: verifying pid (arctic-data.6188.1) and cid (a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f) refs files. Additional Note: Reference files have been moved to their permanent location.
2024-05-27 13:04:33 - INFO - FileHashStore - tag_object: Successfully tagged cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f with pid arctic-data.6188.1
2024-05-27 13:04:33 - DEBUG - FileHashStore - tag_object (mp): Removing cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f from reference_locked_cids.
# The third process gets in
2024-05-27 13:04:34 - DEBUG - FileHashStore - tag_object (mp): Locking cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f to to tag pid: arctic-data.6190.1.
2024-05-27 13:04:34 - DEBUG - FileHashStore - tag_object: pid refs file does not exists for pid arctic-data.6190.1 but cid refs file exists at: /home/mok/testing/knbvm_python/refs/cids/a8/2e/b0/67ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f for cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f
# We only write the pid refs file since cid refs exists
2024-05-27 13:04:34 - DEBUG - FileHashStore - _write_refs_file: Writing id (a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f) into a tmp file in: /home/mok/testing/knbvm_python/refs/tmp
2024-05-27 13:04:34 - DEBUG - FileHashStore - _verify_hashstore_references: verifying pid (arctic-data.6190.1) and cid (a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f) refs files. Additional Note: Pid refs file doesn't exist, but cid refs exists.
2024-05-27 13:04:34 - INFO - FileHashStore - tag_object: Successfully updated cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f with pid: arctic-data.6190.1
2024-05-27 13:04:34 - DEBUG - FileHashStore - tag_object (mp): Removing cid: a82eb067ca0959f287d95eb4b43de10a6fd8f003859533cbcb01860005ac6a8f from reference_locked_cids.
This issue has been resolved via BugFix-97: tag_object cid refs file missing pids.
The solution uses a threading/multiprocessing Condition, which has a built-in lock (or accepts one you pass to it) and coordinates all threads/processes, similar to Java's synchronized block.
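A minimal sketch of the Condition pattern using threads. `reference_locked_cids` (a set here, where the logs suggest a list) and `tag_with_condition` are stand-ins, not the HashStore implementation, and a dict replaces the on-disk cid refs file so lost updates are easy to check. The check for the cid and the claim on it both happen under the Condition's lock, and releasing the cid calls `notify_all()` so waiting callers re-check.

```python
import threading
import time

# Hypothetical shared state; the Condition carries its own internal lock.
reference_locked_cids: set = set()
reference_condition = threading.Condition()


def tag_with_condition(cid: str, pid: str, cid_refs: dict) -> None:
    # Wait until no other thread holds this cid, then claim it. The check and
    # the claim run while holding the Condition's lock, so they are atomic.
    with reference_condition:
        reference_condition.wait_for(lambda: cid not in reference_locked_cids)
        reference_locked_cids.add(cid)
    try:
        # Critical section: a deliberately slow read-modify-write that would
        # lose updates if two threads ran it for the same cid at once.
        current = list(cid_refs.get(cid, []))
        time.sleep(0.01)
        current.append(pid)
        cid_refs[cid] = current
    finally:
        # Release the cid and wake every waiter so one of them can claim it.
        with reference_condition:
            reference_locked_cids.discard(cid)
            reference_condition.notify_all()


def demo(n: int = 5) -> list:
    cid_refs: dict = {}
    threads = [
        threading.Thread(target=tag_with_condition, args=("a82eb067", f"pid.{i}", cid_refs))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(cid_refs["a82eb067"])
```

For processes, `multiprocessing.Condition` exposes the same `wait_for`/`notify_all` methods, but the locked-cid collection would then need to be shared across processes (for example through a `multiprocessing.Manager` list).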
Also note: pytests now run much faster (from ~14s to ~8s locally, roughly 40% less time). Testing is ongoing.
The object associated with arctic-data.6188.1 exists, along with a cid_refs_file, but the cid_refs_file is not tagged with arctic-data.6188.1. There appears to be a syncing issue when tagging objects. This occurred approximately 5 times when attempting to store the first 100000 data objects from test.arcticdata.io that are less than 1 GB in size. Investigate the tag_object process.
Example Log for pid arctic-data.6188.1
The affected cid refs file contains two pids, but not arctic-data.6188.1, which was properly stored/verified but somehow dropped:
Example Log 2 for pid tao.14330.1
The cid refs file content at path /home/mok/testing/knbvm_python2/refs/cids/96/bb/d5/de61f36b0e10c5771d180998d066192e8986aa34a8cb7c453f62959274: