DataONEorg / hashstore-java

HashStore, a hash-based object store for DataONE data packages
Apache License 2.0

A .tmp file may not be deleted when storing the same data object numerous times #88

Closed: doulikecookiedough closed this issue 2 months ago

doulikecookiedough commented 2 months ago

The test deleteObject_1000Pids_1Obj_viaRunnable in the FileHashStoreInterfaceTest class fails intermittently and is difficult to reproduce. It stores one data object under 1000 different pids, and then deletes the same 1000 pids.

After storage and deletion, neither temporary files nor the data object itself should remain. The JUnit test asserts that there are no files in /objects. When the test fails, it appears that a .tmp file has been left over:

HashStoreRunnableTest ~ Path found in Objects Directory: /var/folders/0p/cx0c3pf10cl895pmqwb6hkrh0000gn/T/junit4247184260958857723/hashstore/objects/tmp/tmp-172064197201415622414281324181763278994.tmp
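
For reference, the kind of check described above (no files left anywhere under /objects) could be sketched as follows. This is a minimal illustration, not the actual test code: the class name LeftoverFileCheck and the method findRegularFiles are hypothetical, and it assumes the objects directory can be walked recursively with java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of hashstore-java.
final class LeftoverFileCheck {

    // Recursively collects every regular file under dir (directories are skipped).
    // An empty result means no data objects and no .tmp files remain.
    static List<Path> findRegularFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }
}
```

A test could then assert that findRegularFiles(objectsDir) is empty and, on failure, print each returned path, which is roughly what the log line above shows.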

Investigate the tmp file cleanup process.

doulikecookiedough commented 2 months ago

I have reactivated this test with optimizations to the storeObject process via bug-88-tmpfile-cleanup, but I am not completely confident that the issue has been resolved. This issue will remain open, and investigation will continue if the JUnit test keeps failing.

doulikecookiedough commented 2 months ago

I was able to reproduce this issue. It was caused by a race condition: two threads, with different pids but the same data object, each finished writing the data object into a tmpFile, and both then attempted to move their tmpFile to the permanent location at the same time.

This was resolved by synchronizing on the cid value before attempting to move the tmpFile, which lets the putObject method check for a duplicate object before any move is attempted.
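
As an illustration of the idea, here is a minimal sketch of serializing the duplicate check and the move per cid. It is not the actual putObject implementation: the class CidMoveSketch, the method moveOrDiscard, the per-cid lock map, and the flat objectsDir/cid layout are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the hashstore-java implementation.
final class CidMoveSketch {

    // One lock object per cid. Entries are never evicted in this sketch.
    private static final ConcurrentHashMap<String, Object> CID_LOCKS = new ConcurrentHashMap<>();

    // Moves tmpFile to objectsDir/cid unless an object with that cid already exists,
    // in which case the tmpFile is deleted instead.
    // Returns true if this call stored the object, false if it was a duplicate.
    static boolean moveOrDiscard(Path tmpFile, Path objectsDir, String cid) throws IOException {
        Object lock = CID_LOCKS.computeIfAbsent(cid, k -> new Object());
        // Only one thread per cid may run the check-then-move sequence at a time.
        synchronized (lock) {
            Path target = objectsDir.resolve(cid);
            if (Files.exists(target)) {
                // Duplicate object: clean up the temporary file so nothing is left behind.
                Files.delete(tmpFile);
                return false;
            }
            Files.move(tmpFile, target);
            return true;
        }
    }
}
```

Because the existence check and the move happen under the same lock, a second thread holding the same cid either sees the object already in place and deletes its own tmpFile, or waits until the first move has finished. Threads working on different cids do not block each other.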