DataONEorg / hashstore-java

HashStore, a hash-based object store for DataONE data packages
Apache License 2.0

`tagObject` produces an orphaned data object #97

Closed doulikecookiedough closed 1 month ago

doulikecookiedough commented 2 months ago

In the scenario below, raised by @artntek, tagObject could produce an orphaned data object.

Thread A                    Thread B
========                    ========
   :                            :
uploads object X          uploads object X
 with pidX                   with pidX
   :                            :
(gets lock)                 (awaiting lock)
   :                            :
tagging obj X,                  :
FAILS ❌                        :
(returns cid lock)              :
   :                        (gets cid lock)
(returns pid lock)              :
   :                        (gets pid lock)
(awaiting lock)                 :
   :                        TAGS obj X ✅
   :                        (returns lock)
(gets lock)                     :
UNTAGS obj X ✅                 :
(returns lock)                  :
   :                            :
   :                            :
               DONE

To add context to the example process shown, there are two paths the client could take:
1) storeObject(InputStream, pid, ...), where the pid is locked and tagObject is called automatically afterwards
2) storeObject(InputStream), with no lock, after which tagObject is called, verifying the data object is valid

In Scenario 1)

In Scenario 2) (your described example)

Possible Solution:

Similar to how storeObject rejects duplicate store requests for a given pid, perhaps tagObject should do the same. If a pid is in the process of being tagged (it's locked), any subsequent request to tag that pid should be rejected.

There can only ever be 1 pid reference file, meaning the pid should only ever appear once in all the cid reference files as well.

Since we first lock the cid, then proceed to lock the pid, it sounds reasonable to reject a tagObject request if the pid is already in the process of being tagged.

We may have to re-organize the order, where we attempt to lock the pid first.
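
As a rough illustration of the "reject instead of wait" idea, here is a minimal sketch. It is not the HashStore implementation: the class `TagGuard`, the method `tryTag`, and the boolean return value are all hypothetical names chosen for this example, and the actual tagging work is passed in as a plain `Runnable`. The pid is claimed before anything else happens, which corresponds to attempting the pid lock first.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; class and method names are assumptions, not HashStore API. */
class TagGuard {
    // pids that currently have a tag request in flight
    private final Set<String> pidsBeingTagged = ConcurrentHashMap.newKeySet();

    /**
     * Runs tagWork for the pid unless another tag request for the same pid
     * is already in progress.
     *
     * @return true if tagWork ran, false if the request was rejected
     */
    boolean tryTag(String pid, Runnable tagWork) {
        // Set.add is atomic here: only one caller can claim a given pid
        if (!pidsBeingTagged.add(pid)) {
            return false; // reject the duplicate request rather than wait
        }
        try {
            tagWork.run();
            return true;
        } finally {
            // always release the claim, even if tagWork throws
            pidsBeingTagged.remove(pid);
        }
    }
}
```

With this shape, Thread B in the diagram above would be turned away while Thread A still holds pidX, so it could never tag the object in the gap before Thread A's cleanup runs. A real implementation might throw an exception instead of returning false, as storeObject does for duplicate store requests.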

doulikecookiedough commented 2 months ago

To resolve this, unTagObject will be extracted and moved into storeHashStoreRefsFiles. Expected exceptions will be re-thrown, and unexpected exceptions will trigger unTagObject before releasing the lock and bubbling up.

Thread A                    Thread B
========                    ========
   :                            :
uploads object X          uploads object X
 with pidX                   with pidX
   :                            :
(gets cid lock)             (awaiting lock)
(gets pid lock)                 :
   :                            :
tagging obj X,                  :
FAILS ❌                        :
catches exception               :
UNTAGS obj X ✅                 :
   :                            :
(returns cid lock)              :
   :                        (gets cid lock)
(returns pid lock)              :
   :                        (gets pid lock)
   :                            :
   :                        TAGS obj X ✅
   :                        (returns lock)
               DONE
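
The control flow in the diagram above could look roughly like the following sketch. Everything here is an assumption made for illustration: `RefsTagger`, `storeRefs`, `ExpectedTagException`, and the `events` list are hypothetical, and a single lock per kind stands in for whatever per-cid and per-pid synchronization the real code uses. The point is only that cleanup runs while both locks are still held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; names and structure are assumptions, not HashStore code. */
class RefsTagger {
    /** Stand-in for an "expected" failure that needs no cleanup. */
    static class ExpectedTagException extends RuntimeException {
        ExpectedTagException(String message) { super(message); }
    }

    // Simplification: one lock each instead of per-cid / per-pid locking
    private final ReentrantLock cidLock = new ReentrantLock();
    private final ReentrantLock pidLock = new ReentrantLock();

    // Records what happened, so the behaviour can be observed
    final List<String> events = new ArrayList<>();

    void storeRefs(String pid, String cid, Runnable writeRefs) {
        cidLock.lock(); // cid first, then pid, as in the diagram
        pidLock.lock();
        try {
            writeRefs.run();
            events.add("tagged " + pid);
        } catch (ExpectedTagException e) {
            throw e; // expected: re-throw without cleanup
        } catch (RuntimeException e) {
            untag(pid, cid); // unexpected: clean up while still holding both locks
            throw e;
        } finally {
            pidLock.unlock();
            cidLock.unlock();
        }
    }

    // Placeholder for removing any partially written reference files
    private void untag(String pid, String cid) {
        events.add("untagged " + pid);
    }
}
```

Because the untag step happens before either lock is released, a second thread waiting on the same pid and cid can only start tagging after the failed attempt has been fully cleaned up.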

doulikecookiedough commented 1 month ago

This has been completed via Bug-97: tagObject Produces Orphaned Object