DataONEorg / hashstore

HashStore, a hash-based object store for DataONE data packages
Apache License 2.0

Revise HashStore storeObject process #73

Closed doulikecookiedough closed 7 months ago

doulikecookiedough commented 8 months ago

Currently, when a Metacat client uploads a file, the form, the metadata, and the stream for the object itself can arrive in any order. As a result, if the stream arrives first (before the form, which contains the pid), we cannot call storeObject or any of its overloads.

Investigate how we could revise our process, discuss the solution(s) with Jing, and double-check with the backend team. Once a solution has been accepted, implement the changes here and in HashStore-java.

doulikecookiedough commented 8 months ago

To Review with @taojing2002 :

In this revised approach, I propose that we:

1) Store objects with their content identifier (cid) as the permanent address
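A minimal sketch of what storing by cid could look like when the stream arrives before the pid is known. It is an illustration only, not the HashStore implementation: the function name `store_stream_by_cid`, the `objects` directory layout, and the choice of SHA-256 as the cid algorithm are all assumptions.

```python
import hashlib
import os
import tempfile


def store_stream_by_cid(stream, store_root, chunk_size=8192):
    """Hypothetical helper: store a stream without knowing its pid.

    The stream is written to a temporary file while its SHA-256 digest is
    computed; the digest (the cid) then becomes the file's permanent name.
    """
    hasher = hashlib.sha256()
    objects_dir = os.path.join(store_root, "objects")  # assumed layout
    os.makedirs(objects_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=objects_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            # Read in chunks so large objects are never held fully in memory.
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                hasher.update(chunk)
                tmp.write(chunk)
        cid = hasher.hexdigest()
        final_path = os.path.join(objects_dir, cid)
        if os.path.exists(final_path):
            # Identical content is already stored; discard the duplicate.
            os.remove(tmp_path)
        else:
            os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the move.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return cid
```

Because the address depends only on the content, the pid can be attached later, once the form arrives.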

Details:

1) What is the new storeObject process?

2) How do the refs keep track of pids and prevent accidental deletions?

3) What happens if we try to write a reference file and delete it at the same time?

doulikecookiedough commented 8 months ago

The comment above has been updated after discussing with @taojing2002. Note that we must clarify how the public API will change regarding storeObject() and whether the new process becomes the norm, with the existing methods becoming the exception (one could use them, but the Metacat client won't). Alternatively, we can remove the existing storeObject() methods and have only one process to store an object. Lastly, we must carefully implement and test the locking process.

I will first implement these changes in Python before moving on to Java.

doulikecookiedough commented 7 months ago

This has been completed via Feature-73: store_object Refactor (with References)

Additional testing to be done during cross-language testing with HashStore-java's refactor/implementation.