doulikecookiedough closed this issue 7 months ago.
To review with @taojing2002:

In this revised approach, I propose that we:

1) Store objects with their content identifier (`cid`) as the permanent address
   - `storeObject(objectStream)` returns `objectInfo`
   - `validateObject(objectInfo, String checksum, String checksumAlgorithm, long objSize)`
2) Tag the stored object with its pid
   - `tagObject(String cid, String pid)`
3) Utilize a reference file to keep track of whether an object has multiple references
   - The reference files are stored with the same permanent address as the `cid` in `/refs`, following the HashStore config depth and width
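The store-then-validate part of this proposal can be sketched in Python as below. This is a rough illustration under stated assumptions, not the HashStore API: the class `SketchStore`, the snake_case method names, the temporary-file approach, SHA-256 as the algorithm that produces the `cid`, and the depth 3 / width 2 sharding are all hypothetical choices. The object is hashed while it is written to a temporary file, moved to its `cid` address, and removed again if validation fails and no reference file exists yet.

```python
import contextlib
import hashlib
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import BinaryIO, Dict


@dataclass
class ObjectInfo:
    """Hypothetical result of store_object: the cid plus what validation needs."""
    cid: str
    obj_size: int
    hex_digests: Dict[str, str] = field(default_factory=dict)


class SketchStore:
    """Hypothetical store where objects are addressed only by their cid (SHA-256)."""

    def __init__(self, root, depth: int = 3, width: int = 2):
        self.root, self.depth, self.width = Path(root), depth, width

    def _shard(self, digest: str) -> Path:
        # "d5953bd8..." -> d5/95/3b/d8... for depth 3, width 2
        parts = [digest[i * self.width:(i + 1) * self.width] for i in range(self.depth)]
        return Path(*parts, digest[self.depth * self.width:])

    def object_path(self, cid: str) -> Path:
        return self.root / "objects" / self._shard(cid)

    def store_object(self, stream: BinaryIO) -> ObjectInfo:
        """No pid needed: hash while writing to a temp file, then move it to the cid address."""
        hashers = {"SHA-256": hashlib.sha256(), "MD5": hashlib.md5()}
        tmp_dir = self.root / "tmp"
        tmp_dir.mkdir(parents=True, exist_ok=True)
        size = 0
        with tempfile.NamedTemporaryFile(dir=str(tmp_dir), delete=False) as tmp:
            for chunk in iter(lambda: stream.read(8192), b""):
                tmp.write(chunk)
                size += len(chunk)
                for hasher in hashers.values():
                    hasher.update(chunk)
        digests = {name: h.hexdigest() for name, h in hashers.items()}
        cid = digests["SHA-256"]
        dest = self.object_path(cid)
        if dest.exists():
            os.remove(tmp.name)  # identical content is already stored
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            os.replace(tmp.name, dest)
        return ObjectInfo(cid, size, digests)

    def validate_object(self, info: ObjectInfo, checksum: str,
                        checksum_algorithm: str, obj_size: int) -> None:
        """Raise if the client-supplied checksum or size does not match what was stored."""
        actual = info.hex_digests.get(checksum_algorithm)
        if actual is None:
            raise ValueError(f"unsupported checksum algorithm: {checksum_algorithm}")
        if actual != checksum.lower() or info.obj_size != obj_size:
            raise ValueError(f"object {info.cid} failed validation")

    def delete_object(self, cid: str) -> None:
        """Delete without a pid: only allowed when no /refs/cids file exists for the cid."""
        if (self.root / "refs" / "cids" / self._shard(cid)).exists():
            raise PermissionError(f"{cid} is still referenced")
        self.object_path(cid).unlink(missing_ok=True)


def store_and_validate(store: SketchStore, stream: BinaryIO, checksum: str,
                       checksum_algorithm: str, obj_size: int) -> ObjectInfo:
    info = store.store_object(stream)
    try:
        store.validate_object(info, checksum, checksum_algorithm, obj_size)
    except ValueError:
        # Only removes the object if nothing references it yet.
        with contextlib.suppress(PermissionError):
            store.delete_object(info.cid)
        raise
    return info
```

Because `store_object` in this sketch needs no `pid`, a stream can be persisted as soon as it arrives; the `pid` only matters later, when the object is tagged.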
Example folder layout for a single file stored along with its metadata and reference files:
# Notes:
# - The reference file for a pid contains the cid
# - The reference file for a cid contains the pids that reference the cid
/objects
└─ /d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
/metadata
└─ /15/8d/7e/55c36a810d7c14479c9...b20d7df66768b04
/refs
└─ pids/0d/55/5e/d77052d7e166017f779...7230bcf7abcef65e
└─ cids/d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
hashstore.yaml
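To make the layout concrete, here is a minimal path-computation sketch. It assumes SHA-256 for both the `cid` and the hashed `pid`, and a depth of 3 with a width of 2, which is what the example paths suggest; the function names are hypothetical, and the `/metadata` path is left out because its input is not specified here.

```python
import hashlib
from pathlib import PurePosixPath


def shard(digest: str, depth: int = 3, width: int = 2) -> PurePosixPath:
    """Split a hex digest into `depth` folders of `width` characters plus the remainder."""
    folders = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return PurePosixPath(*folders, digest[depth * width:])


def object_path(cid: str) -> PurePosixPath:
    return PurePosixPath("objects") / shard(cid)


def cid_ref_path(cid: str) -> PurePosixPath:
    # Same permanent address as the object, but under /refs/cids
    return PurePosixPath("refs", "cids") / shard(cid)


def pid_ref_path(pid: str) -> PurePosixPath:
    # The pid is hashed first (assumed SHA-256), so any pid string maps to a safe file name
    return PurePosixPath("refs", "pids") / shard(hashlib.sha256(pid.encode("utf-8")).hexdigest())


if __name__ == "__main__":
    cid = hashlib.sha256(b"example object").hexdigest()
    print(object_path(cid))
    print(cid_ref_path(cid))
    print(pid_ref_path("dou.test.1"))
```

Sharing the `shard` helper is what keeps `/objects/...` and `/refs/cids/...` at the same relative address for a given `cid`.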
Details:

1) What is the new `storeObject` process?
- Call `storeObject(objectStream)`, which returns `objectInfo`
- Call `validateObject(objectInfo, String checksum, String checksumAlgorithm, long objSize)`
- Call `tagObject(String cid, String pid)` or `deleteObject(String cid, String pid)`
  - Lock the `cid`, write to the `cid` permanent address, then release the lock
- If `validateObject` throws an exception, HashStore will call `deleteObject(String cid)`
- Otherwise, `tagObject(String cid, String pid)` is called
- Calling `deleteObject` without a pid is possible, but it will only proceed if there is no reference file for the `cid`

2) How does `/refs` keep track of pids and prevent accidental deletions?
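Keeping track of references:
- When `tagObject` is called, we use the `pid` to find its reference file in `/refs/pids`, which will contain a single `cid`
- We then check whether the reference file for that `cid` in `/refs/cids` already contains the `pid`
- If it doesn't, we write the `pid` to it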
/refs
└─ pids/0d/55/5e/d77052d7e166017f779...7230bcf7abcef65e
└─ cids/d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
Content of `refs/cids/d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2`:
dou.test.1
j.tao.1700.1
j.tao.1700.1.2
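The reference-tracking steps can be sketched roughly as follows, assuming plain-text reference files: the `pid` file holds one `cid`, and the `cid` file holds one `pid` per line, as in the example content above. `RefTracker` and `tag_object` are hypothetical names, the paths assume the same depth 3 / width 2 sharding as the example layout, and the `cid` lock described in 3) is omitted to keep the focus on the files.

```python
import hashlib
from pathlib import Path


class RefTracker:
    """Hypothetical writer for /refs/pids and /refs/cids (locking omitted)."""

    def __init__(self, root, depth: int = 3, width: int = 2):
        self.root, self.depth, self.width = Path(root), depth, width

    def _shard(self, digest: str) -> Path:
        parts = [digest[i * self.width:(i + 1) * self.width] for i in range(self.depth)]
        return Path(*parts, digest[self.depth * self.width:])

    def pid_ref(self, pid: str) -> Path:
        digest = hashlib.sha256(pid.encode("utf-8")).hexdigest()
        return self.root / "refs" / "pids" / self._shard(digest)

    def cid_ref(self, cid: str) -> Path:
        return self.root / "refs" / "cids" / self._shard(cid)

    def tag_object(self, cid: str, pid: str) -> None:
        # The pid reference file holds exactly one cid.
        pid_ref = self.pid_ref(pid)
        if pid_ref.exists():
            current = pid_ref.read_text(encoding="utf-8").strip()
            if current != cid:
                raise ValueError(f"pid {pid!r} already references {current}")
        else:
            pid_ref.parent.mkdir(parents=True, exist_ok=True)
            pid_ref.write_text(cid, encoding="utf-8")
        # The cid reference file lists every pid that points at the object.
        cid_ref = self.cid_ref(cid)
        pids = cid_ref.read_text(encoding="utf-8").splitlines() if cid_ref.exists() else []
        if pid not in pids:  # only append the pid if it is not listed yet
            cid_ref.parent.mkdir(parents=True, exist_ok=True)
            with cid_ref.open("a", encoding="utf-8") as refs:
                refs.write(pid + "\n")
```

Tagging the same `pid` twice leaves both files unchanged, and a `pid` that already points at a different `cid` is rejected rather than silently re-pointed.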
- When `deleteObject` is called, like `tagObject`, we synchronize based on the given `cid`
- We read the reference file for the `cid` and confirm it only contains a single reference, and that the reference is the one given (`sha256(pid)`)
- Delete the `cid` from `/objects`
- Delete the references from `/refs`, and release the system-wide file lock

3) What happens if we are trying to write a reference file and delete it at the same time?
- The `cid` object lock will be shared between `tagObject` and `deleteObject`, so they must execute sequentially

The comment above has been updated after discussing with @taojing2002. Note that we must clarify how the public API will change regarding `storeObject()`, and whether the new process becomes the norm, with the existing methods becoming the exception (one could still use them, but the Metacat client won't). Alternatively, we can remove the existing `storeObject()` methods and have only one process to store an object. Lastly, we must carefully implement and test the locking process.
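As a possible starting point for that locking work, the deletion rules from 2) and the shared lock from 3) can be combined in a small in-memory model. Dictionaries stand in for `/objects`, `/refs/pids` and `/refs/cids`, and a condition-variable registry lets only one thread work on a given `cid` at a time, so `tag_object` and `delete_object` on the same `cid` run sequentially. All names here are hypothetical; the references are plain pids as in the example content (if `sha256(pid)` is stored instead, the comparison must hash the given pid the same way); and this lock only covers threads within one process, so coordinating separate processes would need a file lock on top.

```python
import threading
from contextlib import contextmanager
from typing import Dict, List, Optional


class CidLocks:
    """At most one thread may hold a given cid; other cids are unaffected."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = set()

    @contextmanager
    def hold(self, cid: str):
        with self._cond:
            while cid in self._held:
                self._cond.wait()
            self._held.add(cid)
        try:
            yield
        finally:
            with self._cond:
                self._held.discard(cid)
                self._cond.notify_all()


class InMemoryStore:
    """Hypothetical stand-in for /objects, /refs/pids and /refs/cids."""

    def __init__(self):
        self.objects: Dict[str, bytes] = {}
        self.pid_refs: Dict[str, str] = {}        # pid -> cid
        self.cid_refs: Dict[str, List[str]] = {}  # cid -> pids
        self.locks = CidLocks()

    def tag_object(self, cid: str, pid: str) -> None:
        with self.locks.hold(cid):
            if self.pid_refs.setdefault(pid, cid) != cid:
                raise ValueError(f"pid {pid!r} already references another cid")
            pids = self.cid_refs.setdefault(cid, [])
            if pid not in pids:
                pids.append(pid)

    def delete_object(self, cid: str, pid: Optional[str] = None) -> None:
        with self.locks.hold(cid):  # same lock as tag_object
            refs = self.cid_refs.get(cid, [])
            # Refuse unless no references remain, or the only one is the given pid.
            if refs and refs != [pid]:
                raise PermissionError(f"{cid} is still referenced by {refs}")
            self.objects.pop(cid, None)
            self.cid_refs.pop(cid, None)
            if pid is not None and self.pid_refs.get(pid) == cid:
                del self.pid_refs[pid]
```

Because both methods enter `locks.hold(cid)` before reading the reference lists, a delete for a `cid` cannot observe a half-finished tag for the same `cid`, while work on other `cid`s proceeds in parallel.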
I will first implement these changes in Python before moving on to Java.
This has been completed via Feature-73: `store_object` Refactor (with References). Additional testing to be done during cross-language testing with HashStore-java's refactor/implementation.
Currently, when a Metacat client uploads a file, the form, the metadata, and the stream to the object itself can arrive in any order. As a result, if the stream arrives first (before the form, which contains the `pid`), we will be unable to call `storeObject` or any of its overloads. Investigate how we could revise our process, discuss the solution(s) with Jing, and double-check with the backend team. Once a solution has been accepted, implement the changes here and in HashStore-java.