doulikecookiedough closed this issue 2 months ago.
This sounds like a great idea. Please do use "hard" links (and not "symbolic" links), which means that each file will be stored once and its old location and its new location both point at the same inode -- all hard links are truly equivalent pointers to the file content. Once you have all of the files hard-linked into hashstore, you will be able to remove the old file links without any loss of data, and the new file links will remain. One consideration is whether (or how) CephFS supports hard links -- I am pretty sure it does (@jeanetteclark used them IIRC), but there may also be some (major?) efficiency gains in doing this as requests to the Ceph MDS API rather than as POSIX filesystem calls.
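Here is a minimal Java sketch of the hard-link property described above. It uses only the standard `java.nio.file.Files.createLink(link, existing)` call. The class name and paths are hypothetical, and it exercises ordinary POSIX-style filesystem calls, not the Ceph MDS API. After the link is created, deleting the old name removes only one directory entry, and the content stays reachable through the new one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Metacat or HashStore.
public class HardLinkDemo {

    /**
     * Creates a hard link at newLocation for the file at oldLocation, deletes
     * oldLocation, and returns the content still readable through newLocation.
     */
    public static String linkThenRemoveOriginal(Path oldLocation, Path newLocation) throws IOException {
        Files.createDirectories(newLocation.getParent());
        // createLink(link, existing) makes a hard link, not a symbolic link
        Files.createLink(newLocation, oldLocation);
        // Both names now refer to the same underlying file (same inode on POSIX systems)
        if (!Files.isSameFile(oldLocation, newLocation)) {
            throw new IllegalStateException("expected both paths to refer to the same file");
        }
        // Deleting the old name removes one directory entry; the data is kept
        Files.delete(oldLocation);
        return Files.readString(newLocation, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hardlink-demo");
        Path oldFile = root.resolve("data").resolve("old-object");
        Files.createDirectories(oldFile.getParent());
        Files.writeString(oldFile, "science data", StandardCharsets.UTF_8);
        Path newFile = root.resolve("hashstore").resolve("objects").resolve("new-object");
        System.out.println(linkThenRemoveOriginal(oldFile, newFile));
    }
}
```

Hard links cannot span filesystems. `createLink` fails with an I/O error if the two paths are on different mounts, and it throws `UnsupportedOperationException` where the filesystem does not support links. Any HashStore directory populated this way therefore has to live on the same filesystem as the existing data.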
I think this is the API for creating a hard link: https://docs.ceph.com/en/latest/cephfs/api/libcephfs-py/#cephfs.LibCephFS.link. There is also a method to create a symlink.
Thank you @mbjones for your feedback, suggestion, and link to the docs! The "hard" links direction gives me some peace of mind, as I was worried about what would happen to symlinks if the original data/metadata objects they point to were deleted. I will look into it and follow up if I have any other questions.
After further discussion, the process or tool to convert existing Metacat /data and /documents directories into HashStore directories should be controlled by Metacat (and not by a hashstoreclient script or process). While HashStore can support this feature, Metacat should coordinate it. New issue(s) will be created in Metacat's repo after syncing up with the team.
To Discuss: Proposed Metacat HashStore Upgrade Process
1) Metacat will check postgres to see if it is a fresh install or an upgrade, and suggest (force) an update (similar to the solr setup process)
HashStore paths:
- storeHardLinksObject(String pid, Path object) (TBD)
- storeHardLinksMetadata(String pid, Path object, String formatId) (TBD)
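Here is a rough Java sketch of what the proposed (TBD) `storeHardLinksObject` could do. It is not HashStore's implementation. It assumes a content-addressed layout in which an object's path is its SHA-256 hex digest, split into three 2-character directory levels under `objects/`. It also omits recording the pid-to-object mapping, so `pid` is accepted only to mirror the proposed signature. The class name `HardLinkStoreSketch` and the depth/width constants are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the HashStore library's API.
public class HardLinkStoreSketch {

    // Assumed sharding: 3 directory levels of 2 hex characters each
    private static final int DEPTH = 3;
    private static final int WIDTH = 2;

    private final Path objectsRoot;

    public HardLinkStoreSketch(Path storeRoot) {
        this.objectsRoot = storeRoot.resolve("objects");
    }

    /** Hex SHA-256 of a file, streamed so large objects are not read into memory at once. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Maps "2cf24d..." to objects/2c/f2/4d/<remaining hex digits>. */
    Path shardedPath(String hex) {
        Path p = objectsRoot;
        for (int i = 0; i < DEPTH; i++) {
            p = p.resolve(hex.substring(i * WIDTH, (i + 1) * WIDTH));
        }
        return p.resolve(hex.substring(DEPTH * WIDTH));
    }

    /**
     * Hard-links an existing file into the store instead of copying or moving it.
     * The pid is unused here: recording the pid-to-object mapping is omitted.
     */
    public Path storeHardLinksObject(String pid, Path object) throws IOException, NoSuchAlgorithmException {
        Path target = shardedPath(sha256Hex(object));
        if (Files.exists(target)) {
            return target; // identical content is already stored
        }
        Files.createDirectories(target.getParent());
        Files.createLink(target, object); // hard link: same inode, no data copied
        return target;
    }
}
```

Because the new entry is a hard link, the old /var/metacat/data or /var/metacat/documents name can later be removed without losing the object. Hashing still reads each file once, which is the main cost of this approach compared with a plain link.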
Closing this issue - this utility method/process will not be part of the HashStore library. Metacat will fold this into its upgrade process (TBD).
In preparation for the Metacat 3.1.0 release, which will include HashStore, we will need a way to convert the existing /var/metacat/data and /var/metacat/documents directories into HashStore directories for the upgrade process. To make the conversion efficient, we want to create symlinks to the files rather than moving them. The existing /data and /documents directories would then continue to hold the real copies of the "old" files, while new files/uploads will be stored in HashStore directly.
To Do: