NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

test metadig-engine on k8s against a hashstore #453

Open jeanetteclark opened 2 weeks ago

jeanetteclark commented 2 weeks ago

Testing locally has gone well, but it would be nice to test the engine against a hashstore on the dev cluster.

To that end, I've mounted the tdg subvolume on metadig-worker; the same subvolume is mounted on dev.nceas, where a hashstore metacat is running. See helm/metadig-worker/pv.yaml and helm/metadig-worker/pvc.yaml for details on the existing mounts.

To actually test, though, the following steps are needed:

doulikecookiedough commented 1 week ago

Update:

The rsync + parallel process to copy the contents of /var/metacat/hashstore to /mnt/tdg-repos/dev/metacat/hashstore has been completed.

Next Steps:

To Do List:

For reference:

# How to produce a text file with just the first level of hashstore folders to rsync
mok@dev:~/testing$ sudo find /var/metacat/hashstore -mindepth 1 -maxdepth 1 > mc_hs_dir_list.txt
mok@dev:~/testing$ cat mc_hs_dir_list.txt
/var/metacat/hashstore/objects
/var/metacat/hashstore/metadata
/var/metacat/hashstore/refs
/var/metacat/hashstore/hashstore.yaml

# How to use rsync with a list of folders
mok@dev:~/testing$ cat mc_hs_dir_list.txt | parallel --eta sudo rsync -aHAX {} /mnt/tdg-repos/dev/metacat/hashstore/

# Alternative: rsync one file at a time. First get the list of files found under `/var/metacat/hashstore`
mok@dev:~/testing$ sudo find /var/metacat/hashstore -type f -printf '%P\n' > mc_obj_list.txt

# How to feed a single command at a time for a file to rsync
# With -R (--relative), the /./ between `metacat` and `hashstore` tells rsync to recreate only the path after it (hashstore/...) in the destination folder, omitting the preceding directories
mok@dev:~/testing$ parallel --eta sudo rsync -aHAXR /var/metacat/./hashstore/{} /mnt/tdg-repos/dev/metacat :::: mc_obj_list.txt

doulikecookiedough commented 6 days ago

Metacat on dev.nceas.ucsb.edu has been moved over to write to the CephFS mount point: a symlink has been created between /var/metacat/hashstore and /mnt/tdg-repos/dev/metacat/hashstore.
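For anyone wanting to confirm the link is in place, a minimal sketch follows. It assumes /var/metacat/hashstore is the link and the CephFS path is its target (the comment above does not state the direction explicitly), and `points_to` is a hypothetical helper name, not something from the metadig or Metacat tooling.

```shell
# Hypothetical helper: succeed only if $1 is a symlink that resolves to the
# same real path as $2. Relies on `readlink -f` (GNU coreutils).
points_to() {
  link=$1
  target=$2
  # Fail early if the first argument is not a symlink at all.
  [ -L "$link" ] || return 1
  # Compare fully resolved paths so intermediate symlinks do not matter.
  [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

# Example call on dev (assumed direction of the link):
# points_to /var/metacat/hashstore /mnt/tdg-repos/dev/metacat/hashstore && echo "link OK"
```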

rsync was re-run, and syncing from the list of direct subfolders of /var/metacat/hashstore was the fastest approach. I also tested feeding rsync one file per command (e.g. via :::: list_of_files.txt), but this was very slow. The re-sync process took approximately 5 minutes.
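After a re-sync like this, one quick sanity check is to compare the relative file listings of the two trees. The sketch below is not part of the process described above; `list_missing` is a hypothetical helper, and it only compares file names, not contents or metadata.

```shell
# Hypothetical helper: print files that exist under $1 but not under $2,
# as paths relative to each tree root. Empty output means nothing is missing.
list_missing() {
  src=$1
  dst=$2
  tmp=$(mktemp -d) || return 1
  # Build sorted relative listings; LC_ALL=C keeps sort and comm consistent.
  (cd "$src" && find . -type f | LC_ALL=C sort) > "$tmp/src.txt"
  (cd "$dst" && find . -type f | LC_ALL=C sort) > "$tmp/dst.txt"
  # comm -23 prints lines that appear only in the first listing.
  LC_ALL=C comm -23 "$tmp/src.txt" "$tmp/dst.txt"
  rm -rf "$tmp"
}

# Example call on dev (may need sudo to read the hashstore):
# list_missing /var/metacat/hashstore /mnt/tdg-repos/dev/metacat/hashstore
```

Once /var/metacat/hashstore is a symlink to the destination, both arguments resolve to the same tree, so this check is only meaningful before the switch or against a separate copy.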

doulikecookiedough commented 22 hours ago

Current Status:

It appears that the 'Assessment Reports' (Metadig) for datasets at dev.nceas.ucsb.edu are not working as expected:

Next Steps:

1) Restoring expected Metadig functionality @ dev.nceas.ucsb.edu

2) Obtaining the last missing feature-hashstore-support image for metadig-controller

3) Deploying feature-hashstore-support for Metadig in full on the dev cluster

To Do List & Follow-up Questions