Closed Vbitz closed 6 months ago
Thanks for the contribution!
What do you think about adding an option to skip the two CREATE INDEX
statements, including explaining what difference that makes in the usage message?
Please also add this tool to the list of tools installed in setup.py.
Oh and for consistency with the other tool names, please change the name to cvmfs_search instead of cvmfs-search
Added cvmfs-search, which will automatically create a local SQLite database indexing a CVMFS repo by merging all the catalogs together. It can then be used to search for file paths with a given content hash in URL format (for instance, gotten from server request logs).
Example:
./cvmfs-search http://stratum0.neurodesk.cloud.edu.au/cvmfs/neurodesk.ardc.edu.au/data/ce/5f9de12bc279218e151c1b9bf21a88ed048dad
outputs
containers/bidsapppymvpa_2.0.2_20230629/bidsapppymvpa_2.0.2_20230629.simg/usr/lib/python2.7/dist-packages/openpyxl/xml/tests/test_functions.py
If one content hash matches multiple files it will print all files.
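To illustrate what "content hash in URL format" means in the example above, here is a minimal sketch of turning such a URL into a plain hex hash. The helper name content_hash_from_url is hypothetical and is not taken from the tool itself. It assumes the object path has the form .../data/XX/REST as in the example, and it does not handle any suffix that may follow the hash.

```python
from urllib.parse import urlparse


def content_hash_from_url(url):
    """Return the hex content hash encoded in a .../data/XX/REST object URL.

    Hypothetical helper for illustration only; the real tool may parse
    URLs differently.
    """
    path = urlparse(url).path
    if "/data/" not in path:
        raise ValueError("URL does not contain a /data/ component: " + url)
    # Everything after the last "/data/" is "XX/REST"; joining the two
    # pieces gives the full hash.
    tail = path.rsplit("/data/", 1)[1]
    return tail.replace("/", "")


if __name__ == "__main__":
    example = (
        "http://stratum0.neurodesk.cloud.edu.au/cvmfs/neurodesk.ardc.edu.au"
        "/data/ce/5f9de12bc279218e151c1b9bf21a88ed048dad"
    )
    print(content_hash_from_url(example))
```

For the example URL this yields the two-character directory name followed by the file name, joined into one 40-character hex string.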
It can take a while to index large repositories. Some of this is inherent in the amount of data being indexed, but if you are in a hurry you can comment out the two CREATE INDEX statements. Once a repo is indexed, it will keep using that index until the root catalog hash changes.
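A rough sketch of the trade-off behind the CREATE INDEX statements, using Python's sqlite3 module: skipping the index makes building the database faster, but each hash lookup then scans the whole table. The table name files, its columns, the index name idx_files_hash, and the create_index flag are all assumptions made for illustration; the tool's actual schema and its two indexes may look different.

```python
import sqlite3


def build_db(rows, create_index=True):
    """Load (path, hash) rows into an in-memory database.

    The schema here is hypothetical, not the one cvmfs-search uses.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT, hash TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    if create_index:
        # Building the index after the bulk insert costs time up front
        # but lets later lookups avoid a full table scan.
        conn.execute("CREATE INDEX idx_files_hash ON files (hash)")
    conn.commit()
    return conn


def find_paths(conn, content_hash):
    """Return every path whose content hash matches, sorted by path."""
    cur = conn.execute(
        "SELECT path FROM files WHERE hash = ? ORDER BY path", (content_hash,)
    )
    return [row[0] for row in cur]


def query_plan(conn):
    """Return SQLite's plan description for a lookup by hash."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT path FROM files WHERE hash = ?", ("x",)
    ).fetchall()
    return " ".join(row[3] for row in rows)


if __name__ == "__main__":
    rows = [("a/one.py", "aa11"), ("b/two.py", "bb22"), ("c/copy.py", "aa11")]
    for flag in (True, False):
        conn = build_db(rows, create_index=flag)
        print(flag, find_paths(conn, "aa11"), "|", query_plan(conn))
```

Both variants return the same paths, including every file when one hash matches several; only the query plan differs.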