add cvmfs-search tool to search cvmfs repos by a content hash in URL format.

Vbitz commented 6 months ago

cvmfs-search automatically creates a local SQLite database that indexes a CVMFS repo by merging all of its catalogs together. It can then be used to find the file paths that have a given content hash, supplied in URL format (for instance, taken from server request logs).

Example:

    ./cvmfs-search http://stratum0.neurodesk.cloud.edu.au/cvmfs/neurodesk.ardc.edu.au/data/ce/5f9de12bc279218e151c1b9bf21a88ed048dad

outputs

    containers/bidsapppymvpa_2.0.2_20230629/bidsapppymvpa_2.0.2_20230629.simg/usr/lib/python2.7/dist-packages/openpyxl/xml/tests/test_functions.py
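For readers wondering how a URL maps to a content hash: in the example above, the hash appears to be the two-character directory name after /data/ joined with the file name that follows it. The snippet below is a minimal sketch of that step only, not the code from this PR. The function name `content_hash_from_url` is hypothetical, and URLs that carry anything other than hex digits after the directory name are not handled.

```python
import re
from urllib.parse import urlparse

# Matches the ".../data/<2 hex chars>/<remaining hex chars>" tail of an object URL.
_OBJECT_PATH = re.compile(r"/data/([0-9a-f]{2})/([0-9a-f]+)$")


def content_hash_from_url(url: str) -> str:
    """Return the hex content hash encoded in an object URL (hypothetical helper)."""
    match = _OBJECT_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"no content hash found in {url!r}")
    # The directory name holds the first two hex characters of the hash.
    return match.group(1) + match.group(2)


url = (
    "http://stratum0.neurodesk.cloud.edu.au/cvmfs/neurodesk.ardc.edu.au"
    "/data/ce/5f9de12bc279218e151c1b9bf21a88ed048dad"
)
print(content_hash_from_url(url))
```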

If a content hash matches multiple files, it prints all of them.
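The lookup itself can be pictured as a single query against the merged table. This is a rough sketch under assumed names: the `files` table, its `path` and `content_hash` columns, and the sample rows are all made up for illustration and are not the schema this PR creates.

```python
import sqlite3

# Hypothetical schema: one row per file, with the file's content hash.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT NOT NULL, content_hash TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("a/one.txt", "aa11"),
        ("b/copy_of_one.txt", "aa11"),  # same content stored under a second path
        ("c/other.txt", "bb22"),
    ],
)


def paths_for_hash(conn, content_hash):
    """Return every path whose content hash equals content_hash."""
    rows = conn.execute(
        "SELECT path FROM files WHERE content_hash = ? ORDER BY path",
        (content_hash,),
    )
    return [path for (path,) in rows]


for path in paths_for_hash(conn, "aa11"):
    print(path)
```

Because identical content is stored once and may be referenced from many paths, returning a list rather than a single row matches the behaviour described above.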

It can take a while to index large repositories. Some of this is inherent in the amount of data being indexed, but if you are in a hurry you can comment out the two CREATE INDEX statements. Once a repo is indexed, it will keep using that index until the root catalog hash changes.
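One way to picture the reuse rule is to store the root catalog hash next to the data and compare it on every start. The sketch below assumes a hypothetical `meta` table, a hypothetical `files` table, and a single index on the hash column; the actual tool has two CREATE INDEX statements whose definitions are not shown in this thread.

```python
import sqlite3


def open_index(db_path, root_hash):
    """Open the local index and report whether it must be rebuilt."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute("SELECT value FROM meta WHERE key = 'root_hash'").fetchone()
    # Rebuild when nothing is stored yet or the repo has a new root catalog.
    return conn, (row is None or row[0] != root_hash)


def rebuild(conn, root_hash, entries, create_indexes=True):
    """Replace the indexed rows and record which root catalog they came from."""
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute("CREATE TABLE files (path TEXT, content_hash TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?)", entries)
    if create_indexes:
        # Slower to build, but lookups by hash no longer scan every row.
        conn.execute("CREATE INDEX files_by_hash ON files (content_hash)")
    conn.execute("INSERT OR REPLACE INTO meta VALUES ('root_hash', ?)", (root_hash,))
    conn.commit()
```

Skipping the index trades a faster initial build for slower searches, which is the usual SQLite behaviour when a WHERE clause has no index to use.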

DrDaveD commented 6 months ago

Thanks for the contribution!

What do you think about adding an option to skip the two CREATE INDEX statements, including explaining what difference that makes in the usage message?
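For illustration, such an option could look roughly like the following. This is only a sketch: the flag name `--no-index`, the help wording, and the `build_parser` helper are assumptions, not what was eventually merged.

```python
import argparse


def build_parser():
    """Build a command-line parser with a hypothetical --no-index switch."""
    parser = argparse.ArgumentParser(
        description="Search a CVMFS repo for paths matching a content hash URL."
    )
    parser.add_argument("url", help="object URL containing the content hash")
    parser.add_argument(
        "--no-index",
        action="store_true",
        help="skip the CREATE INDEX statements: indexing finishes sooner, "
        "but searches are likely to be slower",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--no-index", "http://example.org/cvmfs/repo/data/ce/5f9d"]
)
print(args.no_index, args.url)
```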

Please also add this tool to the list of tools installed in setup.py.

DrDaveD commented 6 months ago

Oh, and for consistency with the other tool names, please change the name to cvmfs_search instead of cvmfs-search.

Vbitz commented 6 months ago

Added

cvmfs-contrib / python-cvmfsutils

add cvmfs-search tool to search cvmfs repos by a content hash in URL format. #31