Use StorageLib to download dependencies

Signed-off-by: Ahmed Hussein <ahussein@nvidia.com>

Fixes #1364, Contributes to #1359

This pull request includes updates to dependencies, improvements to the dependency caching process, and some code cleanups in the user_tools module. The most important changes include updating several dependencies, enhancing the verification process for dependencies, and refactoring the code to remove unused imports and improve readability.

Use CspPath and CspFs to manage dependencies. This allows more flexibility in specifying custom dependencies, including dependencies stored on local disk.
Remove the pricing catalog from the Python package.
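Because a CspPath is built from a URI, the URI scheme is what decides which storage backend serves a dependency, which is how a local `file://` folder can stand in for cloud storage. The sketch below is a hypothetical, stdlib-only illustration of that scheme-based dispatch; `resolve_backend` and the backend names are assumptions for illustration, not the StorageLib API.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to a storage backend name.
# These names are illustrative only; they are not StorageLib classes.
_BACKENDS = {
    'file': 'local',
    's3': 's3',
    's3a': 's3',
    'gs': 'gcs',
    'abfss': 'azure',
    'http': 'http',
    'https': 'http',
}


def resolve_backend(uri: str) -> str:
    """Return the backend that should handle `uri`, based on its scheme.

    A bare path without a scheme is treated as local disk.
    """
    scheme = urlparse(uri).scheme.lower()
    if not scheme:
        return 'local'
    try:
        return _BACKENDS[scheme]
    except KeyError as err:
        raise ValueError(f'unsupported scheme: {scheme!r}') from err
```

With this kind of dispatch, pointing a dependency at `file:///var/tmp/...` instead of a cloud bucket requires no code change, only a different URI.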

Dependency Updates:

Updated fastcore to version 1.7.10 in user_tools/pyproject.toml.
Updated pydantic to version 2.9.2 in user_tools/pyproject.toml.
Added flake8-pydantic and pylint==3.2.7 to the optional test dependencies in user_tools/pyproject.toml.

Dependency Verification Enhancements:

Replaced direct hash and size checks with a verification object in several configuration files (databricks_aws-configs.json, databricks_azure-configs.json, dataproc-configs.json).
Updated the cache_single_dependency method in user_tools/src/spark_rapids_pytools/rapids/rapids_tool.py to use the new verification process, and refactored it for readability.
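A verification object bundles whatever checks a dependency declares (an expected size, a file hash, or both), so callers no longer test each property by hand. The sketch below mirrors the `verification={'size': ...}` and `verification={'file_hash': ...}` arguments used in the usage example further down, but `verify_file` and the `(algorithm, digest)` pair for `file_hash` are hypothetical stand-ins built on `hashlib`, not the project's `FileHashAlgorithm` class.

```python
import hashlib
import os
from typing import Optional, Tuple


def _digest(path: str, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory."""
    hasher = hashlib.new(algorithm)  # accepts names such as 'md5', 'sha1', 'sha512'
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path: str, size: Optional[int] = None,
                file_hash: Optional[Tuple[str, str]] = None) -> bool:
    """Return True only if the file exists and every declared check passes.

    Hypothetical helper: `size` is the expected byte count and `file_hash`
    is an (algorithm, hex digest) pair. Checks left as None are skipped.
    """
    if not os.path.isfile(path):
        return False
    if size is not None and os.path.getsize(path) != size:
        return False
    if file_hash is not None:
        algorithm, expected = file_hash
        if _digest(path, algorithm) != expected.lower():
            return False
    return True
```

Reading in fixed-size chunks keeps memory use flat even for large archives such as a Spark distribution tarball.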

Code Cleanups:

Removed unused imports from user_tools/src/spark_rapids_pytools/rapids/rapids_tool.py.
Replaced no_prefix with no_scheme in the _get_hadoop_classpath and _process_output_args methods in user_tools/src/spark_rapids_pytools/rapids/rapids_job.py and rapids_tool.py.
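Classpath entries and local output arguments generally need a plain filesystem path rather than a `file://` URI, which is presumably what these methods get from the scheme-less form. A minimal sketch of the transformation, using a hypothetical `strip_scheme` helper (not the actual `no_scheme` implementation):

```python
from urllib.parse import urlparse


def strip_scheme(uri: str) -> str:
    """Drop the scheme from a URI, e.g. 'file:///a/b.jar' -> '/a/b.jar'.

    Hypothetical helper: for URIs with a network location (s3://bucket/key)
    the bucket is kept as the first path component. Strings without a
    scheme are returned unchanged.
    """
    parsed = urlparse(uri)
    if not parsed.scheme:
        return uri
    return f'{parsed.netloc}{parsed.path}' if parsed.netloc else parsed.path
```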

These changes enhance the dependency management and verification processes, improve code quality, and ensure the project uses up-to-date libraries.

How to use the new utilities:

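from pydantic import TypeAdapter

# DownloadTask, DownloadManager, CspFileChecker, FileHashAlgorithm, HashAlgorithm,
# CspPath, LocalPath and untar_file come from the spark_rapids_tools package
# (StorageLib and related utilities); import them from their modules there.


def main():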
    # Download a single jar and verify its size once the download completes.
    downloader3 = DownloadTask(src_url='https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.08.2/rapids-4-spark-tools_2.12-24.08.2.jar',
                               dest_folder='file:///var/tmp/spark_cache_folder_test',
                               verification={'size': 3265393})
    downloader3.run_task()

    # Check the downloaded file against existence, size, and extension constraints.
    TypeAdapter(CspFileChecker).validate_python({
        'file_path': 'file:///var/tmp/spark_cache_folder_test/rapids-4-spark-tools_2.12-24.08.2.jar',
        'must_exist': True,
        'size': 3265393,
        'extensions': ['jar']})

    TypeAdapter(CspFileChecker).validate_python({
        'file_path': 'file:///var/tmp/spark_cache_folder_test/rapids-4-spark-tools_2.12-24.08.2.jar',
        'must_exist': False,
        'size': 3265393,
        'extensions': ['jar']})

    # Verify the same jar against known MD5 and SHA-1 digests.
    hash_verifier = FileHashAlgorithm(HashAlgorithm('md5'), 'a64bc5ba6bd8790c08744343224e5dee')
    hash_verifier.verify_file(LocalPath('file:///var/tmp/spark_cache_folder_test/rapids-4-spark-tools_2.12-24.08.2.jar'))

    hash_verifier2 = FileHashAlgorithm(HashAlgorithm('sha1'), '846a957d888b11d147cb2922c6f43274c670b98b')
    hash_verifier2.verify_file(LocalPath('file:///var/tmp/spark_cache_folder_test/rapids-4-spark-tools_2.12-24.08.2.jar'))

    # Download several files concurrently; each task carries its own configs and verification.
    DownloadManager(
        [DownloadTask(src_url='https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.08.2/rapids-4-spark-tools_2.12-24.08.2.jar',
                      dest_folder='file:///var/tmp/spark_cache_folder_test/async',
                      configs={'forceDownload': True},
                      verification={'size': 3265393}),
         DownloadTask(src_url='https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.08.1/rapids-4-spark-tools_2.12-24.08.1.jar',
                      dest_folder='file:///var/tmp/spark_cache_folder_test/async',
                      configs={'forceDownload': True},
                      verification={'file_hash': FileHashAlgorithm(HashAlgorithm('md5'), 'bc9bf7fedde0e700b974426fbd8d869c')}),
         DownloadTask(src_url='file:///home/user/rapids-tools-1359/user_tools/src/spark_rapids_tools/cmdli/storage_cli.py',
                      dest_folder='file:///var/tmp/spark_cache_folder_test/async'),
         DownloadTask(src_url='https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz',
                      dest_folder='file:///var/tmp/spark_cache_folder_test/async',
                      configs={'forceDownload': False},
                      verification={
                          'file_hash': FileHashAlgorithm(
                              HashAlgorithm('sha512'),
                              '8883c67e0a138069e597f3e7d4edbbd5c3a565d50b28644aad02856a1ec1da7cb92b8f80454ca427118f69459ea326eaa073cf7b1a860c3b796f4b07c2101319'
                          )})
         ]
    ).submit()

    # Extract the downloaded Spark archive into a new local folder.
    new_untar_folder = untar_file(CspPath('file:///var/tmp/spark_cache_folder_test/async/spark-3.5.0-bin-hadoop3.tgz'),
                                  LocalPath('file:///var/tmp/spark_cache_folder_test/async/decompressed6'))


if __name__ == '__main__':
    main()
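
DownloadManager(...).submit() runs several DownloadTasks together. The sketch below shows one way such a manager could fan tasks out over a thread pool with Python's `concurrent.futures` and collect a result per task. It only copies `file://` sources so that it runs offline, and `Task`, `fetch`, and `submit_all` are hypothetical names, not the project's classes; `force_download` mimics the `forceDownload` config in the example.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, Iterable
from urllib.parse import urlparse


@dataclass
class Task:
    """Hypothetical stand-in for a download task (file:// sources only)."""
    src_url: str
    dest_folder: str
    force_download: bool = False


def fetch(task: Task) -> str:
    """Copy the source into dest_folder and return the local destination path."""
    src = urlparse(task.src_url).path
    dest_dir = urlparse(task.dest_folder).path
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    # Reuse a cached copy unless the caller forces a fresh download.
    if task.force_download or not os.path.exists(dest):
        shutil.copyfile(src, dest)
    return dest


def submit_all(tasks: Iterable[Task], max_workers: int = 4) -> Dict[str, object]:
    """Run all tasks concurrently; map each src_url to its path or the error raised."""
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, t): t for t in tasks}
        for future, task in futures.items():
            try:
                results[task.src_url] = future.result()
            except OSError as err:
                # Record a failed copy instead of aborting the remaining tasks.
                results[task.src_url] = err
    return results
```

Returning errors per task, rather than raising on the first failure, lets a caller report every broken dependency in one pass.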

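The last step, untar_file, unpacks the downloaded Spark archive into a destination folder. A stdlib sketch of the same idea follows; `extract_tgz` is a hypothetical helper, not the project's implementation, and it adds a check that rejects archive members whose names would land outside the destination (it does not inspect symlink targets).

```python
import os
import tarfile


def extract_tgz(archive: str, dest_folder: str) -> str:
    """Extract a .tgz archive into dest_folder and return the resolved folder path."""
    os.makedirs(dest_folder, exist_ok=True)
    root = os.path.realpath(dest_folder)
    with tarfile.open(archive, 'r:gz') as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(root, member.name))
            # Reject names such as '../evil.txt' that resolve outside dest_folder.
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f'unsafe path in archive: {member.name}')
        tar.extractall(root)
    return root
```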
NVIDIA / spark-rapids-tools