EOEPCA / eoepca-plus

EOEPCA+ deployments for development team
Apache License 2.0

Tell raster API to use cloudferro s3 #34

Closed j08lue closed 1 month ago

j08lue commented 1 month ago

This adds an environment variable to the raster API settings that should tell it to resolve s3:// URLs to CloudFerro S3.

Addresses

MathewNWSH commented 1 month ago

Hey @j08lue,

here is how I do it in CDSE:

import os

os.environ['GDAL_HTTP_TCP_KEEPALIVE'] = "YES"
os.environ['AWS_S3_ENDPOINT'] = "eodata.dataspace.copernicus.eu"
os.environ['AWS_ACCESS_KEY_ID'] = ""
os.environ['AWS_SECRET_ACCESS_KEY'] = ""
os.environ['AWS_HTTPS'] = "YES"
os.environ['AWS_VIRTUAL_HOSTING'] = "FALSE"
os.environ['GDAL_HTTP_UNSAFESSL'] = "YES"

and this is how I do it from my machine on waw3-2 (should also work on fra1-2 cloud):

export AWS_S3_ENDPOINT=eodata.cloudferro.com
export AWS_ACCESS_KEY_ID=****
export AWS_SECRET_ACCESS_KEY=****
export CPL_CURL_VERBOSE=YES
export AWS_HTTPS=YES
export AWS_VIRTUAL_HOSTING=FALSE
export GDAL_HTTP_TCP_KEEPALIVE=YES
export CPL_VSIL_CURL_CHUNK_SIZE=98304
export GDAL_INGESTED_BYTES_AT_OPEN=16384
export GDAL_HTTP_UNSAFESSL=YES
export CPL_DEBUG=OFF
export GDAL_NUM_THREADS=-1
export PROJ_DEBUG=OFF
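
For anyone who prefers to set these from Python, here is a minimal sketch that collects the connection-related options from the export list above into a dict and applies them to the process environment. The function names `cloudferro_gdal_env` and `apply_env` are hypothetical, the debug and threading options are left out for brevity, and the credentials are assumed to be passed in by the caller rather than hardcoded.

```python
import os


def cloudferro_gdal_env(access_key, secret_key, endpoint="eodata.cloudferro.com"):
    """Return the GDAL/AWS options from the export list as a dict of strings.

    Hypothetical helper: the option names and values are copied from the
    shell exports above; nothing here is specific to the raster API.
    """
    return {
        "AWS_S3_ENDPOINT": endpoint,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_HTTPS": "YES",
        "AWS_VIRTUAL_HOSTING": "FALSE",  # path-style URLs instead of bucket subdomains
        "GDAL_HTTP_TCP_KEEPALIVE": "YES",
        "GDAL_HTTP_UNSAFESSL": "YES",  # note: disables TLS certificate verification
        "CPL_VSIL_CURL_CHUNK_SIZE": "98304",
        "GDAL_INGESTED_BYTES_AT_OPEN": "16384",
    }


def apply_env(options):
    """Copy the options into os.environ so GDAL picks them up in this process."""
    os.environ.update(options)
```

Calling `apply_env(cloudferro_gdal_env(key, secret))` before opening any dataset should be equivalent to the exports, assuming GDAL reads its configuration from the environment of the same process.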

Some docs. S3 key generation and S3 configuration on CDSE: https://documentation.dataspace.copernicus.eu/APIs/S3.html

Creodias (how to extract the credentials and endpoint from a VM): https://creodias.docs.cloudferro.com/en/latest/eodata/How-to-get-credentials-used-for-accessing-EODATA-on-a-cloud-VM-on-Creodias.html#waw3-2-and-fra1-2-clouds-using-custom-and-default-linux-vms-executing-curl

And here is some of my code showing how I connect to our pgstac.demo.cloudferro.com using stackstac:

import os

import pystac_client
import stackstac

os.environ['GDAL_HTTP_TCP_KEEPALIVE'] = "YES"
os.environ['AWS_S3_ENDPOINT'] = "eodata.dataspace.copernicus.eu"
os.environ['AWS_ACCESS_KEY_ID'] = ""
os.environ['AWS_SECRET_ACCESS_KEY'] = ""
os.environ['AWS_HTTPS'] = "YES"
os.environ['AWS_VIRTUAL_HOSTING'] = "FALSE"
os.environ['GDAL_HTTP_UNSAFESSL'] = "YES"

lon, lat = 14, 50

# Main endpoint for STAC
URL='https://pgstac.demo.cloudferro.com/'
catalog = pystac_client.Client.open(URL)
catalog.add_conforms_to("ITEM_SEARCH")

items = catalog.search(
    collections=['sentinel-2-l2a'],
    intersects=dict(type="Point", coordinates=[lon, lat]),
    datetime="2024-01-01/2024-02-01",
    query = {"eo:cloud_cover":{"lte":50}}
).item_collection()

for x in items:
    for k, asset in list(x.assets.items()):
        if "alternate" in asset.extra_fields:
            asset.href = asset.extra_fields["alternate"]["s3"]["href"].replace('s3://', '/vsis3/')
        if '60m' not in k:
            del x.assets[k]

stack = stackstac.stack(
    items=items,
    resolution=(60, 60),
    bounds_latlon=(14.254, 50.014, 14.587, 50.133),
    chunksize=98304,
    gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(
        {
            'GDAL_NUM_THREADS': -1,
            'GDAL_HTTP_UNSAFESSL': 'YES',
            'GDAL_HTTP_TCP_KEEPALIVE': 'YES',
            'AWS_VIRTUAL_HOSTING': 'FALSE',
            'AWS_HTTPS': 'YES',
        }
    ),
)

I hope this gives you an idea of how to generate the environment variables and pass them to the raster API's GDAL backend.
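
The href rewrite in the loop above can also be pulled out into a small function, which makes it easier to test on its own. This is only a sketch: the name `s3_to_vsis3` is hypothetical, and unlike `str.replace` it rewrites only a leading `s3://` and leaves every other href untouched.

```python
def s3_to_vsis3(href):
    """Turn an s3:// URL into a GDAL /vsis3/ path; return other hrefs unchanged."""
    prefix = "s3://"
    if href.startswith(prefix):
        # Keep everything after the scheme (bucket and key) as it is.
        return "/vsis3/" + href[len(prefix):]
    return href
```

In the loop, `asset.href = s3_to_vsis3(asset.extra_fields["alternate"]["s3"]["href"])` would then take the place of the inline `replace` call.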

MathewNWSH commented 1 month ago

@j08lue

If you are on WAW3-1, things look a little different there. Please let me know which cloud you are on :)

gitguardian[bot] commented 1 month ago

⚠️ GitGuardian has uncovered 3 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
| -------------- | ------------------ | ------------------------------ | ---------------- | --------------- | -------------------- |
| [10836075](https://dashboard.gitguardian.com/workspace/82784/incidents/10836075?occurrence=166248034) | Triggered | Generic High Entropy Secret | 218b01ec6527ce555efe07d32ed1d271526f7744 | argocd/eoepca/data-access/parts/values/values-eoapi.yaml | [View secret](https://github.com/EOEPCA/eoepca-plus/commit/218b01ec6527ce555efe07d32ed1d271526f7744#diff-22ff2ed6baaf1ad2def5dc0f031e2c2ad2dd34e75e9d17990f1a2e92489ef8ddR49) |
| [10836075](https://dashboard.gitguardian.com/workspace/82784/incidents/10836075?occurrence=166248307) | Triggered | Generic High Entropy Secret | 8401e61c6c91af8483d63944006e12f8ca2397a9 | argocd/eoepca/data-access/parts/values/values-eoapi.yaml | [View secret](https://github.com/EOEPCA/eoepca-plus/commit/8401e61c6c91af8483d63944006e12f8ca2397a9#diff-22ff2ed6baaf1ad2def5dc0f031e2c2ad2dd34e75e9d17990f1a2e92489ef8ddR50) |
| [10836075](https://dashboard.gitguardian.com/workspace/82784/incidents/10836075?occurrence=166248308) | Triggered | Generic High Entropy Secret | 8401e61c6c91af8483d63944006e12f8ca2397a9 | argocd/eoepca/data-access/parts/values/values-eoapi.yaml | [View secret](https://github.com/EOEPCA/eoepca-plus/commit/8401e61c6c91af8483d63944006e12f8ca2397a9#diff-22ff2ed6baaf1ad2def5dc0f031e2c2ad2dd34e75e9d17990f1a2e92489ef8ddL49) |
🛠 Guidelines to remediate hardcoded secrets
1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace and store your secrets safely. [Learn here](https://blog.gitguardian.com/secrets-api-management?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) the best practices.
3. Revoke and [rotate these secrets](https://docs.gitguardian.com/secrets-detection/secrets-detection-engine/detectors/generics/generic_high_entropy_secret#revoke-the-secret?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment).
4. If possible, [rewrite git history](https://blog.gitguardian.com/rewriting-git-history-cheatsheet?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment). Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:
- following these [best practices](https://blog.gitguardian.com/secrets-api-management/?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) for managing and storing secrets, including API keys and other credentials
- installing [secret detection on pre-commit](https://docs.gitguardian.com/ggshield-docs/integrations/git-hooks/pre-commit?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) to catch a secret before it leaves your machine and ease remediation.
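
As one possible way to follow the second guideline in application code, the sketch below reads the S3 credentials from the environment at startup instead of keeping them in a committed file. The helper name `load_s3_credentials` is hypothetical, and it assumes the deployment injects `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from a secret store; how the secret gets into the environment is outside the scope of this example.

```python
import os

# Variable names as used in the comments above.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_s3_credentials(environ=None):
    """Return the S3 credentials from the environment, failing loudly if any is unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        # Report only the variable names, never the values.
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}
```

Passing a plain dict as `environ` keeps the function easy to test without touching the real process environment.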

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.