datakaveri / ogc-resource-server

OGC compliant IUDX based resource server serving Geospatial data
Apache License 2.0

OGC Tiles API- Actual Tiles Onboarding Process #192

Open code-akki opened 2 months ago

code-akki commented 2 months ago

A few open questions-

ThorodanBrom commented 1 month ago
DivyaSreeMunagavalasa commented 1 month ago

Tiles Onboarding Process

The tiles onboarding process exists so that feature data can be served efficiently for visualization. If the collection already exists as a feature, the expected flow begins with the feature onboarding process, followed by the actual tiles onboarding process. However, if the collection is a brand-new tile collection (i.e., it has not previously been onboarded as a feature), the tiles onboarding process should occur first, followed by the tiles metadata onboarding process.

There are two main workflows for the tiles onboarding process:

  1. Onboarding a tile collection when the collection already exists as a feature
  2. Onboarding a brand-new tile collection (the collection does not already exist as a feature)

The input JSON for the tiles onboarding process can be the same as for the feature onboarding process, something like this: { "inputs": { "fileName": "Process-test/Airports.gpkg", "resourceId": "7155c949-e4d3-478f-a3d4-20c54a796b8d" } }
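A minimal sketch of how such an input could be parsed and checked before a process starts. The function name `parse_onboarding_input` and the specific checks (non-empty strings, `resourceId` being a UUID) are assumptions for illustration, not the server's actual validation.

```python
import json
import uuid


def parse_onboarding_input(raw: str) -> dict:
    """Parse the process input and return the inner "inputs" object.

    Hypothetical helper: raises ValueError on malformed JSON,
    missing fields, or a resourceId that is not a UUID.
    """
    body = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    inputs = body.get("inputs") if isinstance(body, dict) else None
    if not isinstance(inputs, dict):
        raise ValueError("missing 'inputs' object")
    for field in ("fileName", "resourceId"):
        value = inputs.get(field)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or empty input: {field}")
    # Assumption: resourceId is a UUID; uuid.UUID raises ValueError otherwise.
    uuid.UUID(inputs["resourceId"])
    return inputs


example = (
    '{"inputs": {"fileName": "Process-test/Airports.gpkg", '
    '"resourceId": "7155c949-e4d3-478f-a3d4-20c54a796b8d"}}'
)
print(parse_onboarding_input(example)["fileName"])
```

Note that strict JSON parsers reject a trailing comma after the last field, so the payload has to end the `inputs` object without one.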

Onboarding a Tile Collection when the Collection Already Exists as a Feature

Steps:

The tiles onboarding process depends on two other processes.

Given the current time frame, the plan is to integrate these two processes into the tiles onboarding flow at a later stage, because we still need to figure out how to integrate multiple processes.

DivyaSreeMunagavalasa commented 2 weeks ago

I explored a method to upload multiple files, or even an entire folder, to S3 with a single Pre-Signed POST URL. This approach uses the starts-with condition of the S3 POST policy, which accepts any object key that begins with a specified prefix. For instance, with a prefix of "uploads/", we can upload multiple files with keys like "uploads/file1.png", "uploads/folder1/file2.png", and "uploads/folder2/subfolder/file3.png" without generating a new pre-signed URL for each individual file. This makes it easier to upload large directories with subdirectories using a single pre-signed URL.
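To make the mechanism concrete, here is a standard-library sketch of how a set of pre-signed POST fields with a starts-with condition can be built and signed with AWS Signature Version 4. It is not the code used in this exploration; the function name, the bucket, and the credentials are placeholders chosen for illustration. In practice an AWS SDK would normally produce these fields.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presigned_post_fields(bucket, prefix, access_key, secret_key,
                          region, now, expires_in=3600):
    """Build form fields for an S3 POST upload limited to keys under `prefix`."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    expiration = now + timedelta(seconds=expires_in)
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            # Any object key beginning with the prefix is accepted.
            ["starts-with", "$key", prefix],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    # SigV4 signing key: date -> region -> service -> "aws4_request".
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_signing = _hmac(_hmac(_hmac(k_date, region), "s3"), "aws4_request")
    # For POST uploads the string to sign is the base64-encoded policy.
    signature = hmac.new(k_signing, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "bucket": bucket,
        "key": prefix + "${filename}",
        "Policy": policy_b64,
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": credential,
        "X-Amz-Date": amz_date,
        "X-Amz-Signature": signature,
    }


# Placeholder bucket and credentials, for illustration only.
fields = presigned_post_fields(
    "example-bucket", "uploads/", "EXAMPLEACCESSKEY", "example-secret",
    "ap-south-1", datetime(2024, 10, 29, 9, 17, 16, tzinfo=timezone.utc),
)
print(fields["X-Amz-Credential"])
```

The uploader replaces the `key` field per file; the policy stays valid as long as every key starts with the signed prefix and the policy has not expired.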

Implementation Details

  1. Generate Pre-Signed POST URL Fields:
    • I generated a single set of pre-signed POST URL fields. These include:
      • Bucket name: the name of the S3 bucket.
      • Policy document: contains a starts-with condition that limits uploads to keys beginning with "uploads/".
      • AWS credentials, date, and algorithm for request signing.
    • The starts-with condition ensures that any object key beginning with "uploads/" is valid, allowing flexible, structured uploads.
    • Sample pre-signed fields: { "bucket": "s3-ap-south-1-test-ogc-tiles", "Policy": "eyAiZXhwaXJhdGlvbiI6...", "X-Amz-Date": "20241029T091716Z", "X-Amz-Algorithm": "AWS4-HMAC-SHA256", "X-Amz-Signature": "c8a3b29970...", "X-Amz-Credential": "AKIAROAMFPEM7ONJB54V/20241029/ap-south-1/s3/aws4_request", "key": "uploads/${filename}" }
  2. Bash Script for Directory Upload:
    • To automate uploads from a local directory, I created a Bash script that recursively finds each file in a specified local directory and uploads it to S3, preserving the directory structure.
    • For each file, the script constructs an S3 key based on its relative path and makes a POST request using curl.
    • The script simplifies the upload process by automating file-by-file uploads while adhering to the original folder structure.
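The key-construction step of that script can be sketched as follows. This is a Python illustration rather than the Bash script itself, and `build_upload_plan` is a hypothetical name. It only computes which object key each local file would get; the actual POST request is left out.

```python
from pathlib import Path


def build_upload_plan(local_dir: str, prefix: str = "uploads/") -> list[tuple[str, str]]:
    """Return (local_path, s3_key) pairs for every file under local_dir.

    The key is the prefix plus the file's path relative to local_dir,
    always with forward slashes, so the folder structure is preserved.
    """
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = prefix + path.relative_to(root).as_posix()
            plan.append((str(path), key))
    return plan
```

Each pair would then be sent as one multipart/form-data POST to the bucket URL, with the `key` field set to the computed key, the remaining pre-signed fields copied unchanged, and the file content as the last form field.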

UI Integration

DivyaSreeMunagavalasa commented 2 weeks ago

I also explored using AWS Transfer Manager to upload a directory to MinIO. After setting up the MinIO connection with access credentials, I configured the AmazonS3 client for path-style access (required by MinIO) and initialized the Transfer Manager. I specified our MinIO bucket, key prefix, and the local directory to upload. This approach successfully enabled efficient, multi-file directory uploads to MinIO, confirming compatibility with AWS Transfer Manager.

DivyaSreeMunagavalasa commented 2 weeks ago

I explored integrating the UI with the generated Pre-Signed POST URL for Amazon S3 using HTML and JavaScript. This lets users upload entire folders or multiple files to S3 through a web interface. Using a file picker, users select a directory to upload, and the script then builds an S3 object key for each file from its relative path and uploads the files automatically. Because the starts-with condition in the POST policy allows every key under the specified prefix, no new pre-signed URL is needed for each file. Here’s a step-by-step outline of the functionality:

Enhancements

To standardize the upload structure, the following validations should be added:

DivyaSreeMunagavalasa commented 3 days ago

I explored the ogr2ogr command to convert the feature collection table into MVT tiles. This can be done in two ways:
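As a rough illustration of what such a conversion can look like (not necessarily either of the two approaches referred to above), the sketch below builds an ogr2ogr invocation that writes a table to a directory of MVT tiles using GDAL's MVT driver. It assumes the feature table lives in PostGIS. The connection string, table name, output directory, and zoom range are placeholders, and `build_mvt_command` is a hypothetical helper that only assembles the argument list without running it.

```python
import shlex


def build_mvt_command(pg_conn: str, table: str, out_dir: str,
                      min_zoom: int = 0, max_zoom: int = 5) -> list[str]:
    """Assemble an ogr2ogr argument list for table -> MVT tile directory."""
    return [
        "ogr2ogr",
        "-f", "MVT",            # output driver: Mapbox Vector Tiles
        out_dir,                # destination (a directory of tiles)
        f"PG:{pg_conn}",        # source: PostgreSQL/PostGIS connection
        table,                  # layer (table) to convert
        "-dsco", "FORMAT=DIRECTORY",
        "-dsco", f"MINZOOM={min_zoom}",
        "-dsco", f"MAXZOOM={max_zoom}",
    ]


# Placeholder values for illustration only.
cmd = build_mvt_command("dbname=example host=localhost", "airports", "tiles_out")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine with GDAL installed. Since tile generation for large tables can take a while, this is the kind of step that fits an async process.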

DivyaSreeMunagavalasa commented 3 days ago

So, both process flows have now been explored.

Before I start on the tiles onboarding process, there are a few open questions:

  1. Do we need separate processes for the two flows, i.e. one for generating the pre-signed POST URL for uploading a brand-new tile collection, and one for converting an already existing feature collection? Generating the pre-signed POST URL should be a sync process, because the POST URL fields have to be returned as the output of the process, whereas converting the existing feature collection into MVT tiles with ogr2ogr can be an async process. I also don't think it is feasible to integrate the pre-signed POST URL generation into the existing S3PreSignedURLGeneration process. That process creates a pre-signed URL for a feature collection (a single object) based on an object key (resourceGroup/resourceName.gpkg) that is constructed in the code itself, whereas for a brand-new tile collection the object key is different (resourceId/tileMatrixSet/). That's why I'm thinking of a separate process for generating the pre-signed POST URL for brand-new tile collections, rather than integrating it into the existing S3PreSignedURLGeneration flow.

  2. We need to trigger the tiles metadata onboarding process right after tiles onboarding completes. How should we do this? I'm thinking of removing the separate tiles metadata onboarding process and folding it into the tiles onboarding flow itself. That works for the existing feature collection flow, but I'm not sure about brand-new tile collections: there we only provide the pre-signed POST URL, and we can't be sure the tiles upload to S3 through the UI succeeds (it may fail for some reason, for example a faulty tile collection). One option: if the upload through the UI succeeds, the UI triggers tiles metadata onboarding by calling the tiles metadata onboarding process endpoint. For this, however, the tiles metadata onboarding process would have to remain a separate process, which conflicts with the idea above of removing it.
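The difference in object keys raised in question 1 can be sketched as follows. The two key layouts come from the discussion above; the function names and the example tile matrix set value are assumptions for illustration.

```python
def feature_object_key(resource_group: str, resource_name: str) -> str:
    """Single-object key used by the existing feature pre-signed URL flow."""
    return f"{resource_group}/{resource_name}.gpkg"


def tiles_key_prefix(resource_id: str, tile_matrix_set: str) -> str:
    """Key prefix under which all tiles of a brand-new collection would go."""
    return f"{resource_id}/{tile_matrix_set}/"


def tiles_policy_condition(resource_id: str, tile_matrix_set: str) -> list:
    """starts-with condition for a POST policy covering the whole prefix."""
    return ["starts-with", "$key", tiles_key_prefix(resource_id, tile_matrix_set)]


# "WebMercatorQuad" is only an example tile matrix set name.
print(tiles_policy_condition("7155c949-e4d3-478f-a3d4-20c54a796b8d", "WebMercatorQuad"))
```

The feature flow signs one exact key, while the tiles flow signs a prefix that many keys may share, which is one reason the two are hard to serve from the same process.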