Tiles Onboarding Process
The tiles onboarding process exists to serve feature data efficiently for visualization. If the collection already exists as a feature, the expected flow is the feature onboarding process first, followed by the tiles onboarding process. However, if the collection is a brand-new tile collection (i.e., it has not been onboarded as a feature before), the tiles onboarding process should happen first, followed by the tiles metadata onboarding process.
There are two main workflows for the tiles onboarding process:
The input JSON for the tiles onboarding process can be the same as for the feature onboarding process, for example:

```json
{
  "inputs": {
    "fileName": "Process-test/Airports.gpkg",
    "resourceId": "7155c949-e4d3-478f-a3d4-20c54a796b8d"
  }
}
```
Onboarding a Tile Collection when the Collection Already Exists as a Feature
Steps:
Upload the Tile Collection to S3: upload the tile collection to S3 using ogr2ogr (still need to figure out how to do this).
Onboarding a Brand-New Tile Collection (No Existing Feature)
Steps:
The tiles onboarding process has interdependencies with two other processes.
Given the current time frame, the plan is to integrate these two processes into the tiles onboarding flow at a later stage, because we still need to figure out how to integrate multiple processes.
I explored a method to upload multiple files, or even an entire folder, to S3 with a pre-signed POST URL. This approach uses AWS's starts-with condition, which enables flexible file naming by allowing any object key that begins with a specified prefix. For instance, with a prefix of "uploads/", we can upload multiple files with keys like "uploads/file1.png", "uploads/folder1/file2.png", and "uploads/folder2/subfolder/file3.png" without generating a new pre-signed URL for each individual file. This flexibility makes it easier to upload large directories with subdirectories using a single pre-signed URL.
Implementation Details
The pre-signed POST policy uses the starts-with condition to limit uploads to paths beginning with "uploads/".
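A minimal sketch of how such a policy could be generated with the AWS SDK for JavaScript (v3); this is an illustration of the explored approach rather than the project's actual code, and the bucket name (taken from the commands further below), region, expiry, and size cap are placeholders/assumptions:

```javascript
// Sketch: generate a pre-signed POST policy whose "starts-with" condition
// accepts any object key under the "uploads/" prefix.
const { S3Client } = require("@aws-sdk/client-s3");
const { createPresignedPost } = require("@aws-sdk/s3-presigned-post");

async function generateUploadPolicy() {
  const client = new S3Client({ region: "ap-south-1" }); // placeholder region

  const { url, fields } = await createPresignedPost(client, {
    Bucket: "s3-ap-south-1-test-ogc-tiles",         // placeholder bucket
    Key: "uploads/${filename}",                      // actual key is supplied by the upload form
    Conditions: [
      ["starts-with", "$key", "uploads/"],           // allow any key under the prefix
      ["content-length-range", 0, 10 * 1024 * 1024]  // optional size cap (assumption)
    ],
    Expires: 3600                                    // policy valid for one hour
  });

  // `url` is the POST endpoint; `fields` (policy, signature, etc.) must be
  // included in every multipart form upload that uses this policy.
  return { url, fields };
}
```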
I also explored using AWS Transfer Manager to upload a directory to MinIO. After setting up the MinIO connection with access credentials, I configured the AmazonS3 client for path-style access (required by MinIO) and initialized the Transfer Manager. I specified our MinIO bucket, key prefix, and the local directory to upload. This approach successfully enabled efficient, multi-file directory uploads to MinIO, confirming compatibility with AWS Transfer Manager.
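For reference, an equivalent directory upload can be sketched in Node.js as well (the exploration above used the AWS SDK's TransferManager; this sketch swaps in a plain recursive walk with the S3 client). The endpoint, credentials, bucket, and prefix below are placeholders, and forcePathStyle mirrors the path-style access MinIO requires:

```javascript
// Sketch: recursively upload a local directory to MinIO via the S3 API.
// Endpoint, credentials, bucket, and prefix are placeholders.
const fs = require("fs");
const path = require("path");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const client = new S3Client({
  endpoint: "http://localhost:9000",   // MinIO endpoint (placeholder)
  region: "us-east-1",                 // MinIO ignores the region, but the SDK requires one
  forcePathStyle: true,                // path-style access, required by MinIO
  credentials: { accessKeyId: "minioadmin", secretAccessKey: "minioadmin" }
});

async function uploadDirectory(localDir, bucket, keyPrefix) {
  for (const entry of fs.readdirSync(localDir, { withFileTypes: true })) {
    const fullPath = path.join(localDir, entry.name);
    if (entry.isDirectory()) {
      // Recurse into subdirectories, extending the key prefix.
      await uploadDirectory(fullPath, bucket, `${keyPrefix}/${entry.name}`);
    } else {
      await client.send(new PutObjectCommand({
        Bucket: bucket,
        Key: `${keyPrefix}/${entry.name}`,
        Body: fs.createReadStream(fullPath)
      }));
    }
  }
}

// Example usage: uploadDirectory("./tiles", "my-bucket", "tiles");
```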
UI Integration
I explored integrating the UI with the generated pre-signed POST URL for Amazon S3 using HTML and JavaScript. This solution lets users upload entire folders or multiple files to S3 through a web interface. A file picker lets users select directories for upload, and the script then dynamically creates S3 object keys for each file based on the relative paths and handles the uploads automatically. The implementation avoids generating an individual pre-signed URL for each file by using the starts-with condition in the AWS policy itself to allow all files under a specified prefix. Here is a step-by-step outline of the functionality (a minimal sketch follows the outline):
Pre-Signed URL Fields Setup: Key parameters for the S3 upload are initialized in JavaScript, including the bucket name, AWS policy document, and credentials. The policy uses a starts-with condition, enabling uploads under a specified prefix (testBrowser/).
File Picker Configuration: Using an <input type="file" id="file-picker" name="fileList" webkitdirectory multiple /> element, users can select entire folders for upload. This enables structured, multi-file uploads from the UI.
Path Adjustment for Upload: The script extracts each file’s relative path, excluding the root folder, and constructs S3 object keys to maintain the directory structure.
Form Data Creation and Upload: Each file generates a FormData object containing the required parameters, which is sent to S3 via the fetch API. Success and error messages for each upload provide real-time feedback.
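A minimal sketch of this flow; the prefix (testBrowser/) comes from the policy described above, while the element IDs, variable names, and the placeholder URL/fields are illustrative rather than the actual implementation:

```html
<!-- Sketch: folder picker that uploads every selected file to S3 using one
     pre-signed POST policy with a starts-with condition on "testBrowser/". -->
<input type="file" id="file-picker" name="fileList" webkitdirectory multiple />
<script>
  // `presigned` would be the { url, fields } pair returned by the backend process.
  const presigned = { url: "https://<bucket>.s3.amazonaws.com/", fields: { /* policy, signature, ... */ } };

  document.getElementById("file-picker").addEventListener("change", async (event) => {
    for (const file of event.target.files) {
      // webkitRelativePath includes the selected root folder; drop it so the
      // S3 key mirrors the directory structure under the allowed prefix.
      const relativePath = file.webkitRelativePath.split("/").slice(1).join("/");
      const key = `testBrowser/${relativePath}`;

      const formData = new FormData();
      Object.entries(presigned.fields).forEach(([k, v]) => {
        if (k !== "key") formData.append(k, v);   // copy policy fields, key is set per file
      });
      formData.append("key", key);
      formData.append("file", file);              // the file field must come last

      const response = await fetch(presigned.url, { method: "POST", body: formData });
      console.log(response.ok ? `Uploaded ${key}` : `Failed to upload ${key}`);
    }
  });
</script>
```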
Enhancements
To standardize the upload structure, the following validations should be added (a rough validation sketch follows this list):
Root Directory Validation: All root-level folders must have integer names (e.g., 0, 1, 2 and so on). Non-integer names should result in upload rejection.
Nested Directory Structure Check: Only integer-named directories are allowed within the root, preventing invalid nesting (e.g., osminio/tiles/osm_tiles/...).
File Naming and Extension Consistency: Subdirectories must contain only files with integer names and consistent extensions (e.g., all .png or all .pbf). If mixed extensions or non-integer names are detected, the upload fails with an error message.
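A rough sketch of how these checks could run client-side before any upload starts, based on the relative paths of the selected files; the function and variable names are illustrative, not part of the existing code:

```javascript
// Sketch: validate the selected files before uploading.
// Expects tile-pyramid style paths like "<root>/<z>/<x>/<y>.png", where every
// directory and file name (minus the extension) must be an integer, and all
// files must share one extension.
function validateTileSelection(files) {
  const isInteger = (s) => /^\d+$/.test(s);
  let extension = null;

  for (const file of files) {
    // Drop the user's selected root folder from webkitRelativePath.
    const parts = file.webkitRelativePath.split("/").slice(1);
    const dirs = parts.slice(0, -1);
    const fileName = parts[parts.length - 1];

    // Every directory level must be integer-named.
    if (!dirs.every(isInteger)) {
      return { ok: false, error: `Invalid directory name in ${parts.join("/")}` };
    }

    // File name must be an integer plus a single extension, e.g. "42.png".
    const match = fileName.match(/^(\d+)\.(\w+)$/);
    if (!match) {
      return { ok: false, error: `Invalid file name: ${fileName}` };
    }

    // All files must use the same extension (e.g. all .png or all .pbf).
    extension = extension ?? match[2];
    if (match[2] !== extension) {
      return { ok: false, error: `Mixed extensions: .${match[2]} and .${extension}` };
    }
  }
  return { ok: true };
}
```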
I explored the ogr2ogr command to convert the feature collection table into MVT tiles. This can be done in two ways:
Converting the feature collection table and generating the MVT files locally using the ogr2ogr command, then uploading the generated files to S3 using the AWS CLI:
```bash
ogr2ogr -f "MVT" /tmp/experiment PG:"host=<db-host> dbname=<db-name> user=<db-user> port=<port-number> password=<db-passwd>" -sql "SELECT * FROM \"61f2187e-affe-4f28-be0e-fe1cd37dbd4e\""
aws s3 cp /tmp/experiment s3://s3-ap-south-1-test-ogc-tiles/Process-test/experiment --recursive
```
Setting the environment variable CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES, which allows the MVT driver to use temporary local files during the conversion but still write the final output directly to S3, eliminating the need for local file storage:
```bash
export CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES
ogr2ogr -f "MVT" /vsis3/s3-ap-south-1-test-ogc-tiles/Process-test/experiment PG:"host=<db-host> dbname=<db-name> user=<db-user> port=<port-number> password=<db-passwd>" -sql "SELECT * FROM \"61f2187e-affe-4f28-be0e-fe1cd37dbd4e\"" --config AWS_S3_ENDPOINT <s3-end-point> --config AWS_ACCESS_KEY_ID <aws-access-key> --config AWS_SECRET_ACCESS_KEY <aws-secret-key>
```
So, now the two process flows have been explored.
Now, to start on the tiles onboarding process, there are a few open questions:
Do we need separate processes for generating the POST pre-signed URL in the brand-new tile collection flow and in the already-existing feature collection flow? Generating the POST pre-signed URL for uploading a brand-new tile collection has to be a sync process, because we must return the POST URL fields as the output of the process, whereas converting an existing feature collection into MVT tiles with the ogr2ogr command can be an async process. I also think it is not feasible to fold the POST pre-signed URL generation into the existing S3PreSignedURLGeneration process flow, because that process is meant for creating a pre-signed URL for a feature collection (a single object) based on an object key (resourceGroup/resourceName.gpkg) that is constructed in the code itself. That is why I am thinking of a separate process flow for generating a POST pre-signed URL for a brand-new tile collection, since the object key differs (resourceId/tileMatrixSet/), rather than integrating this POST URL flow into the existing s3PreSignedUrl flow.
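To make the difference concrete, the separate process would sign a key prefix built from the resource ID and tile matrix set rather than a single object key. A minimal sketch, reusing the createPresignedPost call from the earlier sketch (bucket name and expiry are placeholders, and the function name is hypothetical):

```javascript
const { createPresignedPost } = require("@aws-sdk/s3-presigned-post");

// Sketch: the brand-new-tile-collection flow signs a key *prefix*
// (resourceId/tileMatrixSet/) instead of a fixed object key.
async function generateTilesUploadPolicy(s3Client, resourceId, tileMatrixSet) {
  const prefix = `${resourceId}/${tileMatrixSet}/`;
  return createPresignedPost(s3Client, {
    Bucket: "s3-ap-south-1-test-ogc-tiles",        // placeholder bucket
    Key: prefix + "${filename}",                    // actual key supplied by the upload form
    Conditions: [["starts-with", "$key", prefix]],  // allow any key under resourceId/tileMatrixSet/
    Expires: 3600
  });
}
```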
We need to trigger the tiles metadata onboarding process right after tiles onboarding completes. How should this be done? I am thinking of removing the separate tiles metadata onboarding process and integrating that flow into the tiles onboarding process itself. That works for the existing feature collection flow, but for a brand-new tile collection it is less clear, because we only provide the POST pre-signed URL and we cannot be sure the tiles upload to S3 through the UI will succeed (for example because of a faulty tile collection, or for some other reason the upload may fail). So my current thinking is: if the tiles upload through the UI succeeds, the UI can trigger the tiles metadata onboarding process by calling its endpoint. For that to work, the tiles metadata onboarding process would have to remain a separate process, which contradicts the earlier idea of removing it.
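Illustratively, the UI-driven trigger could look like the following; the endpoint path and request body are assumptions for the sketch, not the actual API:

```javascript
// Sketch: once every tile file has uploaded successfully, the UI calls the
// tiles metadata onboarding process. The endpoint path and payload shape
// below are hypothetical placeholders, not the real API.
async function triggerTilesMetadataOnboarding(resourceId) {
  const response = await fetch("/processes/tiles-metadata-onboarding/execution", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ inputs: { resourceId } })
  });
  if (!response.ok) {
    throw new Error(`Tiles metadata onboarding failed: ${response.status}`);
  }
  return response.json();
}
```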
A few open questions:
- Can we run ogr2ogr directly on the geopackage to generate the tiles?
- Can we run ogr2ogr directly on the source geopackage remotely and generate the tiles on S3, to avoid downloading and uploading?