Tiles Onboarding Process
The tiles onboarding process exists to serve feature data efficiently for visualization. If the collection already exists as a feature, the expected flow is the feature onboarding process first, followed by the tiles onboarding process. However, if the collection is a brand-new tile collection (i.e., it has not been onboarded as a feature before), the tiles onboarding process should happen first, followed by the tiles metadata onboarding process.
There are two main workflows for the tiles onboarding process:
The input JSON for the tiles onboarding process can be the same as for the feature onboarding process, for example:

```json
{
  "inputs": {
    "fileName": "Process-test/Airports.gpkg",
    "resourceId": "7155c949-e4d3-478f-a3d4-20c54a796b8d"
  }
}
```
Onboarding a Tile Collection when the Collection Already Exists as a Feature
Steps:
Upload the Tile Collection to S3: upload the tile collection to S3 using ogr2ogr (still need to figure out how to do this).
Onboarding a Brand-New Tile Collection (No Existing Feature)
Steps:
The tiles onboarding process has interdependencies with two other processes.
Given the current time frame, the plan is to integrate these two processes into the tiles onboarding flow at a later stage, because we still need to figure out how to integrate multiple processes.
I explored a method to upload multiple files, or even an entire folder, to S3 with a pre-signed POST URL. This approach uses AWS's starts-with condition, which enables flexible file naming by allowing any object key that begins with a specified prefix. For instance, with a prefix of "uploads/", we can upload multiple files with keys like "uploads/file1.png", "uploads/folder1/file2.png", and "uploads/folder2/subfolder/file3.png" without generating a new pre-signed URL for each individual file. This flexibility makes it easier to upload large directories with subdirectories using a single pre-signed URL.
Implementation Details
The pre-signed POST policy uses the starts-with condition to limit uploads to paths beginning with "uploads/".
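A minimal sketch of how such a policy could be generated with the AWS SDK for JavaScript (v3); this is an illustration of the explored approach rather than the project's actual code, and the bucket name (taken from the commands further below), region, expiry, and size cap are placeholders/assumptions:

```javascript
// Sketch: generate a pre-signed POST policy whose "starts-with" condition
// accepts any object key under the "uploads/" prefix.
const { S3Client } = require("@aws-sdk/client-s3");
const { createPresignedPost } = require("@aws-sdk/s3-presigned-post");

async function generateUploadPolicy() {
  const client = new S3Client({ region: "ap-south-1" }); // placeholder region

  const { url, fields } = await createPresignedPost(client, {
    Bucket: "s3-ap-south-1-test-ogc-tiles",         // placeholder bucket
    Key: "uploads/${filename}",                      // actual key is supplied by the upload form
    Conditions: [
      ["starts-with", "$key", "uploads/"],           // allow any key under the prefix
      ["content-length-range", 0, 10 * 1024 * 1024]  // optional size cap (assumption)
    ],
    Expires: 3600                                    // policy valid for one hour
  });

  // `url` is the POST endpoint; `fields` (policy, signature, etc.) must be
  // included in every multipart form upload that uses this policy.
  return { url, fields };
}
```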
I also explored using AWS Transfer Manager to upload a directory to MinIO. After setting up the MinIO connection with access credentials, I configured the AmazonS3 client for path-style access (required by MinIO) and initialized the Transfer Manager. I specified our MinIO bucket, key prefix, and the local directory to upload. This approach successfully enabled efficient, multi-file directory uploads to MinIO, confirming compatibility with AWS Transfer Manager.
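For reference, an equivalent directory upload can be sketched in Node.js as well (the exploration above used the AWS SDK's TransferManager; this sketch swaps in a plain recursive walk with the S3 client). The endpoint, credentials, bucket, and prefix below are placeholders, and forcePathStyle mirrors the path-style access MinIO requires:

```javascript
// Sketch: recursively upload a local directory to MinIO via the S3 API.
// Endpoint, credentials, bucket, and prefix are placeholders.
const fs = require("fs");
const path = require("path");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const client = new S3Client({
  endpoint: "http://localhost:9000",   // MinIO endpoint (placeholder)
  region: "us-east-1",                 // MinIO ignores the region, but the SDK requires one
  forcePathStyle: true,                // path-style access, required by MinIO
  credentials: { accessKeyId: "minioadmin", secretAccessKey: "minioadmin" }
});

async function uploadDirectory(localDir, bucket, keyPrefix) {
  for (const entry of fs.readdirSync(localDir, { withFileTypes: true })) {
    const fullPath = path.join(localDir, entry.name);
    if (entry.isDirectory()) {
      // Recurse into subdirectories, extending the key prefix.
      await uploadDirectory(fullPath, bucket, `${keyPrefix}/${entry.name}`);
    } else {
      await client.send(new PutObjectCommand({
        Bucket: bucket,
        Key: `${keyPrefix}/${entry.name}`,
        Body: fs.createReadStream(fullPath)
      }));
    }
  }
}

// Example usage: uploadDirectory("./tiles", "my-bucket", "tiles");
```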
UI Integration
I explored integrating the UI with the generated pre-signed POST URL for Amazon S3 using HTML and JavaScript. This solution lets users upload entire folders or multiple files to S3 through a web interface. A file picker lets users select directories for upload, and the script then dynamically creates S3 object keys for each file based on the relative paths and handles the uploads automatically. The implementation avoids generating an individual pre-signed URL for each file by using the starts-with condition in the AWS policy itself to allow all files under a specified prefix. Here is a step-by-step outline of the functionality (a minimal sketch follows the outline):
Pre-Signed URL Fields Setup: Key parameters for the S3 upload are initialized in JavaScript, including the bucket name, AWS policy document, and credentials. The policy uses a starts-with condition, enabling uploads under a specified prefix (testBrowser/).
File Picker Configuration: Using an <input type="file" id="file-picker" name="fileList" webkitdirectory multiple /> element, users can select entire folders for upload. This enables structured, multi-file uploads from the UI.
Path Adjustment for Upload: The script extracts each file’s relative path, excluding the root folder, and constructs S3 object keys to maintain the directory structure.
Form Data Creation and Upload: Each file generates a FormData object containing the required parameters, which is sent to S3 via the fetch API. Success and error messages for each upload provide real-time feedback.
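A minimal sketch of this flow; the prefix (testBrowser/) comes from the policy described above, while the element IDs, variable names, and the placeholder URL/fields are illustrative rather than the actual implementation:

```html
<!-- Sketch: folder picker that uploads every selected file to S3 using one
     pre-signed POST policy with a starts-with condition on "testBrowser/". -->
<input type="file" id="file-picker" name="fileList" webkitdirectory multiple />
<script>
  // `presigned` would be the { url, fields } pair returned by the backend process.
  const presigned = { url: "https://<bucket>.s3.amazonaws.com/", fields: { /* policy, signature, ... */ } };

  document.getElementById("file-picker").addEventListener("change", async (event) => {
    for (const file of event.target.files) {
      // webkitRelativePath includes the selected root folder; drop it so the
      // S3 key mirrors the directory structure under the allowed prefix.
      const relativePath = file.webkitRelativePath.split("/").slice(1).join("/");
      const key = `testBrowser/${relativePath}`;

      const formData = new FormData();
      Object.entries(presigned.fields).forEach(([k, v]) => {
        if (k !== "key") formData.append(k, v);   // copy policy fields, key is set per file
      });
      formData.append("key", key);
      formData.append("file", file);              // the file field must come last

      const response = await fetch(presigned.url, { method: "POST", body: formData });
      console.log(response.ok ? `Uploaded ${key}` : `Failed to upload ${key}`);
    }
  });
</script>
```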
Enhancements
To standardize the upload structure, the following validations should be added (a rough validation sketch follows this list):
Root Directory Validation: All root-level folders must have integer names (e.g., 0, 1, 2 and so on). Non-integer names should result in upload rejection.
Nested Directory Structure Check: Only integer-named directories are allowed within the root, preventing invalid nesting (e.g., osminio/tiles/osm_tiles/...).
File Naming and Extension Consistency: Subdirectories must contain only files with integer names and consistent extensions (e.g., all .png or all .pbf). If mixed extensions or non-integer names are detected, the upload fails with an error message.
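A rough sketch of how these checks could run client-side before any upload starts, based on the relative paths of the selected files; the function and variable names are illustrative, not part of the existing code:

```javascript
// Sketch: validate the selected files before uploading.
// Expects tile-pyramid style paths like "<root>/<z>/<x>/<y>.png", where every
// directory and file name (minus the extension) must be an integer, and all
// files must share one extension.
function validateTileSelection(files) {
  const isInteger = (s) => /^\d+$/.test(s);
  let extension = null;

  for (const file of files) {
    // Drop the user's selected root folder from webkitRelativePath.
    const parts = file.webkitRelativePath.split("/").slice(1);
    const dirs = parts.slice(0, -1);
    const fileName = parts[parts.length - 1];

    // Every directory level must be integer-named.
    if (!dirs.every(isInteger)) {
      return { ok: false, error: `Invalid directory name in ${parts.join("/")}` };
    }

    // File name must be an integer plus a single extension, e.g. "42.png".
    const match = fileName.match(/^(\d+)\.(\w+)$/);
    if (!match) {
      return { ok: false, error: `Invalid file name: ${fileName}` };
    }

    // All files must use the same extension (e.g. all .png or all .pbf).
    extension = extension ?? match[2];
    if (match[2] !== extension) {
      return { ok: false, error: `Mixed extensions: .${match[2]} and .${extension}` };
    }
  }
  return { ok: true };
}
```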
I explored the ogr2ogr command to convert the feature collection table into MVT tiles. This can be done in two ways:
Converting the feature collection table and generating the MVT files locally using the ogr2ogr command, then uploading the generated files to S3 using the AWS CLI:
```bash
ogr2ogr -f "MVT" /tmp/experiment PG:"host=<db-host> dbname=<db-name> user=<db-user> port=<port-number> password=<db-passwd>" -sql "SELECT * FROM \"61f2187e-affe-4f28-be0e-fe1cd37dbd4e\""
aws s3 cp /tmp/experiment s3://s3-ap-south-1-test-ogc-tiles/Process-test/experiment --recursive
```
Setting the environment variable CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES, which allows the MVT driver to use temporary local files during the conversion but still write the final output directly to S3, eliminating the need for local file storage:
```bash
export CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES
ogr2ogr -f "MVT" /vsis3/s3-ap-south-1-test-ogc-tiles/Process-test/experiment PG:"host=<db-host> dbname=<db-name> user=<db-user> port=<port-number> password=<db-passwd>" -sql "SELECT * FROM \"61f2187e-affe-4f28-be0e-fe1cd37dbd4e\"" --config AWS_S3_ENDPOINT <s3-end-point> --config AWS_ACCESS_KEY_ID <aws-access-key> --config AWS_SECRET_ACCESS_KEY <aws-secret-key>
```
So, now the two process flows have been explored.
Now, to start on the tiles onboarding process, there are a few open questions:
Do we need separate processes for generating the POST pre-signed URL in the brand-new tile collection flow and in the already-existing feature collection flow? Generating the POST pre-signed URL for uploading a brand-new tile collection has to be a sync process, because we must return the POST URL fields as the output of the process, whereas converting an existing feature collection into MVT tiles with the ogr2ogr command can be an async process. I also think it is not feasible to fold the POST pre-signed URL generation into the existing S3PreSignedURLGeneration process flow, because that process is meant for creating a pre-signed URL for a feature collection (a single object) based on an object key (resourceGroup/resourceName.gpkg) that is constructed in the code itself. That is why I am thinking of a separate process flow for generating a POST pre-signed URL for a brand-new tile collection, since the object key differs (resourceId/tileMatrixSet/), rather than integrating this POST URL flow into the existing s3PreSignedUrl flow.
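To make the difference concrete, the separate process would sign a key prefix built from the resource ID and tile matrix set rather than a single object key. A minimal sketch, reusing the createPresignedPost call from the earlier sketch (bucket name and expiry are placeholders, and the function name is hypothetical):

```javascript
const { createPresignedPost } = require("@aws-sdk/s3-presigned-post");

// Sketch: the brand-new-tile-collection flow signs a key *prefix*
// (resourceId/tileMatrixSet/) instead of a fixed object key.
async function generateTilesUploadPolicy(s3Client, resourceId, tileMatrixSet) {
  const prefix = `${resourceId}/${tileMatrixSet}/`;
  return createPresignedPost(s3Client, {
    Bucket: "s3-ap-south-1-test-ogc-tiles",        // placeholder bucket
    Key: prefix + "${filename}",                    // actual key supplied by the upload form
    Conditions: [["starts-with", "$key", prefix]],  // allow any key under resourceId/tileMatrixSet/
    Expires: 3600
  });
}
```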
We need to trigger the tiles metadata onboarding process right after tiles onboarding completes. How should this be done? I am thinking of removing the separate tiles metadata onboarding process and integrating that flow into the tiles onboarding process itself. That works for the existing feature collection flow, but for a brand-new tile collection it is less clear, because we only provide the POST pre-signed URL and we cannot be sure the tiles upload to S3 through the UI will succeed (for example because of a faulty tile collection, or for some other reason the upload may fail). So my current thinking is: if the tiles upload through the UI succeeds, the UI can trigger the tiles metadata onboarding process by calling its endpoint. For that to work, the tiles metadata onboarding process would have to remain a separate process, which contradicts the earlier idea of removing it.
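Illustratively, the UI-driven trigger could look like the following; the endpoint path and request body are assumptions for the sketch, not the actual API:

```javascript
// Sketch: once every tile file has uploaded successfully, the UI calls the
// tiles metadata onboarding process. The endpoint path and payload shape
// below are hypothetical placeholders, not the real API.
async function triggerTilesMetadataOnboarding(resourceId) {
  const response = await fetch("/processes/tiles-metadata-onboarding/execution", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ inputs: { resourceId } })
  });
  if (!response.ok) {
    throw new Error(`Tiles metadata onboarding failed: ${response.status}`);
  }
  return response.json();
}
```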
A few open questions:
- Can we run ogr2ogr directly on the geopackage to generate the tiles?
- Can we run ogr2ogr directly on the source geopackage remotely and generate the tiles on S3, to avoid downloading and uploading?