Esri / OptimizeRasters

OptimizeRasters is a set of tools for converting raster data to optimized Tiled TIF or MRF files, moving data to cloud storage, and creating Raster Proxies.

python script and S3 image download #152

Closed · aeteamdev closed this 1 year ago

aeteamdev commented 1 year ago

Hello everybody, many thanks in advance for your time. I'm a bit stumped by this issue: on a Linux container (so, no Arc involved), I'm trying to write a Python script that finds images in an S3 folder, processes them, and uploads the result to another folder in the same(?) S3 bucket.

I started from OptimizeRasters/CodeSamples/validatingCredentialsUsingUI.py and then tried to build on processUsingAnInputFolder.py. I really just need the simplest possible stub: given an input S3 folder, download the files, process them, and upload them. Maybe I got this all wrong, and I have to physically download the files via boto3, process them, and upload them via boto3: is the "automatic" download and upload just for Arc products?

Oh, I wonder if I met any of you in Berlin last November: great sessions and very nice people. Thank you

Chamlika commented 1 year ago

@aeteamdev Are you planning to create an OptimizeRasters Docker container so that you can scale out imagery processing locally? One possible approach is to use the OR AWS Lambda version that's already available to the public; you can look into its source on GitHub within the setup folder. However, if the use of AWS services is not an option, you can build your own container around the OR source to run locally. Moreover, we intend to release the OR Lambda container version to the public as well, to allow developers and users alike to plug the OR Lambda container into any cloud provider out there easily, so there's something to look forward to once we finalize a few docs on that. In the meantime, however, let us know if the container approach is a must for you; otherwise you could use OptimizeRasters directly to achieve what's required, thanks!

aeteamdev commented 1 year ago

@Chamlika Ah! Thank you, very interesting suggestion. At this time the idea is to containerize OR, from this repo, and add a "custom" script to manage a few things related to MRF, AWS and other tasks. The container would be one link in a chain of operations. Furthermore, at this time I'm not able to say whether the AWS Lambda solution is viable; see, I'm not working on this alone and I'm not the one to make such a decision. That said, to answer

let us know if the container approach is a must for you; otherwise you could use OptimizeRasters directly to achieve what's required

yes, the container approach is required at this moment. Also, it would be much appreciated if you could help me with an ideal workflow as per the original question: how do I pass S3-hosted images to OR?

  1. Do I have to download the image via boto3 and process it (something like the sketch below)?
  2. Is there a way to build a path and pass it to OR?
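
For point 1, here is a rough sketch of the manual route I have in mind; the bucket name, keys, and local paths are placeholders, and the processing step is elided:

import boto3

s3 = boto3.client('s3')

# Placeholder bucket/keys: download one source raster locally...
s3.download_file('my-bucket', 'input/scene.tif', '/tmp/scene.tif')

# ...process /tmp/scene.tif with OR here, producing /tmp/scene.mrf...

# ...then upload the result to another folder in the same bucket.
s3.upload_file('/tmp/scene.mrf', 'my-bucket', 'output/scene.mrf')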

For point 2: what should I pass as the input and output keys?

args = {
    'input': '',   # what goes here?
    'output': '',  # what goes here?
    'subs': 'true',
    'config': '/Image_Mgmt_Workflows/OptimizeRasters/Templates/Imagery_to_MRF_LERC.xml'
}

app = OptimizeRasters.Application(args)

Thank you very much for your time - and patience.

Chamlika commented 1 year ago

@aeteamdev Do you have AWS credential keys to access the resources on S3 for OR to process, or is the Linux VM where you want to run these tests attached to an AWS IAM Role, by any chance? OR can either download the source files locally before processing (using the flag -tempinput), or process the rasters directly: with the /vsicurl/ prefix if -inputprofile is given, or with /vsis3/ for an IAM Role if the flag -usetoken=true is used. Let us know how the permissions are set up to access the S3 resources, thanks!
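
For illustration, a minimal sketch of the -inputprofile route, passed through the args dictionary from your earlier snippet. The bucket name, key prefixes, and profile name are placeholders, the clouddownload/inputprofile/inputbucket keys are assumed to mirror the corresponding command-line flags, and the init()/run() calls follow the pattern used in the CodeSamples scripts:

import OptimizeRasters

# Placeholder values throughout; adjust bucket, prefixes and profile name.
args = {
    'input': 'rasters/source/',        # key prefix inside the input bucket
    'output': '/data/optimized/',      # local folder for the converted output
    'subs': 'true',
    'config': '/Image_Mgmt_Workflows/OptimizeRasters/Templates/Imagery_to_MRF_LERC.xml',
    'clouddownload': 'true',           # source rasters live in cloud storage
    'inputprofile': 'or_s3_profile',   # named profile in ~/.aws/credentials
    'inputbucket': 'my-imagery-bucket'
}

app = OptimizeRasters.Application(args)
if app.init():
    app.run()

The upload leg back to S3 can then be handled either with OR's cloud upload options or with boto3, whichever fits your chain of operations.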

Chamlika commented 1 year ago

@aeteamdev You can take a look at the attached modified sample processUsingAnInputFolder.py to see how to feed S3 data into OR to be processed locally. processUsingAnInputFolder.zip

aeteamdev commented 1 year ago

Hello @Chamlika! Thank you! I'll answer ~point by point:

Do you have AWS credential keys to access the resources on S3 for OR to process, or is the Linux VM where you want to run these tests attached to an AWS IAM Role, by any chance?

So, I think it's a mix of the two. I run this container, which contains all the dependencies needed to transform to MRF: OR, the .aws files needed for authentication (AWS credential keys from an IAM Role to access the resources on S3), the script and the rest. But this is the development stage of the workflow; we're actually building the thing these days. I also think that the current configuration won't mirror the final implementation, which I cannot exclude will be a VM attached to an IAM Role, as you suggest. But as I said in an earlier post, this process is just one link in a chain of operations, and some further analysis is due.

OR can either download the source files locally before processing (using the flag -tempinput), or process the rasters directly: with the /vsicurl/ prefix if -inputprofile is given, or with /vsis3/ for an IAM Role if the flag -usetoken=true is used.

I've seen this in action thanks to the example you kindly posted here. I just filled in the values for each key in the example (I didn't go with tempinput, rather kept clouddownload), and there we go: it works flawlessly. This solved many an issue I had in terms of configurability and sensible coding.

Let us know how the permissions are set up to access the S3 resources, thanks!

I hope I answered above: let me know if that's not the case.

What you've given me fully answers my question. So I'll wait to hear from you again, just in case you have more questions for me. I find this exchange very useful: I'll close the thread when our dialogue comes to its natural end, so to say.

Thank you again, have a nice day.

Chamlika commented 1 year ago

@aeteamdev Great! I'm glad you're able to move ahead with your work with the information that was exchanged. Let us know if you come across any other needs going forward. Yes, you may close the thread and reopen it when necessary.

Thanks!