OpenPecha / prodigy-tools

Tools for OpenPecha's use of Prodigy
MIT License

do not store the full image in the database #5

Closed eroux closed 1 year ago

eroux commented 1 year ago

The current recipe stores the full images (in base64) in the database, but we don't want to do that for two reasons:

For that, the stream could just use S3 URIs with a temporary token. The image URIs will stop working at some point, but they preserve the S3 key, so we can reconstruct new S3 URIs later on.
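A minimal sketch of that idea, assuming boto3 and a plain list of S3 keys. The names make_s3_client and image_stream are hypothetical and are not taken from the existing recipe; only generate_presigned_url is a real boto3 client method.

```python
from typing import Any, Dict, Iterable, Iterator


def make_s3_client() -> Any:
    """Create a boto3 S3 client (assumes AWS credentials are configured)."""
    # Imported lazily so the rest of the sketch can be loaded without boto3.
    import boto3

    return boto3.client("s3")


def image_stream(
    client: Any, bucket: str, keys: Iterable[str], expires_in: int = 36000
) -> Iterator[Dict[str, str]]:
    """Yield one task per S3 key, with a presigned URL instead of base64 data."""
    for key in keys:
        url = client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,  # lifetime of the URL, in seconds
        )
        # Only the URL is yielded; the S3 key stays recoverable from its path.
        yield {"image": url}
```

The client is passed in rather than created inside the generator, so the stream can be exercised with a stand-in object that needs neither credentials nor network access.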

Zakongjampa commented 1 year ago

For this, Prodigy accepts an argument at startup: we can pass --remove-base64, which keeps the base64 data of the image from being saved in the database. I hope this is helpful, sir.

eroux commented 1 year ago

well, but then what will prodigy store? If it doesn't store the image url on s3, how can we reconstruct it from what's in the database?

ta4tsering commented 1 year ago

Issue resolved. Below is a screenshot of the log from sudo journalctl -u prodigy_bdrc_crop_images.service:

Screenshot 2023-01-18 at 12 11 45 PM

One example of a dict yielded to the stream: {image: https://s3.amazonaws.com/image-processing.bdrc.io/NLM1/W2KG208129/sources-web/W2KG208129-I2KG208175/I2KG2081750001.tif_19.png?AWSAccessKeyId=AKIASPOFTMDCENMI&Signature=p5C5atPCHw1SsyElCgh0ZFm00T4%3D&Expires=1674025970}

P.S.: the image URL has ExpiresIn=36000, i.e. 10 hours.
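To illustrate how the S3 key can be recovered from a stored URL of the shape shown above, here is a rough sketch using only the standard library. It assumes path-style URLs (https://s3.amazonaws.com/BUCKET/KEY) and an Expires query parameter holding a Unix timestamp; the names S3Location and parse_presigned_url are made up for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, unquote, urlsplit


class S3Location(NamedTuple):
    bucket: str
    key: str
    expires_at: Optional[datetime]


def parse_presigned_url(url: str) -> S3Location:
    """Split a path-style presigned S3 URL into bucket, key, and expiry time."""
    parts = urlsplit(url)
    # Path-style layout: /<bucket>/<key...>
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"not a path-style S3 URL: {url!r}")
    # "Expires" is an absolute Unix timestamp in URLs of this shape.
    expires = parse_qs(parts.query).get("Expires")
    expires_at = (
        datetime.fromtimestamp(int(expires[0]), tz=timezone.utc) if expires else None
    )
    return S3Location(unquote(bucket), unquote(key), expires_at)
```

Applied to the URL above, this would give image-processing.bdrc.io as the bucket and the path starting with NLM1/ as the key, which is what a later re-signing step needs.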

eroux commented 1 year ago

Thanks, it looks good! Is there a way to make it (much) longer than that, something like a year? If not, we'll have to write a script that goes through the database and updates the URLs of the images.

ta4tsering commented 1 year ago

I don't know the upper limit for ExpiresIn. I will check and set it to the maximum.

ta4tsering commented 1 year ago

One documentation page (linked) says nothing about a maximum value for ExpiresIn, but another (also linked) says the maximum is 7 days. I set ExpiresIn to one year (31536000 seconds) and restarted the server to see whether one year works, and it works fine for now.

eroux commented 1 year ago

ok great! Let's create one Python script (independent from Prodigy) that does the following:

and then an "update-database.sh" script that:

ta4tsering commented 1 year ago

sure, will look into it

ta4tsering commented 1 year ago

Here is the Python script to update the s3_url, i.e. to extend the URL's ExpiresIn: update_s3_url.py
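The linked script is not reproduced in this thread. As a hedged illustration only (this is not the contents of update_s3_url.py), re-signing stored tasks could look roughly like this, assuming path-style URLs and a boto3-like client that is passed in. The names refresh_tasks and ONE_YEAR are hypothetical, and reading from and writing back to the Prodigy database is deliberately left out.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import unquote, urlsplit

ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds, the value tried in this thread


def refresh_tasks(
    client: Any, tasks: Iterable[Dict[str, Any]], expires_in: int = ONE_YEAR
) -> List[Dict[str, Any]]:
    """Return copies of the tasks whose "image" URL has been re-signed."""
    refreshed = []
    for task in tasks:
        # Recover bucket and key from the old URL path: /<bucket>/<key...>
        path = unquote(urlsplit(task["image"]).path).lstrip("/")
        bucket, _, key = path.partition("/")
        if not bucket or not key:
            raise ValueError(f"cannot recover bucket and key from {task['image']!r}")
        new_url = client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )
        # Copy the task so every other field is preserved unchanged.
        refreshed.append({**task, "image": new_url})
    return refreshed
```

Returning new dicts instead of mutating the input keeps the old URLs available for comparison until the database write has succeeded.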

Below is the test, in which I assert the status_code of the response for the new URL: test_update_s3_url.py

and finally the update-database.sh

eroux commented 1 year ago

this is good! some remarks:

ta4tsering commented 1 year ago

okay, I will make those changes