Closed (eroux closed this issue 1 year ago)
For this, Prodigy accepts an argument at startup: we can pass --remove-base64, which stops the base64 data of the image from being saved in the database. I hope this helps.
Well, but then what will Prodigy store? If it doesn't store the image's URL on S3, how can we reconstruct the image from what's in the database?
Issue resolved. Below is the log image, obtained with sudo journalctl -u prodigy_bdrc_crop_images.service
One example of a dict yielded to the stream:
{image: https://s3.amazonaws.com/image-processing.bdrc.io/NLM1/W2KG208129/sources-web/W2KG208129-I2KG208175/I2KG2081750001.tif_19.png?AWSAccessKeyId=AKIASPOFTMDCENMI&Signature=p5C5atPCHw1SsyElCgh0ZFm00T4%3D&Expires=1674025970}
P.S.: the image URL is generated with ExpiresIn=36000, i.e. it expires after 10 hours.
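To illustrate what can be recovered from such a URL, here is a minimal sketch that splits a path-style presigned URL into bucket, key, and expiry time. The function name `parse_presigned_url` is made up for this example, and the credentials in `EXAMPLE_URL` are placeholders. The sketch assumes the URL has the same shape as the one yielded above, with `Expires` holding an absolute Unix timestamp in seconds.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, unquote, urlsplit

# Same shape as the yielded URL above; the credentials are placeholders.
EXAMPLE_URL = (
    "https://s3.amazonaws.com/image-processing.bdrc.io/NLM1/W2KG208129/"
    "sources-web/W2KG208129-I2KG208175/I2KG2081750001.tif_19.png"
    "?AWSAccessKeyId=EXAMPLEKEYID&Signature=EXAMPLESIG%3D&Expires=1674025970"
)


def parse_presigned_url(url):
    """Split a path-style presigned S3 URL into (bucket, key, expiry)."""
    parts = urlsplit(url)
    # Path-style URLs look like https://s3.amazonaws.com/<bucket>/<key>?...
    bucket, _, key = parts.path.lstrip("/").partition("/")
    query = parse_qs(parts.query)
    # Assumption: Expires is an absolute Unix timestamp in seconds.
    expires = datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    return bucket, unquote(key), expires


bucket, key, expires = parse_presigned_url(EXAMPLE_URL)
print(bucket)
print(key)
print(expires.isoformat())
```

The key is the part that stays valid after the signature expires, so it is the piece worth keeping if the URL has to be regenerated later.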
Thanks! It looks good. Is there a way to make it (much) longer than that, something like a year? If not, we'll have to write a script that goes through the database and updates the URLs of the images.
I don't know the upper limit for ExpiresIn. I will check and set it to the upper limit.
One documentation page (Documentation Link) does not mention any maximum value for ExpiresIn, but another (Documentation link) says the maximum is 7 days. I nevertheless set ExpiresIn to one year (31536000 seconds) and restarted the server to see whether one year works, and it works fine for now.
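For reference, the durations discussed in this thread work out as follows. This is only a sketch of the arithmetic: the helper name `expires_in` is made up for the example, and the 7-day figure is the one quoted from the documentation above, not something checked here against S3 itself.

```python
from datetime import timedelta


def expires_in(**duration):
    """Convert a duration (e.g. hours=10, days=365) to whole seconds."""
    return int(timedelta(**duration).total_seconds())


# Limit quoted in the second documentation page mentioned above.
SEVEN_DAYS = expires_in(days=7)

TEN_HOURS = expires_in(hours=10)   # the original setting
ONE_YEAR = expires_in(days=365)    # the value now being tried

for label, seconds in [("10 hours", TEN_HOURS), ("1 year", ONE_YEAR)]:
    over = seconds > SEVEN_DAYS
    print(f"{label}: {seconds} s, exceeds the quoted 7-day limit: {over}")
```

Since one year is well above the quoted limit, it is probably worth re-checking a stored URL after more than 7 days before relying on the longer expiry.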
ok great! Let's create one Python script (independent from Prodigy) that does the following:
and then an "update-database.sh" script that:
sure, will look into it
Here is the Python script that updates the s3_url, i.e. regenerates the URL to extend its ExpiresIn: update_s3_url.py
Below is the test. In it, I assert the status_code of the response for the new URL: test_update_s3_url.py
and finally the update-database.sh
this is good! some remarks:
okay, I will make those changes
The current recipe stores the full images (in base64) in the database, but we don't want to do that for two reasons:
For that, the stream could just use S3 URIs with a temporary token. That means the image URIs will stop working at some point, but this preserves the S3 key, so we can reconstruct new S3 URIs later on.
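A minimal sketch of that idea, under stated assumptions: each task carries both a temporary URL and the permanent S3 key, so the URL can be regenerated later from the key alone. The names `stream_from_keys`, `refresh_task`, the `s3_key` field, and the `sign_url` callable are all hypothetical. In practice `sign_url` would wrap whatever presigning call the recipe uses; here it is a stand-in so the example runs on its own.

```python
def stream_from_keys(keys, sign_url):
    """Yield one task per S3 key, with a temporary URL plus the key itself."""
    for key in keys:
        yield {
            "image": sign_url(key),  # temporary: stops working once it expires
            "s3_key": key,           # permanent: enough to re-sign later
        }


def refresh_task(task, sign_url):
    """Return a copy of a stored task with a freshly signed image URL."""
    return {**task, "image": sign_url(task["s3_key"])}


def fake_signer(token):
    """Stand-in signer for the example; a real one would presign the key."""
    return lambda key: f"https://example.invalid/{key}?token={token}"


tasks = list(stream_from_keys(["NLM1/a.png", "NLM1/b.png"], fake_signer("old")))
refreshed = [refresh_task(task, fake_signer("new")) for task in tasks]
print(refreshed[0]["image"])
```

Keeping the key in its own field avoids having to parse it back out of an expired URL when the database is updated.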