Avaiga / taipy

Turns Data and AI algorithms into production-ready web applications in no time.
https://www.taipy.io
Apache License 2.0

S3Object Data Node - possibility to change encoding #680

Open Forchapeatl opened 8 months ago

Forchapeatl commented 8 months ago
# Failed Test: 🔥

It doesn't read image, video, or audio data. This is because we are decoding the content as UTF-8 here, which is fair :)

Removing .decode('utf-8') would mean that s3ObjectDataNode.read() has to be called as s3ObjectDataNode.read().decode('utf-8') when accessing text files.

I don't think this s3ObjectDataNode.read().decode('utf-8') function call is the expected usage of the DataNode class read() method. Besides, it is overkill!

I would like to propose that the write method support one more property on the S3ObjectDataNode: an ACL (permission management) parameter. Adding this property to the S3ObjectDataNode will solve the media object reading problem, because it will allow users to read S3 media objects by URL.

    __AWS_ACCESS_KEY_ID = "aws_access_key"
    __AWS_SECRET_ACCESS_KEY = "aws_secret_access_key"
    __AWS_STORAGE_BUCKET_NAME = "aws_s3_bucket_name"
    __AWS_S3_OBJECT_KEY = "aws_s3_object_key"
    __AWS_REGION = "aws_region"
+   __AWS_ACL = "ACL='private'|'public-read'|'public-read-write'|'authenticated-read'|'aws-exec-read'|'bucket-owner-read'|'bucket-owner-full-control'"
    __AWS_S3_OBJECT_PARAMETERS = "aws_s3_object_parameters"
    def _write(self, data: Any):
        self._s3_client.put_object(
            Bucket=self.properties[self.__AWS_STORAGE_BUCKET_NAME],
            Key=self.properties[self.__AWS_S3_OBJECT_KEY],
            Body=data,
+           ACL=self.properties[self.__AWS_ACL],
        )

With the new property set to public-read (ACL='public-read'), the user will be able to access media files (image, video, audio) via URL.


    data = "https://bealoabucket.s3.us-west-2.amazonaws.com/taipy.png"
    page = """
    <|{data}|image|>
    """

This means the read method will handle UTF-8 text data only, and the user will be free to read media files via URL.
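As a rough sketch of the idea above (not Taipy code), the ACL value could be checked against the canned ACLs listed in the proposal before it is passed to `put_object`, and the public URL could be built from the bucket, region, and key. The helper names `validate_acl` and `public_object_url` are hypothetical, and the URL pattern is assumed to be the virtual-hosted style used in the example page.

```python
from urllib.parse import quote

# Canned ACL values, copied from the list in the proposal above.
CANNED_ACLS = frozenset({
    "private",
    "public-read",
    "public-read-write",
    "authenticated-read",
    "aws-exec-read",
    "bucket-owner-read",
    "bucket-owner-full-control",
})


def validate_acl(acl: str) -> str:
    """Return the ACL unchanged if it is a known canned ACL, else raise (hypothetical helper)."""
    if acl not in CANNED_ACLS:
        raise ValueError(f"Unsupported ACL {acl!r}; expected one of {sorted(CANNED_ACLS)}")
    return acl


def public_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object (hypothetical helper).

    The object is only reachable this way if its ACL or bucket policy allows public reads.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


acl = validate_acl("public-read")
data = public_object_url("bealoabucket", "us-west-2", "taipy.png")
```

Validating early would turn a typo such as `publicread` into a clear error at configuration time rather than a failed upload.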

I would like to leave this open for discussion. I just didn't want to forget; we can transfer this to a new issue when this PR is handled.

Originally posted by @Forchapeatl in https://github.com/Avaiga/taipy/issues/585#issuecomment-1849066118

trgiangdo commented 1 day ago

The problem now is that we have the decode("utf-8") directly in the read() method.

The decoding should not happen in the read method, since we don't know for sure what is stored in the S3 bucket.

The best alternative is to let developers handle the decoding themselves, which means the read() method should return the actual binary form of the read data.

We need to add tests to make sure that when the data is not text but a file or an image, the correct binary form is returned.
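A minimal sketch of what such a check could look like, assuming a read that returns raw bytes. `FakeS3Client` is an in-memory stand-in written for this example and only mimics the `get_object` response shape (a dict whose `"Body"` has a `read()` method); `read_raw` is a hypothetical helper, not the actual data node method.

```python
import io


class FakeS3Client:
    """In-memory stand-in for an S3 client, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body, **kwargs):
        # Store everything as bytes, encoding str bodies as UTF-8.
        payload = Body.encode("utf-8") if isinstance(Body, str) else bytes(Body)
        self._objects[(Bucket, Key)] = payload
        return {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


def read_raw(client, bucket: str, key: str) -> bytes:
    """Return the object's bytes without decoding (hypothetical helper)."""
    return client.get_object(Bucket=bucket, Key=key)["Body"].read()


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

client = FakeS3Client()
client.put_object(Bucket="demo", Key="taipy.png", Body=PNG_SIGNATURE + b"\x00\x01")
client.put_object(Bucket="demo", Key="note.txt", Body="héllo")

image_bytes = read_raw(client, "demo", "taipy.png")          # binary stays binary
text = read_raw(client, "demo", "note.txt").decode("utf-8")  # caller decodes text
```

The PNG signature is not valid UTF-8, so decoding it inside the read would raise `UnicodeDecodeError`, which is the failure described at the top of this issue.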

jrobinAV commented 1 day ago

So, you propose to turn the S3ObjectDataNode into a fully generic data node. In addition, does it make sense to have an S3TextDataNode that reads the object and decodes it directly, so the main use case remains simple for the developer? What do you think?

Just like we have an SQLDataNode, which is completely generic, we also have an SQLTableDataNode to make it simple to read a table (the main usage).
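To make the generic/text split concrete, here is a small sketch using plain Python classes rather than real Taipy data nodes. `RawObjectReader` and `TextObjectReader` are hypothetical names, and `fetch` stands in for whatever call actually downloads the object.

```python
from typing import Callable


class RawObjectReader:
    """Generic reader: returns the stored bytes untouched (hypothetical)."""

    def __init__(self, fetch: Callable[[], bytes]):
        self._fetch = fetch  # stand-in for the actual S3 download call

    def read(self) -> bytes:
        return self._fetch()


class TextObjectReader:
    """Convenience wrapper that decodes with a configurable encoding (hypothetical)."""

    def __init__(self, raw: RawObjectReader, encoding: str = "utf-8"):
        self._raw = raw
        self._encoding = encoding

    def read(self) -> str:
        return self._raw.read().decode(self._encoding)


raw = RawObjectReader(lambda: "café".encode("latin-1"))
as_bytes = raw.read()
as_text = TextObjectReader(raw, encoding="latin-1").read()
```

Wrapping the generic reader keeps the binary path available, while the text variant exposes the encoding as a setting instead of hard-coding UTF-8.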

trgiangdo commented 1 day ago

Storing only text in an S3 bucket is not really a common practice. I would suggest keeping it generic for now, and we can expand the functionality in the future when it is required.