Avaiga / taipy-core

A Python library to build powerful and customized data-driven back-end applications.
Apache License 2.0

Added aws s3 data node #828

Closed Forchapeatl closed 11 months ago

Forchapeatl commented 11 months ago

Added AWS S3 DataNode

[screenshot]

Fixes Avaiga/taipy#536

Todo

Forchapeatl commented 11 months ago

Hello @toan-quach and @jrobinAV, please permit me to tag you directly. For some reason, I can't seem to request a review. Please have a look at my PR.

toan-quach commented 11 months ago

@Forchapeatl awesome!! Thanks for tagging us! We will review as soon as we can :D Thanks again for your help!

jrobinAV commented 11 months ago

Hello @Forchapeatl.

Thank you very much for your contribution. For your information, we are currently working on a significant repository refactoring: we are merging sub-repositories into a single repository. For instance, the taipy-core repository is about to be merged into the taipy repository.

This should not impact your work.

But just so you are not surprised, instead of merging your PR, we will move your code from this branch (in taipy-core) into a new branch of another repository (taipy). We will let you know and let you review the code.

Thank you for your understanding.

Forchapeatl commented 11 months ago

Hello @jrobinAV, thank you for your kind review. The requested changes have been made. The S3ObjectDataNode has been implemented and tested with the _read and _write functions in their expected signatures.

The issue now is the _append method. I had a chat with @trgiangdo about it. I saw how appending was possible when we were adding files to a single bucket, but our DataNode references a single object, not a bucket. Handling this method should be possible by copying all the data to a new object, appending the new content, and writing it back to S3, but I don't think it's worth the effort when the object data (file) is very large (I may be wrong). What is your opinion?

Reference: Append Data to S3 Object
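
For illustration, the copy-and-rewrite approach would look roughly like this with boto3 (a sketch only, not part of this PR; append_to_s3_object is a hypothetical helper, and it assumes configured credentials and an object small enough to fit in memory):

import boto3

def append_to_s3_object(bucket: str, key: str, new_content: bytes) -> None:
    client = boto3.client("s3")
    # S3 objects are immutable, so download the existing body first...
    existing = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    # ...then overwrite the object with the combined content.
    client.put_object(Bucket=bucket, Key=key, Body=existing + new_content)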

trgiangdo commented 11 months ago

Thank you for the information. _append() is an optional method, so if S3 does not support appending, then it's totally fine.

trgiangdo commented 11 months ago

About the DataNodeConfig, you probably want to take a look at this documentation page.

Then, you can take a look at the src/taipy/core/config/data_node_config.py file, where we define different configuration methods for the different data node types, and implement a new configure_s3_data_node() method.
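
As a rough sketch of the shape such a method could take (hypothetical code, not the actual data_node_config.py source; the _registry dict and the "s3_object" storage_type string are assumptions for illustration):

from typing import Any, Dict

class ConfigSketch:
    _registry: Dict[str, Dict[str, Any]] = {}

    @classmethod
    def configure_s3_object_data_node(
        cls,
        id: str,
        aws_access_key: str,
        aws_secret_access_key: str,
        aws_s3_bucket_name: str,
        aws_s3_object_key: str,
        **properties: Any,
    ) -> Dict[str, Any]:
        # Bundle the S3-specific attributes with any extra properties and
        # register the node under the new "s3_object" storage type.
        properties.update(
            storage_type="s3_object",
            aws_access_key=aws_access_key,
            aws_secret_access_key=aws_secret_access_key,
            aws_s3_bucket_name=aws_s3_bucket_name,
            aws_s3_object_key=aws_s3_object_key,
        )
        cls._registry[id] = properties
        return properties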

For now, the todo list:

jrobinAV commented 11 months ago

> The issue now is the _append method. [...] Handling this method should be possible by copying all the data to a new object, appending the new content, and writing it back to S3, but I don't think it's worth the effort when the object data (file) is very large (I may be wrong). What is your opinion?

Hello,

Yes, I agree with you. Let's forget about the append method. It is an optional method anyway.

Thank you so much.

Forchapeatl commented 11 months ago

Hello everyone, the requested changes have been made. This includes:

[screenshot]

Please let me know if more changes are required.

Forchapeatl commented 11 months ago

[screenshots]

# tests3.py
import taipy as tp
from taipy import Config, Core
# Note: boto3 must be installed for the S3 object data node to work,
# but the script itself does not need to import it.

def build_message(s3_object, name):
    return f"Hello {name}!"

s3_object_cfg = Config.configure_s3_object_data_node(
    id="test_s3_object",
    aws_access_key="ENTER YOUR AWS ACCESS_KEY",
    aws_secret_access_key="ENTER YOUR AWS ACCESS_SECRET_KEY",
    aws_s3_bucket_name="ENTER YOUR BUCKET NAME",  # must be an already existing bucket
    aws_s3_object_key="taipy_object",
)

input_name_data_node_cfg = Config.configure_data_node(id="input_name")
message_data_node_cfg = Config.configure_data_node(id="message")
build_msg_task_cfg = Config.configure_task(
    "build_msg",
    build_message,
    input=[s3_object_cfg, input_name_data_node_cfg],
    output=[message_data_node_cfg],
)
scenario_cfg = Config.configure_scenario("scenario", task_configs=[build_msg_task_cfg])

if __name__ == "__main__":
    Core().run()
    hello_scenario = tp.create_scenario(scenario_cfg)

    # Write to, then read back from, the S3 object data node.
    myS3ObjectDataNode = hello_scenario.test_s3_object
    myS3ObjectDataNode.write("This is nice , Hello , HI , WHAT? ,WHEN! Who?")
    ans = myS3ObjectDataNode.read()
    print(ans)

    hello_scenario.input_name.write("Taipy")
    hello_scenario.submit()

Hello everyone, things seem to be working fine now. The S3ObjectDataNode does get created. However, it seems to be failing the storage_type tests.

[screenshot]

Is caplog.text a file or a variable? Please give me more information on it. My tests fail on these lines:

https://github.com/Forchapeatl/taipy-core/blob/57b29061626186aa4839528f8eaf81baac15cf5e/tests/core/test_core.py#L35
https://github.com/Forchapeatl/taipy-core/blob/57b29061626186aa4839528f8eaf81baac15cf5e/tests/core/config/test_data_node_config.py#L132
https://github.com/Forchapeatl/taipy-core/blob/57b29061626186aa4839528f8eaf81baac15cf5e/tests/core/config/checkers/test_data_node_config_checker.py#L94

Please bear with me on the tests3.py code sample. The docs seem advanced for me.

trgiangdo commented 11 months ago

The failed tests are because you updated the error message with the new storage_type, but the expected_error_message in those tests was not updated.

A small update should fix the problem.
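
(To answer the earlier question: caplog is pytest's built-in log-capture fixture, and caplog.text is a string property holding the captured log output, not a file.) The kind of update needed looks roughly like this; the wording below is approximated for illustration, the exact text lives in the checker:

# Approximated, not the exact checker message: the list of valid storage
# types asserted in the tests must now also include the new "s3_object" entry.
expected_error_message = (
    "`storage_type` field of DataNodeConfig `new` must be either csv, sql_table, sql,"
    " mongo_collection, pickle, excel, generic, json, parquet, in_memory, or s3_object."
)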

Forchapeatl commented 11 months ago

> The failed tests are because you updated the error message with the new storage_type, but the expected_error_message in those tests was not updated.
>
> A small update should fix the problem.

Wow, thank you @trgiangdo, I managed to pass 2 more tests. Any hints on this file?

trgiangdo commented 11 months ago

I believe it has the same problem. A small update on the expected_error_message should fix it.

Forchapeatl commented 11 months ago

[screenshot]

Passed all 1108 tests. I believe the S3ObjectDataNode has been successfully added and tested.

trgiangdo commented 11 months ago

The tests are failing:

Can you check again?

trgiangdo commented 11 months ago

The tests are up and running. That's great 😁

Can you take a look at the linter error in https://github.com/Avaiga/taipy-core/actions/runs/7102562575/job/19333215991?pr=828? This is so that we have consistent code style in our project.

Forchapeatl commented 11 months ago

> The tests are up and running. That's great 😁

Thank you. The code should be working fine now.

Forchapeatl commented 11 months ago

Well :) this is unexpected, @trgiangdo. I did run the pylint test as you said. I am a bit reluctant to fix this error because I am unable to reproduce it in my environment; let me know what you think.

[screenshot]

trgiangdo commented 11 months ago

About the error:

Running: pycodestyle --max-line-length=120 --ignore=E121,E123,E126,E226,E24,E704,W503,W504,E203 .
./tests/core/data/test_aws_s3_data_node.py:56:62: E231 missing whitespace after ','
./tests/core/data/test_aws_s3_data_node.py:76:61: E231 missing whitespace after ','

You just need to add a space after the comma on line 56 and line 76, and it should work.
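
For illustration (the real offending lines are in tests/core/data/test_aws_s3_data_node.py; the snippet below is made up to show the rule):

data = {"a":1,"b":2}     # before: E231, no space after ','
data = {"a": 1, "b": 2}  # after: passes pycodestyle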

trgiangdo commented 11 months ago

Hello @Forchapeatl.

FYI, this taipy-core repository is merged into the taipy repository and is going to be archived soon.

So, instead of merging your PR, can you create a new PR in the main taipy repository, so that you are counted as an official Taipy contributor by GitHub? The content of the PR should be very similar.

trgiangdo commented 11 months ago

Migrated to https://github.com/Avaiga/taipy/pull/585