Closed: Forchapeatl closed this pull request 11 months ago.
Hello @toan-quach and @jrobinAV, please permit me to tag you directly. For some reason, I can't seem to request a review. Please have a look at my PR.
@Forchapeatl awesome!! Thanks for tagging us! We will review as soon as we can :D Thanks again for your help!
Hello @Forchapeatl.
Thank you very much for your contribution. For information, we are currently working on a significant repository refactoring. We are merging sub-repositories into a single repository. For instance, the taipy-core repository is about to be merged into the taipy repository.
This should not impact your work.
But just so you are not surprised, instead of merging your PR, we will move your code from this branch (in taipy-core) into a new branch of another repository (taipy). We will let you know and let you review the code.
Thank you for your understanding.
Hello @jrobinAV, thank you for your kind review. The requested changes have been made. The S3ObjectDataNode has been implemented and tested with the _read and _write functions in their expected signatures.
The issue now is the _append method. I had a chat with @trgiangdo about this method. I saw how it was possible when we were adding files to a single bucket, but our DataNode now references a single object, not a bucket. Handling this method should be possible by copying all the data to a new object, appending the new content, and writing it back to S3, but I don't think it's worth the effort in cases where the object data (file) is very large (I may be wrong). Please share your opinion.
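For reference, here is a minimal sketch of the copy-append-rewrite idea described above, using plain boto3 calls; the helper name and usage are illustrative, not the actual S3ObjectDataNode code:

import boto3

def append_to_s3_object(client, bucket, key, new_data: bytes):
    # S3 objects are immutable, so "appending" means downloading the current
    # content, concatenating the new data, and re-uploading the whole object.
    current = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    client.put_object(Bucket=bucket, Key=key, Body=current + new_data)

# Hypothetical usage:
# client = boto3.client("s3", aws_access_key_id="...", aws_secret_access_key="...")
# append_to_s3_object(client, "my-bucket", "taipy_object", b"more data")

The whole object is transferred twice, which is exactly why it may not be worth the effort for very large objects.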
Thank you for your information. _append() is an optional method, so if S3 does not support appending, then it's totally fine.
About the DataNodeConfig, you probably want to take a look at this documentation page.
Then, you can take a look at the src/taipy/core/config/data_node_config.py file, where we define different configuring methods for different data node types, and implement a new configure_s3_data_node() method.
For now, the todo list (a rough sketch of these pieces follows after the list):
- Add a configure_s3_data_node() method to the DataNodeConfig class.
- Add a _STORAGE_TYPE_VALUE_S3 attribute and add that attribute to the _ALL_STORAGE_TYPES list.
- Update the DataNodeConfig docstring.
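As a rough illustration of these todo items, here is a minimal sketch; it is not the actual data_node_config.py code, and the attribute value and method signature are assumptions taken from the discussion and the test script later in this thread:

# Hypothetical sketch of the new pieces in DataNodeConfig (illustrative only).
class DataNodeConfig:
    _STORAGE_TYPE_VALUE_S3 = "s3_object"
    _ALL_STORAGE_TYPES = [
        "csv", "sql_table", "sql", "mongo_collection", "pickle",
        "excel", "generic", "json", "parquet", "in_memory",
        _STORAGE_TYPE_VALUE_S3,  # new value registered alongside the existing types
    ]

    @classmethod
    def configure_s3_object_data_node(cls, id, aws_access_key, aws_secret_access_key,
                                      aws_s3_bucket_name, aws_s3_object_key, **properties):
        # In the real class this would build and register a DataNodeConfig with
        # storage_type="s3_object"; here it only collects the settings.
        return {
            "id": id,
            "storage_type": cls._STORAGE_TYPE_VALUE_S3,
            "aws_access_key": aws_access_key,
            "aws_secret_access_key": aws_secret_access_key,
            "aws_s3_bucket_name": aws_s3_bucket_name,
            "aws_s3_object_key": aws_s3_object_key,
            **properties,
        }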
Hello,
Yes, I agree with you. Let's forget about the append method. It is an optional method anyway.
Thank you so much.
Hello everyone, the requested changes have been made. This includes:
- Add a configure_s3_object_data_node() method to the DataNodeConfig class.
- Add a _STORAGE_TYPE_VALUE_S3 attribute and add that attribute to the _ALL_STORAGE_TYPES list.
Please let me know if more changes are required.
#tests3.py
import taipy as tp
import boto3
from taipy import Config, Core


def build_message(s3_object, name):
    return f"Hello {name}!"


s3_object_cfg = Config.configure_s3_object_data_node(
    id="test_s3_object",
    aws_access_key="ENTER YOUR AWS ACCESS_KEY",
    aws_secret_access_key="ENTER YOUR AWS ACCESS_SECRET_KEY",
    aws_s3_bucket_name="ENTER YOUR BUCKET NAME",  # must be an already existing bucket
    aws_s3_object_key="taipy_object",
)
input_name_data_node_cfg = Config.configure_data_node(id="input_name")
message_data_node_cfg = Config.configure_data_node(id="message")
build_msg_task_cfg = Config.configure_task(
    "build_msg",
    build_message,
    input=[s3_object_cfg, input_name_data_node_cfg],
    output=[message_data_node_cfg],
)
scenario_cfg = Config.configure_scenario("scenario", task_configs=[build_msg_task_cfg])

if __name__ == "__main__":
    Core().run()
    hello_scenario = tp.create_scenario(scenario_cfg)

    myS3ObjectDataNode = hello_scenario.test_s3_object
    myS3ObjectDataNode.write("This is nice , Hello , HI , WHAT? ,WHEN! Who?")
    ans = myS3ObjectDataNode.read()
    print(ans)

    hello_scenario.input_name.write("Taipy")
    hello_scenario.submit()
Hello everyone, things seem to be working fine now. The S3ObjectDataNode does get created. However, it seems to be failing the storage type test.
Is caplog.text a file or a variable? Please share more information on this.
My tests fail on these lines:
https://github.com/Forchapeatl/taipy-core/blob/57b29061626186aa4839528f8eaf81baac15cf5e/tests/core/test_core.py#L35
https://github.com/Forchapeatl/taipy-core/blob/57b29061626186aa4839528f8eaf81baac15cf5e/tests/core/config/test_data_node_config.py#L132
https://github.com/Forchapeatl/taipy-core/blob/57b29061626186aa4839528f8eaf81baac15cf5e/tests/core/config/checkers/test_data_node_config_checker.py#L94
Please bear with me on the tests3.py code sample. The docs seem advanced for me.
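For context on the caplog.text question: caplog is pytest's built-in log-capture fixture, and caplog.text is a plain string holding everything logged during the test, not a file. A minimal illustration, assuming pytest:

import logging

def test_error_is_logged(caplog):
    # caplog is injected by pytest; caplog.text is the captured log output as one string.
    logging.getLogger("Taipy").error("something went wrong")
    assert "something went wrong" in caplog.text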
The failed tests are because you updated the error message with the new storage_type, but the expected_error_message in those tests was not updated.
A small update should fix the problem.
Wow, thank you @trgiangdo, I managed to pass 2 more tests. Any hints on this file?
I believe it has the same problem. A small update on the expected_error_message should fix it.
Passed all 1108 tests. I believe S3ObjectDataNode has been successfully added and tested.
The tests are failing:
File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/taipy/core/data/aws_s3.py", line 13, in <module>
import boto3
ModuleNotFoundError: No module named 'boto3'
> assert expected_error_message in caplog.text
E assert '`storage_type` field of DataNodeConfig `data_nodes` must be either csv, sql_table, sql, mongo_collection, pickle, excel, generic, json, parquet, s3_object, or in_memory. Current value of property `storage_type` is "bar".' in 'ERROR Taipy:config.py:237 `storage_type` field of DataNodeConfig `data_nodes` must be either csv, sql_table, sql, ..., pickle, excel, generic, json, parquet, in_memory, or s3_object. Current value of property `storage_type` is "bar".\n'
E + where 'ERROR Taipy:config.py:237 `storage_type` field of DataNodeConfig `data_nodes` must be either csv, sql_table, sql, ..., pickle, excel, generic, json, parquet, in_memory, or s3_object. Current value of property `storage_type` is "bar".\n' = <_pytest.logging.LogCaptureFixture object at 0x118d6c970>.text
The ordering "parquet, s3_object, or in_memory" is different from "parquet, in_memory, or s3_object". Can you check again?
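For illustration, the kind of one-line update being suggested: the expected string in the test just needs to list the storage types in the same order as the message now produced by config.py. A sketch, assuming the test defines the message as a literal (the real test code may differ):

# Before: the order "..., parquet, s3_object, or in_memory" no longer matches the output.
# After: move s3_object to the end so it matches "..., parquet, in_memory, or s3_object".
expected_error_message = (
    "`storage_type` field of DataNodeConfig `data_nodes` must be either csv, sql_table,"
    " sql, mongo_collection, pickle, excel, generic, json, parquet, in_memory, or"
    ' s3_object. Current value of property `storage_type` is "bar".'
)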
The tests are up and running. That's great 😁
Can you take a look at the linter error in https://github.com/Avaiga/taipy-core/actions/runs/7102562575/job/19333215991?pr=828? This is so that we have consistent code style in our project.
Thank you. The code should be working fine now.
Well :) this is unexpected, @trgiangdo. I did run the pylint test as you said. I am a bit reluctant to fix this error because I am unable to reproduce it in my environment. Let me know what you think.
About the error
Running: pycodestyle --max-line-length=120 --ignore=E121,E123,E126,E226,E24,E704,W503,W504,E203 .
./tests/core/data/test_aws_s3_data_node.py:56:62: E231 missing whitespace after ','
./tests/core/data/test_aws_s3_data_node.py:76:61: E231 missing whitespace after ','
You just need to add a space after the comma on line 56 and line 76 and it should work.
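In other words, the E231 warning is purely about whitespace; a generic before/after, since the exact content of those two lines isn't shown here:

# Before: pycodestyle E231, missing whitespace after ','
values = [1,2,3]

# After: a space after each comma satisfies the check
values = [1, 2, 3]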
Hello @Forchapeatl.
FYI, this taipy-core repository has been merged into the taipy repository and is going to be archived soon.
So, instead of merging your PR, can you create a new PR in the main taipy repository, so you can be counted as an official Taipy contributor by GitHub? The content of the PR should be very similar.
Migrated to https://github.com/Avaiga/taipy/pull/585
Added AWS S3 DataNode
Fixes Avaiga/taipy#536
Todo