Closed · trgiangdo closed this 11 months ago
current status: ✅
Overall Coverage
| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 8335  | 7902    | 95%      | 85%       | 🟢     |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| src/taipy/core/config/checkers/_data_node_config_checker.py | 98% | 🟢 |
| src/taipy/core/config/data_node_config.py | 98% | 🟢 |
| src/taipy/core/data/_abstract_sql.py | 89% | 🟢 |
| src/taipy/core/data/_data_converter.py | 97% | 🟢 |
| src/taipy/core/data/csv.py | 95% | 🟢 |
| src/taipy/core/data/data_node.py | 99% | 🟢 |
| src/taipy/core/data/excel.py | 96% | 🟢 |
| src/taipy/core/data/json.py | 96% | 🟢 |
| src/taipy/core/data/mongo.py | 90% | 🟢 |
| src/taipy/core/data/parquet.py | 98% | 🟢 |
| src/taipy/core/data/sql.py | 98% | 🟢 |
| src/taipy/core/data/sql_table.py | 100% | 🟢 |
| src/taipy/core/exceptions/exceptions.py | 92% | 🟢 |
| **TOTAL** | **96%** | 🟢 |

updated for commit: 2682eb4 by action 🐍
On SQLDataNode, I believe we need two different methods for writing and appending data. Writing will still need to write everything; the orchestrator will require that. The append method should then be used for manual edition. The same applies to read and filter...
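A minimal sketch of that contract, assuming method names that mirror this discussion (not the actual Taipy implementation):

```python
class SQLDataNode:
    def write(self, data) -> None:
        """Replace all existing data; the orchestrator relies on this behavior."""
        ...

    def append(self, data) -> None:
        """Add `data` on top of the existing data; meant for manual edition."""
        ...
```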
According to https://docs.taipy.io/en/latest/manuals/core/config/data-node-config/#example-with-a-microsoft-sql-database-table_1, not really. In that example, the `write_query_builder()` returns two queries: the first deletes all the data, and the second writes the new data. There is no way to know for sure which table to delete data from when writing a SQLDataNode, so I don't think we can force it.
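For reference, a builder in the spirit of that documentation example could look like this (the `sales` table and its columns are illustrative, not taken from the docs verbatim):

```python
import pandas as pd

def write_query_builder(data: pd.DataFrame) -> list:
    # First query wipes the table; the second re-inserts the full dataset.
    values = ", ".join(f"('{row.date}', {row.nb_sales})" for row in data.itertuples())
    return ["DELETE FROM sales", f"INSERT INTO sales VALUES {values}"]
```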
My point is that the `SqlDataNode` must expose two methods like any other data node: `write(all_data)` and `append(data)`.

`write` basically executes the two SQL queries returned by the `write_query_builder()`. `append` should execute one SQL query as well. This query can be different from the second query of the `write_query_builder()`. That means we need an `append_query_builder()`. Am I missing something?

Oh, then it can be something like `sql_dn.append(data, append_query_builder)`, where `append_query_builder` is a different method. The signature will be different from other data nodes, though.
No, the `append_query_builder` can be set in the data node config, like the `write_query_builder`, but it will be executed at run time. Something like:
The configure method could look like:
```python
sales_history_cfg = Config.configure_sql_data_node(
    id="sales_history",
    db_username="admin",
    db_password="password",
    db_name="taipy",
    db_engine="mssql",
    read_query="SELECT * from sales",
    write_query_builder=write_query_builder,
    append_query_builder=append_query_builder,
    db_driver="ODBC Driver 17 for SQL Server",
    db_extra_args={"TrustServerCertificate": "yes"},
)
```
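The `append_query_builder` referenced above is never defined in the thread; a hypothetical one, mirroring only the insert half of the `write_query_builder`, might be:

```python
import pandas as pd

def append_query_builder(data: pd.DataFrame) -> list:
    # Insert only; no DELETE, since existing rows must be preserved.
    values = ", ".join(f"('{row.date}', {row.nb_sales})" for row in data.itertuples())
    return [f"INSERT INTO sales VALUES {values}"]
```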
The data node implementation would look like this:
```python
def _do_append(self, data) -> None:
    # The builder stored in the config returns the list of queries to run.
    queries = self.properties.get(self._APPEND_QUERY_BUILDER_KEY)(data)
    for query in queries:
        connection.execute(query)  # `connection` assumed to be obtained from the engine
```
What do you think?
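For illustration only, run-time usage under this sketch could look like the following, assuming a scenario configuration `scenario_cfg` that references `sales_history_cfg`:

```python
import pandas as pd
import taipy as tp

scenario = tp.create_scenario(scenario_cfg)
new_rows = pd.DataFrame({"date": ["2023-08-01"], "nb_sales": [42]})
scenario.sales_history.append(new_rows)  # runs the queries from append_query_builder
```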
I see, that makes much more sense. Thank you.
In the last few commits, I have:

- Added the `append()` method to `ParquetDataNode`. However, this only works if the `fastparquet` package is installed.
- Added the `append()` method to `SQLDataNode`, which uses the queries returned by the `append_query_builder` provided at the config level.
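For context, `fastparquet` can append to an existing file through pandas, which is presumably what makes this possible; a minimal sketch (not the PR's actual code):

```python
import pandas as pd

def append_to_parquet(path: str, data: pd.DataFrame) -> None:
    # fastparquet accepts append=True for an existing file;
    # pyarrow, the default engine, has no equivalent flag.
    data.to_parquet(path, engine="fastparquet", append=True)
```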
https://github.com/Avaiga/taipy/issues/408
This PR opens a new `.append()` API for the CSV, Excel, JSON, MongoCollection, and SQLTable data node types.

Notes on other data node types: with `fastparquet`, there is a simpler way to do this, but I'm working on finding a better way using `pyarrow`, which is our default engine.
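One straightforward `pyarrow`-based fallback is to read, concatenate, and rewrite the file; this is an assumption about the direction, not the final approach, and it is costly for large files:

```python
import pyarrow as pa
import pyarrow.parquet as pq

def append_with_pyarrow(path: str, new_table: pa.Table) -> None:
    # pyarrow has no in-place append, so rewrite the whole file.
    existing = pq.read_table(path)
    pq.write_table(pa.concat_tables([existing, new_table]), path)
```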