chdb-io / chdb

chDB is an in-process OLAP SQL Engine πŸš€ powered by ClickHouse
https://clickhouse.com/docs/en/chdb
Apache License 2.0
2.03k stars 72 forks source link

UPDATE and DELETE have no effect #104

Closed Dennitz closed 6 months ago

Dennitz commented 1 year ago

Hey there, thanks for the amazing work on chdb!

I noticed that ALTER TABLE ... UPDATE and ALTER TABLE ... DELETE statements don't seem to have any effect in chdb. Data simply stays unchanged.

The same does work in clickhouse-local.

How to reproduce

The below examples can also be seen in this Colab notebook

UPDATE

from chdb import session as chs

sess = chs.Session()

sess.query("CREATE DATABASE test_db ENGINE = Atomic;")
sess.query("CREATE TABLE test_db.test_table (x String, y String) ENGINE = MergeTree ORDER BY tuple()")
sess.query("INSERT INTO test_db.test_table (x, y) VALUES ('A', 'B')")

print("Original values:")
print(sess.query("SELECT * FROM test_db.test_table"))

sess.query("ALTER TABLE test_db.test_table UPDATE y = 'updated' WHERE x = 'A' AND y = 'B'")

print('Values after UPDATE (expected "A", "updated"):')
print(sess.query("SELECT * FROM test_db.test_table"))

This prints:

Original values:
"A","B"

Values after UPDATE (expected "A", "updated"):
"A","B"

DELETE

from chdb import session as chs

sess = chs.Session()

sess.query("CREATE DATABASE test_db ENGINE = Atomic;")
sess.query("CREATE TABLE test_db.test_table (x String, y String) ENGINE = MergeTree ORDER BY tuple()")
sess.query("INSERT INTO test_db.test_table (x, y) VALUES ('A', 'B')")

print("Original values:")
print(sess.query("SELECT * FROM test_db.test_table"))

sess.query("ALTER TABLE test_db.test_table DELETE WHERE x = 'A' AND y = 'B'")

print("Values after DELETE (expected is no rows):")
print(sess.query("SELECT * FROM test_db.test_table"))

This prints:

Original values:
"A","B"

Values after DELETE (expected is no rows):
"A","B"

Expected behavior Data is updated / deleted, the same way as in ClickHouse / clickhouse-local.

Additional context

I tested the same queries as shown above in a clickhouse-local session, started with clickhouse local --path .. There it works as expected.

I can reproduce the issue with chdb on my M1 Mac and on Google Colab

l1t1 commented 12 months ago

sess.query("delete from test_db.test_table WHERE x = 'A' AND y = 'B'") hangs

djouallah commented 10 months ago

same problem with create or replace, it seems the issue is with the engine ENGINE = MergeTree Only Engine=Log seems to be working

auxten commented 10 months ago

Looking into that.

lmangani commented 10 months ago

Is this report validated with chdb==1.0.0rc2?

djouallah commented 10 months ago

Yes

djouallah commented 10 months ago

Let me try again maybe I need order by

auxten commented 10 months ago

I confirm that this is an issue caused by the delayed execution of DELETE and ALTER SQL in MergeTree family. I am currently finding the best way to handle it for chdb. Give me some time, no need to do further tests until 1.0.0rc3 released

djouallah commented 10 months ago

this one works fine for me, Engine=MergeTree ORDER BY tuple() , but I have to say clickhouse has a weird syntax :)

l1t1 commented 10 months ago

pypi.org isn't updated to 1.0rc3 now

auxten commented 10 months ago

Please wait for the build pipeline

l1t1 commented 10 months ago

Please wait for the build pipeline

seems failed https://github.com/chdb-io/chdb/actions/runs/7018226681

l1t1 commented 10 months ago

it's ok now

(base) root@e1a427840adf:/# pip install chdb==1.0.0.rc3 -i https://pypi.tuna.tsinghua.edu.cn/simple
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb==1.0.0.rc3
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e5/ae/710ed24e1118e91d47431c5ebcf47afbaeb1b869046b10391f3d3348e81a/chdb-1.0.0rc3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (99.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.6/99.6 MB 5.5 MB/s eta 0:00:00
Installing collected packages: chdb
  Attempting uninstall: chdb
    Found existing installation: chdb 1.0.0rc2
    Uninstalling chdb-1.0.0rc2:
      Successfully uninstalled chdb-1.0.0rc2
Successfully installed chdb-1.0.0rc3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
(base) root@e1a427840adf:/# 

(base) root@e1a427840adf:/# python
Python 3.11.4 (main, Jul  5 2023, 13:36:48) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from chdb import session as chs
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM
>>> 
>>> sess = chs.Session()
>>> 
>>> sess.query("CREATE DATABASE test_db ENGINE = Atomic;")
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM

>>> sess.query("CREATE TABLE test_db.test_table (x String, y String) ENGINE = MergeTree ORDER BY tuple()")
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM

>>> sess.query("INSERT INTO test_db.test_table (x, y) VALUES ('A', 'B')")
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM

>>> 
>>> print("Original values:")
Original values:
>>> print(sess.query("SELECT * FROM test_db.test_table"))
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM
"A","B"
>>> 
>>> sess.query("ALTER TABLE test_db.test_table UPDATE y = 'updated' WHERE x = 'A' AND y = 'B'")
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM

>>> 
>>> print('Values after UPDATE (expected "A", "updated"):')
Values after UPDATE (expected "A", "updated"):
>>> print(sess.query("SELECT * FROM test_db.test_table"))
Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM
"A","updated"
l1t1 commented 10 months ago

how to avoid output following message?

Lowered mark cache size to 3.76 GiB because the system has limited RAM
Lowered index mark cache size to 0.00 B because the system has limited RAM
lmangani commented 10 months ago

how to avoid output following message?

we're aware of this issue - those warnings come alongside results from the clickhouse engine itself

l1t1 commented 10 months ago

version 1.0.0 still has the RAM messages

auxten commented 10 months ago

will fix on v1.0.1