RedBeardLab / rediSQL

Redis module that provides a completely functional SQL database
https://redisql.com

Lock issue: running a long-running SQL statement while streaming data writes into a table #69

Closed xiao321 closed 4 years ago

xiao321 commented 5 years ago

https://github.com/RedBeardLab/rediSQL/issues/66#issuecomment-507596085

Does this timeout?

I need to fix the timeout as well.

What I believe happens is that the query takes more than 10 seconds, RediSQL times out, and it replies to you with an empty response. However, the query eventually terminates and tries to reply, but the internal state was "corrupted" by the previous timed-out reply, so the thread crashes. Only that thread crashes, so RediSQL keeps running and you just see something that stops working.

Can you confirm this with the debug build? Just post the logs if you can reproduce it.
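The race described above can be sketched in plain Python (this is an illustration of the mechanism, not RediSQL's actual Rust internals; the names and timings are made up, with the 10s timeout scaled down):

```python
import queue
import threading
import time

def worker(jobs, done):
    # The lone worker thread: runs the query, then posts its reply.
    job = jobs.get()
    time.sleep(job["duration"])      # a slow DELETE
    done.put("query result")         # the reply arrives after the timeout fired

jobs, done = queue.Queue(), queue.Queue()
threading.Thread(target=worker, args=(jobs, done), daemon=True).start()

jobs.put({"duration": 0.2})
try:
    reply = done.get(timeout=0.05)   # the client-side timeout, scaled down
except queue.Empty:
    reply = None                     # the client sees an empty/(nil) response

time.sleep(0.3)                      # meanwhile the query still completes...
late_reply = done.get_nowait()       # ...and its reply is now orphaned
print(reply, late_reply)             # None query result
```

In this toy model the orphaned reply is harmless; the point is only to show that a timed-out request and a late completion are two separate events, and whatever state the late reply touches must still be valid.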

I tested this in two scenarios. First, a table with 4,000k rows: I executed the command "delete from a where id>2000000 and id<3000000", and my client got a "read timeout" error, but the database kept working; it was only the 10s timeout. The redis log:

[2019-07-02T07:35:44Z DEBUG redisql_lib::redis] Exec | Query = "delete from a where id>2000000 and id<3000000"
[2019-07-02T07:35:44Z DEBUG redisql_lib::redis] do_execute | created statement
[2019-07-02T07:35:44Z DEBUG redisql_lib::community_statement] Execute | Acquired db lock
[2019-07-02T07:35:44Z DEBUG redisql_lib::community_statement] Execute | Read row modified before
[2019-07-02T07:35:54Z DEBUG redisql_lib::sqlite] FromIterator => DONECursor
[2019-07-02T07:35:54Z DEBUG redisql_lib::community_statement] Execute=> Executed trains of statements
[2019-07-02T07:35:54Z DEBUG redisql_lib::community_statement] Execute=>DONECursor
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] do_execute | statement executed
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis_type] Free thread safe context
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] RedisContextSet | Drop
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] with_contex_set | exit
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] Exec | DONE, returning result
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] Loop iteration

Second, a table with 4,000k rows: while streaming data kept writing into the table, I executed "delete from b where id>1000000 and id<2000000". My client got a "timeout" error, and after that the database could no longer work.

The redis log:

[2019-07-02T08:55:17Z DEBUG redisql_lib::redis] Exec | Query = "delete from b where id>=1000000 and id<2000000"
[2019-07-02T08:55:17Z DEBUG redisql_lib::redis] do_execute | created statement
[2019-07-02T08:55:17Z DEBUG redisql_lib::community_statement] Execute | Acquired db lock
[2019-07-02T08:55:17Z DEBUG redisql_lib::community_statement] Execute | Read row modified before
[2019-07-02T08:55:17Z DEBUG redis_sql::commands] Exec | GotDB
[2019-07-02T08:55:17Z DEBUG redis_sql::commands] Exec | BlockedClient
[2019-07-02T08:55:17Z DEBUG redis_sql::commands] Exec | Create Command
[2019-07-02T08:55:28Z DEBUG redis_sql::commands] Exec | GotDB
[2019-07-02T08:55:28Z DEBUG redis_sql::commands] Exec | BlockedClient
[2019-07-02T08:55:28Z DEBUG redis_sql::commands] Exec | Create Command
[2019-07-02T08:55:28Z DEBUG redisql_lib::sqlite] FromIterator => DONECursor
[2019-07-02T08:55:28Z DEBUG redisql_lib::community_statement] Execute=> Executed trains of statements
[2019-07-02T08:55:28Z DEBUG redisql_lib::community_statement] Execute=>DONECursor
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] do_execute | statement executed
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis_type] Free thread safe context
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] RedisContextSet | Drop
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] with_contex_set | exit
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] Exec | DONE, returning result
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] Loop iteration

Why? Is it caused by lock contention?

siscia commented 5 years ago

Can you believe that I can't reproduce the issue?

I am able to push the database into a timeout, but after the timeout it keeps working.

Can you please try to post reproducible examples?

Here is one of my attempts:

127.0.0.1:6379> redisql.create_db DB
OK
127.0.0.1:6379> REDISQL.exec DB "create table t(id, data,b,c,d,e);"
1) DONE
2) (integer) 0
127.0.0.1:6379> REDISQL.exec DB "INSERT INTO t WITH RECURSIVE cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<14000000) SELECT cnt.x, random() % 100000000, randomblob(21), randomblob(31), randomblob(26), random() % 1000 FROM cnt;"
(nil)
(10.03s)
127.0.0.1:6379> REDISQL.exec DB "select 1;"
1) 1) (integer) 1
127.0.0.1:6379> REDISQL.exec DB "DELETE FROM t where e>200 and e<800"
1) DONE
2) (integer) 4194334
(3.21s)

Is it possible that you are just not waiting long enough?

Do you think we should abort the query if it takes more than the timeout?
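For reference, SQLite itself (which RediSQL embeds) supports cancelling a running statement via sqlite3_interrupt(). A minimal stdlib sketch of the abort-on-timeout idea, using Python's sqlite3 binding rather than RediSQL's Rust code (the 50 ms watchdog and the cross join are illustrative choices):

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# Arm a watchdog that aborts the running statement after ~50 ms.
# Connection.interrupt() is documented as safe to call from another thread.
watchdog = threading.Timer(0.05, conn.interrupt)
watchdog.start()

outcome = None
try:
    # A deliberately expensive triple cross join that cannot finish in 50 ms.
    conn.execute("SELECT count(*) FROM t a, t b, t c").fetchone()
    outcome = "finished"
except sqlite3.OperationalError:
    outcome = "aborted"       # SQLite raises "interrupted"
finally:
    watchdog.cancel()

print(outcome)                # aborted
```

The nice property of this mechanism is that the interrupted statement fails with a clean error and the connection stays usable, so a timed-out query would not leave the database in a wedged state.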

xiao321 commented 5 years ago

When I execute the delete SQL, the streaming data does not stop writing, so I have two clients: one writes, the other queries. I have set maxWaitMillis in my code.

client a:

GenericObjectPoolConfig genericObjectPoolConfig = new GenericObjectPoolConfig();
genericObjectPoolConfig.setMaxIdle(10);
genericObjectPoolConfig.setMinIdle(10);
genericObjectPoolConfig.setMaxWaitMillis(-1); // "-1" means no limit.
JedisPool jedisPool = new JedisPool(genericObjectPoolConfig, ip, port);
Jedis jedis = jedisPool.getResource();
int id = 1;
while (true) {
    jedis.executeSQL("test", "insert into a(id,device_id,cert_id) values(" + id + ",0,0)");
    id++;
}

client b:

GenericObjectPoolConfig genericObjectPoolConfig = new GenericObjectPoolConfig();
genericObjectPoolConfig.setMaxIdle(10);
genericObjectPoolConfig.setMinIdle(10);
genericObjectPoolConfig.setMaxWaitMillis(-1);
JedisPool jedisPool = new JedisPool(genericObjectPoolConfig, ip, port);
Jedis jedis = jedisPool.getResource();
List test1 = jedis.executeSQL("test", "delete from a where id>=2000000 and id<3000000");

baochengjie commented 5 years ago

My problem is the same. When querying and inserting data at the same time, Redis goes down. This problem is serious. @siscia

siscia commented 5 years ago

Alright, I hadn't realized that there were 2 clients.

However, please keep in mind that RediSQL's design is single-threaded.

You have one thread that does the work, so you can't do two things at the same time; if one of those things is slow, the commands just queue up.

Eventually everything gets done, but only eventually.

Does this fit your observations?
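The single-threaded execution model can be sketched like this (a simplified illustration of one worker thread per database, not the actual Rust event loop; names and timings are made up):

```python
import queue
import threading
import time

commands = queue.Queue()

def worker():
    # One thread per database: commands execute strictly in arrival order.
    while True:
        name, duration, log = commands.get()
        time.sleep(duration)        # simulate query execution time
        log.append(name)
        commands.task_done()

threading.Thread(target=worker, daemon=True).start()

order = []
commands.put(("slow DELETE", 0.2, order))   # long-running write
commands.put(("fast SELECT", 0.0, order))   # queued behind it, even though cheap
commands.join()
print(order)   # ['slow DELETE', 'fast SELECT']
```

The fast SELECT is not executed concurrently; it simply waits for the slow DELETE ahead of it in the queue, which is why a long write makes every client appear stuck.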

siscia commented 5 years ago

If you need to delete stuff from your database, have you considered using multiple databases? Like one for each day?

baochengjie commented 5 years ago

My data is updated in real time, and I also need to query it. But that should not lead to a Redis crash! Waiting is OK; I just don't want Redis to become unavailable.

siscia commented 5 years ago

Ok, so the problem is that RediSQL crashes.

And it is consistent with the waiting, ok!

I will do as always, trying to reproduce the problem and then fix it.

If you have a log of the crash, that would be wonderful and it would speed up my work quite a bit.

baochengjie commented 5 years ago

Well, I will continue to test; I need to upgrade the version. I expect the crash log to be consistent with his problem log. I have split one database into multiple databases, reaching a data volume of 40K per database, but this problem still occurs.

baochengjie commented 5 years ago

Do you have any convenient contact tool? That would make it easier to communicate.

siscia commented 5 years ago

I follow the Gitter (Slack-like) channel that is linked in the readme.

And I have an email: simone@redbeardlab.com

The most helpful thing could be to add an integration test, like here: https://github.com/RedBeardLab/rediSQL/blob/master/test/correctness/test.py

baochengjie commented 5 years ago

After reading these tests, I found that the tables tested were not particularly large. The RediSQL tables I am using have more than 20 fields, and the queries can be complicated. Joining tables is particularly slow at the million-row level, so I write the result into RediSQL after the join. For example: count(distinct user_id), or group by user_id (with more than 10,000 groups). This causes the RediSQL database, or the Redis service, to crash when writing and reading simultaneously.

siscia commented 5 years ago

Indeed, the tests were there mostly to make sure that a feature works as intended and that we don't introduce regressions.

It is definitely a good idea to introduce more complex tests, with more complex tables.

I will do it as soon as possible, but if you can produce a pull request for this it will really speed up my work. If the PR also causes a crash similar to the one you are seeing, that is even better but is not required.

If you need help in how to run the test, please let me know.

You just need to put redis-server on your PATH and point the variable REDIS_MODULE_PATH to the RediSQL library. Then execute python test.py. (I use python2 for this at the moment)

xiao321 commented 5 years ago

Alright, I hadn't realized that there were 2 clients.

However, please keep in mind that RediSQL's design is single-threaded.

You have one thread that does the work, so you can't do two things at the same time; if one of those things is slow, the commands just queue up.

Eventually everything gets done, but only eventually.

Does this fit your observations?

No. After that, the RediSQL database cannot work.

siscia commented 5 years ago

The version of redis is still 5.0.5?

xiao321 commented 5 years ago

The version of redis is still 5.0.5?

yes

siscia commented 5 years ago

Indeed, I see some changes from 5.0.3 to 5.0.5.

Let me work on it!

As always test cases will tremendously help and speed up my work!

xiao321 commented 5 years ago

I tested Redis 5.0.3 + RediSQL 1.0.3, and it is OK. I will keep watching it for a while.

siscia commented 5 years ago

Did you guys find the same issue in 5.0.3?

From first tests, it seems the problem during big joins is that RediSQL uses too much memory.

On 5.0.3 it starts to use swap and then gets killed by the OOM killer. On 5.0.5 it seems that Redis cannot allocate memory and it crashes.

Do you guys see something that would be consistent with this hypothesis?
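The memory hypothesis is consistent with a general SQLite property: streaming over a join costs almost nothing, while materializing the whole result grows with its size. A small stdlib sketch (plain sqlite3 plus tracemalloc, measuring Python allocations only, not RediSQL itself; requires Python 3.9+ for reset_peak):

```python
import sqlite3
import tracemalloc

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(300)])

tracemalloc.start()

# Streaming over the join result keeps the memory peak small...
rows = sum(1 for _ in conn.execute("SELECT a.x, b.x FROM t a, t b"))
streamed_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.reset_peak()

# ...while materializing all 90,000 rows of the same join grows with the result.
all_rows = conn.execute("SELECT a.x, b.x FROM t a, t b").fetchall()
materialized_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print(rows, len(all_rows), materialized_peak > 10 * streamed_peak)
```

If RediSQL buffers the full result of a big join before replying, its memory use would scale like the materialized case, which would match the swap/OOM behavior described above.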

baochengjie commented 5 years ago

I changed to 5.0.3, and this problem still occurs.

baochengjie commented 5 years ago

It causes the database to be unavailable.

siscia commented 5 years ago

Can this be an out-of-memory issue?

Is the database just locked? Is the process still running, or has it stopped?

siscia commented 5 years ago

Guys, do you mind trying these two binaries?

https://github.com/RedBeardLab/rediSQL/files/3439238/rc-3.zip

xiao321 commented 5 years ago

Our Linux does not support GLIBC_2.18, so we need to build with cargo. Do you have the source code?

siscia commented 5 years ago

It is this branch over here:

https://github.com/RedBeardLab/rediSQL/pull/79

If you are a little patient, I can provide binaries built against a smaller glibc version as well.

baochengjie commented 5 years ago

https://github.com/RedBeardLab/rediSQL/files/3439238/rc-3.zip — after using it, memory grows rapidly; it doubled within 1 hour and kept growing quickly, so we have rolled back to the original version. @siscia

siscia commented 5 years ago

Let me prepare the latest RC for you guys to test.

siscia commented 5 years ago

Can you guys share your AOF file? Or something close to it!

Beware that with that file I would be able to read the data.

With that file I can reproduce your environment and better debug the issue!

baochengjie commented 5 years ago

The file size is too big. It was previously compiled with the timeout-honor branch. I will try compiling with master now.

siscia commented 5 years ago

What glibc are you using?