br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb

System shutdown does not respect our transactions, causing uncommitted data to go to disk #600

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
Whenever the user stops managix, a sharp checkpoint is triggered by the 
recovery manager before stopping the system. This in turn triggers blind flushes 
of the memory components of all active indexes, causing uncommitted data to 
go to disk. It also leaves the system in a hung state, and the 
instance is never stopped unless the NC process is killed, as reported in 
issue 590.

The shutdown should respect our transactions. One easy fix is to ignore the 
in-memory components and recover the committed data the next time the instance 
is started. 
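
To illustrate the "easy fix", here is a minimal sketch of redo-only recovery that rebuilds state from the log and applies only updates belonging to committed transactions. It is a simplified illustration, not AsterixDB's recovery manager: `LogRecord`, `RedoRecovery`, and the key/value model are hypothetical names, and the sketch assumes recovery starts from an empty state with the whole log available.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical log record: either an UPDATE (key, value) or a COMMIT marker.
class LogRecord {
    enum Type { UPDATE, COMMIT }

    final Type type;
    final long txnId;
    final String key;   // null for COMMIT records
    final String value; // null for COMMIT records

    private LogRecord(Type type, long txnId, String key, String value) {
        this.type = type;
        this.txnId = txnId;
        this.key = key;
        this.value = value;
    }

    static LogRecord update(long txnId, String key, String value) {
        return new LogRecord(Type.UPDATE, txnId, key, value);
    }

    static LogRecord commit(long txnId) {
        return new LogRecord(Type.COMMIT, txnId, null, null);
    }
}

// Hypothetical redo-only recovery: in-memory components were discarded at
// shutdown, so state is rebuilt purely from the durable log.
class RedoRecovery {
    static Map<String, String> recover(List<LogRecord> log) {
        // Pass 1: find every transaction whose COMMIT record reached the log.
        Set<Long> committed = new HashSet<>();
        for (LogRecord r : log) {
            if (r.type == LogRecord.Type.COMMIT) {
                committed.add(r.txnId);
            }
        }
        // Pass 2: replay updates in log order, skipping uncommitted transactions.
        Map<String, String> state = new LinkedHashMap<>();
        for (LogRecord r : log) {
            if (r.type == LogRecord.Type.UPDATE && committed.contains(r.txnId)) {
                state.put(r.key, r.value);
            }
        }
        return state;
    }
}
```

With a log containing updates from transactions 1 and 2 but a commit record only for transaction 1, `RedoRecovery.recover` returns only transaction 1's writes.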

Original issue reported on code.google.com by salsuba...@gmail.com on 4 Aug 2013 at 4:51

GoogleCodeExporter commented 8 years ago
If this issue is not blocking progress for anyone, I think we should take the 
time to implement the right fix. In my mind, the right fix is:

1) block incoming requests (tell the REST API to stop serving requests / send 
"shutting down" responses)
2) wait for pending transactions to complete (we may be able to just wait for 
all jobs in the system to finish, with special attention paid to feeds)
3) flush indexes
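
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ShutdownCoordinator` and all of its methods are hypothetical names, not part of the AsterixDB code base, and index flushing is modelled as plain `Runnable` callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical coordinator for an orderly shutdown.
class ShutdownCoordinator {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition idle = lock.newCondition();
    private final List<Runnable> indexFlushers = new ArrayList<>();
    private boolean shuttingDown = false;
    private int activeTxns = 0;

    void registerIndexFlusher(Runnable flusher) {
        lock.lock();
        try {
            indexFlushers.add(flusher);
        } finally {
            lock.unlock();
        }
    }

    // Step 1 gate: new work is refused once shutdown has begun.
    boolean tryBeginTransaction() {
        lock.lock();
        try {
            if (shuttingDown) {
                return false; // caller would send a "shutting down" response
            }
            activeTxns++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    void endTransaction() {
        lock.lock();
        try {
            activeTxns--;
            if (activeTxns == 0) {
                idle.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns true if indexes were flushed, false on timeout or interruption.
    boolean shutdown(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        List<Runnable> toFlush;
        lock.lock();
        try {
            shuttingDown = true;               // step 1: block incoming requests
            while (activeTxns > 0) {           // step 2: wait for pending transactions
                if (nanos <= 0L) {
                    return false;              // do not flush while transactions are open
                }
                nanos = idle.awaitNanos(nanos);
            }
            toFlush = new ArrayList<>(indexFlushers);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
        for (Runnable flusher : toFlush) {     // step 3: flush indexes
            flusher.run();
        }
        return true;
    }
}
```

The point of the design is that the flush only runs once no transaction is open, so no uncommitted data can reach disk. A real implementation would also need the special handling for feeds mentioned in step 2, which this sketch does not attempt.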

Doesn't feel right delegating the work to recovery, particularly because we can 
do better + recovery is untested.

Original comment by zheilb...@gmail.com on 4 Aug 2013 at 5:23

GoogleCodeExporter commented 8 years ago
I agree, this should be fixed the right way.
When testing the k-buffering, I'm seeing this issue almost every time I stop 
the instance when the feed is active. I would say it is a blocking issue.

Original comment by salsuba...@gmail.com on 4 Aug 2013 at 5:27

GoogleCodeExporter commented 8 years ago
You're seeing uncommitted data or you're seeing issue 590?

Original comment by zheilb...@gmail.com on 4 Aug 2013 at 5:34

GoogleCodeExporter commented 8 years ago
The system is in a deadlock state.

Original comment by salsuba...@gmail.com on 4 Aug 2013 at 5:36

GoogleCodeExporter commented 8 years ago
I take it back. The checkpoint that is taken before the shutdown is using the 
right code path, and it is not flushing uncommitted data to disk. Since we 
don't block incoming requests upon shutdown, any transaction that commits 
afterwards (and whose logs made it to disk) will be recovered the next time the 
system is started.

Original comment by salsuba...@gmail.com on 4 Aug 2013 at 2:45

GoogleCodeExporter commented 8 years ago
I take it back one more time, this issue is still valid. Uncommitted data is 
going to disk upon system shutdown. 
I confused myself this AM.

Original comment by salsuba...@gmail.com on 4 Aug 2013 at 7:33

GoogleCodeExporter commented 8 years ago
Fixed in the following revision:
https://code.google.com/p/asterixdb/source/detail?r=46f0815add2bca05a72579271b07757616d94071

Original comment by kiss...@gmail.com on 23 Aug 2013 at 10:15