agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
876 stars 47 forks source link

StrategyViolatesRelationVariableMergeError when automergetohead a not-head conn. #364

Closed YuMingLiao closed 5 months ago

YuMingLiao commented 5 months ago

I would like to use two connections to represent two users in my app to update data. But automergeothead seems not what I expected. I expect every conn can automergetohead directly.

two tutd connections (1st tutd)

TutorialD (master/main):  a := true
TutorialD (master/main): :automergetohead union master

(2nd tutd)

TutorialD (master/main): 
(enter after 1s tutd automergetohead. Not head anymore.)
TutorialD (<unknown>/main):
TutorialD (<unknown>/main): b := true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError StrategyViolatesRelationVariableMergeError
TutorialD (<unknown>/main): :jumphead master
The current transaction has uncommitted changes. If you continue, the changes will be lost. Continue? (Y/n): Y
TutorialD (master/main): b := true
TutorialD (master/main): :automergetohead union master

Does that mean for every conn that has lost its head state, I need to jumphead master first then automergetohead?

:jumphead then :automergetohead is a fast-forward commit.

When it comes to merging a detached transaction and a head, according to docs merge transactions in project-m36, Union Merge will union two relvars if names and types are the same. But merging a := true and a := false gets the same error.

And I just found out that, in my case, union strategy might not be the right strategy. tuple insertion will always be saved while tuple deletion will always be ignored.

To avoid TransactionIsNotAHead errors, branch the transaction graph and merge it to the head when ready to commit.

"restricted access sub-schema" might be another way to maintain consistency in multiple connections.

agentm commented 5 months ago

I'm having trouble reproducing this. Is this an empty database from scratch? Here are my steps:

Terminal 1
cabal run tutd -- -d /tmp/freshdb

Terminal 2
cabal run tutd -- -d /tmp/freshdb

Terminal 1
TutorialD (master/main): a:=true
TutorialD (master/main): :automergetohead union master

Terminal 2
TutorialD (master/main): b:=true
TutorialD (master/main): :automergetohead union master
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐                     │"a"       │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"true"    │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"b"       │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"false"   │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
└─────────────────────────────────────────────────┴──────────┘

Terminal 1
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐                     │"false"   │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"true"    │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"a"       │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
└─────────────────────────────────────────────────┴──────────┘
TutorialD (master/main): :jumphead master
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐                     │"false"   │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"b"       │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"true"    │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌───────────────┬──────────┐                     │"a"       │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
└─────────────────────────────────────────────────┴──────────┘

I think the above behavior is what is expected.

Are you using the on-disk persistence strategy or something else?

agentm commented 5 months ago

I'll add some more context to the strategy error so that we can see the underlying cause of the error.

agentm commented 5 months ago

Please try to reproduce this issue with branch 364_automerge_bug which includes more error context and Trace.Debug logging.

YuMingLiao commented 5 months ago

I think the above behavior is what is expected.

Could you try more than back and forth between Terminal 1 and 2 more than once? Sometimes it failed at Terminal 2's first time and other times it failed at Terminal 1's second time. And I haven't found what makes the uncertainty.

Are you using the on-disk persistence strategy or something else?

This time, I did delete a database. Still the same.

tutd1:

TutorialD (master/main): a := true
TutorialD (master/main): :automergetohead union master

tutd2:

TutorialD (master/main): b := true
TutorialD (<unknown>/main): :automergetohead union master

tutd1:

TutorialD (master/main): c := true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError (StrategyViolatesRelationVariableMergeError (Just (NoSuchTransactionError 939815ec-cd65-4726-9c98-ad9a7d877c4d)))
TutorialD (<unknown>/main): 

and server:

(RelationVariable "true" (TransactionMarker 939815ec-cd65-4726-9c98-ad9a7d877c4d),RelationVariable "true" (TransactionMarker 939815ec-cd65-4726-9c98-ad9a7d877c4d))

939815... is the root transaction for both of my tutds. It seems it failed at merging the default relation variable true at roots.

agentm commented 5 months ago

Ah, ok- there may be race condition then. I'll automate something to reproduce the error.

Thanks for the excellent report, by the way. I don't think I would have found this otherwise.

agentm commented 5 months ago

Are you using tutd -n <db_name> or tutd -d <db_path> for the two clients?

YuMingLiao commented 5 months ago

I was tutd -n db -h 127.0.0.1 -p 6543 into a project-m36-server -n db -d db -h 127.0.0.1 -p 6543, so -d in tutd is invalid.

I might need more time to investigate this. I was bumping curryer-rpc and streamly and fixing my haskell-incremental-building snack fork at the same time during reproducing my error. Now it produces a new error NoSuchSessionError. I shouldn't bring too many variables into one reproduction. Orz

Did you get the error after "back and forth" at least two times? I would like to know if it's just me.

agentm commented 5 months ago

I am also seeing the NoSuchSessionError now. There is something very wrong with the over-the-wire protocol that the tests did not detect.

$ cabal run tutd -- -n db
Project:M36 TutorialD Interpreter 0.9.8
Type ":help" for more information.
A full tutorial is available at:
https://github.com/agentm/project-m36/blob/master/docs/tutd_tutorial.markdown
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐                     │"!�"      │
││attribute::Text│type::Text│                     │          │
│└───────────────┴──────────┘                     │          │
│┌──────────┬─────────┐                           │"    "       │
││�e   ::Text│*�::Text│                           │          │
│└──────────┴─────────┘                           │          │
└─────────────────────────────────────────────────┴──────────┘
agentm commented 5 months ago

Confirmed, I can produce the bug in curryer- you can peg yourself to curryer-rpc==0.3.1 if you want to run a fixed version before I resolve the bug (and figure out how it got past the tests).

agentm commented 5 months ago

Ok, I have updated the master branch to peg streamly to 0.9.0 so the corruption issues are gone, but I am still able to reproduce the issue you reported, so I'm looking at that next.

agentm commented 5 months ago

Ok, I think I see what's going on. The underlying error which is getting masked by the merge error is NoSuchTransactionError which should not happen. I'm not yet sure how this is possible...

agentm commented 5 months ago

The issue related to a piece of transaction graph trimming code that trimmed transaction related to the actual merge transactions when scanning for a common ancestor. Please try out the fix on 364_graph_merge_error_because_trimmed.

YuMingLiao commented 5 months ago

Confirmed! Each tutd automergetohead twice without an issue.

I was trying to pop the hood but I don't understand about merge much. Thank you!

YuMingLiao commented 5 months ago

It still happens if I ask tutd2 to update same name relvar. If my last test, every update use different relvar names. And it succeeded. This time, I use a := true back and forth between tutd1 and tutd2. It fails at tutd2 automergetohead union master.

tutd1

TutorialD (master/main): a := true
TutorialD (master/main): :automergetohead union master

tutd2

TutorialD (master/main): a := true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError (StrategyViolatesRelationVariableMergeError (NoSuchTransactionError 694e3f51-068a-4ef1-80f8-5b57e6620d65))
agentm commented 5 months ago

It looks like you are using the correct branch (based on the error), but I am not able to reproduce this issue. Are the instructions above running against a completely empty database? Are you able to reproduce this error after every attempt?

If we're both running from the same branch, the error above definitely hints at another similar bug.

agentm commented 5 months ago

Just to be clear, there are now two 364_* branches. The fix for this issue is on master and 364_graph_merge_error_because_trimmed. 364_automerge_bug does not have the necessary fix.

YuMingLiao commented 5 months ago

Oops. forgot to report one condition. It happens after a merge happens, both of tutd quits, and then reconnect.

rm db -rf project-m36-server -n db -d db fst tutd: tutd -n db -h 127.0.0.1 2nd tutd: tutd -n db -h 127.0.0.1 fst tutd:

TutorialD (master/main): a:=true
TutorialD (master/main): :automergetohead union master

2nd tutd:

TutorialD (master/main): a:=true
TutorialD (<unknown>/main): :automergetohead union master

(which is fine at the moment.)

fst tutd: :quit 2nd tutd: :quit

(It seems I have to disconnect all of them first to reproduce the error.)

fst tutd: tutd -n db -h 127.0.0.1 2nd tutd: tutd -n db -h 127.0.0.1

fst tutd:

TutorialD (master/main): a:=true
TutorialD (master/main): :automergetohead union master

2nd tutd:

TutorialD (master/main): a:=true
TutorialD (<unknown>/main): :automergetohead union master

1fst tutd:

TutorialD (master/main): a:=true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError (StrategyViolatesRelationVariableMergeError (NoSuchTransactionError 200145cc-59b8-4a88-8a8e-0db0321ff4e7))

(It has to be automergetohead three times back and forth. I guess that is becuase 1st time is fast-forward. 2nd time is a real merge. And 3rd time hits the error. And the missing transaction is still root transaction this time.)

And yes, I am at * (HEAD detached at upstream/364_graph_merge_error_because_trimmed). And yes, it's a simlar bug. And yes, it's reproducible to me. It happens after both tutds quits a merged database. So not empty database.

The same part is missing root transaction. The difference is that it happens only after both tutds quit a merged database. Not one.

And If no merge happens at first-time connections, then the error is not reproducible.

So I guess I can start looking into what's in reconnection that makes the fix fail.

Both of tutds :showgraph has root transaction. So it may not be about saving and reloading.

agentm commented 5 months ago

Ah, thanks for the additional details. I am now able to reproduce this.

I suspect the bug is in the same subGraphOfFirstCommonAncestor function, though I don't yet understand how. I'll get this fixed ASAP.

YuMingLiao commented 5 months ago

subGraphOfFirstCommonAncestor_debug_msg

I use trace-function-call and stacked-dag to get some info about subGraphOfFirstCommonAncestor. I guess you might be interested.

The reproduction steps are the same. Although I don't have the info step by step. But as you can see, at some point, subgraph indeed lost some transactions including root. (I'll leave the question for my tomorrow...)

I am not familiar with the algorithm here yet. So I am not sure my error message is faithfully showing that, like

"why there is a merge at the start in argument graph"

"why there is a no name root",

"why travservedSet is not empty in the beginning" (traceFunction in trace-function-call reports function arguments and return value in tail recursion way. So what I see first is the function inside the recursion most.),

"why the returned subgraph can be disconnected."

So might just take this error message as a reference.

agentm commented 5 months ago

Ok, it turned out I missed properly recursing transactions necessary to reconstruct the subgraph, so, after multiple merges, some transactions were not included in the subgraph. Oops! It should be fixed now. Please try again.

YuMingLiao commented 5 months ago

Confirmed. The error is not reproducible anymore. Thank you!