Closed YuMingLiao closed 5 months ago
I'm having trouble reproducing this. Is this an empty database from scratch? Here are my steps:
Terminal 1
cabal run tutd -- -d /tmp/freshdb
Terminal 2
cabal run tutd -- -d /tmp/freshdb
Terminal 1
TutorialD (master/main): a:=true
TutorialD (master/main): :automergetohead union master
Terminal 2
TutorialD (master/main): b:=true
TutorialD (master/main): :automergetohead union master
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐ │"a" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"true" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"b" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"false" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
└─────────────────────────────────────────────────┴──────────┘
Terminal 1
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐ │"false" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"true" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"a" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
└─────────────────────────────────────────────────┴──────────┘
TutorialD (master/main): :jumphead master
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐ │"false" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"b" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"true" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌───────────────┬──────────┐ │"a" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
└─────────────────────────────────────────────────┴──────────┘
I think the above behavior is what is expected.
Are you using the on-disk persistence strategy or something else?
I'll add some more context to the strategy error so that we can see the underlying cause of the error.
Please try to reproduce this issue with branch 364_automerge_bug which includes more error context and Trace.Debug
logging.
I think the above behavior is what is expected.
Could you try more than back and forth between Terminal 1 and 2 more than once? Sometimes it failed at Terminal 2's first time and other times it failed at Terminal 1's second time. And I haven't found what makes the uncertainty.
Are you using the on-disk persistence strategy or something else?
This time, I did delete a database. Still the same.
tutd1:
TutorialD (master/main): a := true
TutorialD (master/main): :automergetohead union master
tutd2:
TutorialD (master/main): b := true
TutorialD (<unknown>/main): :automergetohead union master
tutd1:
TutorialD (master/main): c := true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError (StrategyViolatesRelationVariableMergeError (Just (NoSuchTransactionError 939815ec-cd65-4726-9c98-ad9a7d877c4d)))
TutorialD (<unknown>/main):
and server:
(RelationVariable "true" (TransactionMarker 939815ec-cd65-4726-9c98-ad9a7d877c4d),RelationVariable "true" (TransactionMarker 939815ec-cd65-4726-9c98-ad9a7d877c4d))
939815...
is the root transaction for both of my tutds. It seems it failed at merging the default relation variable true
at roots.
Ah, ok- there may be race condition then. I'll automate something to reproduce the error.
Thanks for the excellent report, by the way. I don't think I would have found this otherwise.
Are you using tutd -n <db_name>
or tutd -d <db_path>
for the two clients?
I was tutd -n db -h 127.0.0.1 -p 6543
into a project-m36-server -n db -d db -h 127.0.0.1 -p 6543
, so -d in tutd is invalid.
I might need more time to investigate this. I was bumping curryer-rpc and streamly and fixing my haskell-incremental-building snack fork at the same time during reproducing my error. Now it produces a new error NoSuchSessionError. I shouldn't bring too many variables into one reproduction. Orz
Did you get the error after "back and forth" at least two times? I would like to know if it's just me.
I am also seeing the NoSuchSessionError
now. There is something very wrong with the over-the-wire protocol that the tests did not detect.
$ cabal run tutd -- -n db
Project:M36 TutorialD Interpreter 0.9.8
Type ":help" for more information.
A full tutorial is available at:
https://github.com/agentm/project-m36/blob/master/docs/tutd_tutorial.markdown
TutorialD (master/main): :showrelvars
┌─────────────────────────────────────────────────┬──────────┐
│attributes::relation {attribute::Text,type::Text}│name::Text│
├─────────────────────────────────────────────────┼──────────┤
│┌───────────────┬──────────┐ │"!�" │
││attribute::Text│type::Text│ │ │
│└───────────────┴──────────┘ │ │
│┌──────────┬─────────┐ │" " │
││�e ::Text│*�::Text│ │ │
│└──────────┴─────────┘ │ │
└─────────────────────────────────────────────────┴──────────┘
Confirmed, I can produce the bug in curryer- you can peg yourself to curryer-rpc==0.3.1 if you want to run a fixed version before I resolve the bug (and figure out how it got past the tests).
Ok, I have updated the master branch to peg streamly to 0.9.0 so the corruption issues are gone, but I am still able to reproduce the issue you reported, so I'm looking at that next.
Ok, I think I see what's going on. The underlying error which is getting masked by the merge error is NoSuchTransactionError
which should not happen. I'm not yet sure how this is possible...
The issue related to a piece of transaction graph trimming code that trimmed transaction related to the actual merge transactions when scanning for a common ancestor. Please try out the fix on 364_graph_merge_error_because_trimmed.
Confirmed! Each tutd automergetohead
twice without an issue.
I was trying to pop the hood but I don't understand about merge much. Thank you!
It still happens if I ask tutd2 to update same name relvar.
If my last test, every update use different relvar names. And it succeeded.
This time, I use a := true
back and forth between tutd1 and tutd2.
It fails at tutd2
tutd1
TutorialD (master/main): a := true
TutorialD (master/main): :automergetohead union master
tutd2
TutorialD (master/main): a := true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError (StrategyViolatesRelationVariableMergeError (NoSuchTransactionError 694e3f51-068a-4ef1-80f8-5b57e6620d65))
It looks like you are using the correct branch (based on the error), but I am not able to reproduce this issue. Are the instructions above running against a completely empty database? Are you able to reproduce this error after every attempt?
If we're both running from the same branch, the error above definitely hints at another similar bug.
Just to be clear, there are now two 364_* branches. The fix for this issue is on master
and 364_graph_merge_error_because_trimmed
. 364_automerge_bug
does not have the necessary fix.
Oops. forgot to report one condition. It happens after a merge happens, both of tutd quits, and then reconnect.
rm db -rf
project-m36-server -n db -d db
fst tutd: tutd -n db -h 127.0.0.1
2nd tutd: tutd -n db -h 127.0.0.1
fst tutd:
TutorialD (master/main): a:=true
TutorialD (master/main): :automergetohead union master
2nd tutd:
TutorialD (master/main): a:=true
TutorialD (<unknown>/main): :automergetohead union master
(which is fine at the moment.)
fst tutd: :quit
2nd tutd: :quit
(It seems I have to disconnect all of them first to reproduce the error.)
fst tutd: tutd -n db -h 127.0.0.1
2nd tutd: tutd -n db -h 127.0.0.1
fst tutd:
TutorialD (master/main): a:=true
TutorialD (master/main): :automergetohead union master
2nd tutd:
TutorialD (master/main): a:=true
TutorialD (<unknown>/main): :automergetohead union master
1fst tutd:
TutorialD (master/main): a:=true
TutorialD (<unknown>/main): :automergetohead union master
ERR: MergeTransactionError (StrategyViolatesRelationVariableMergeError (NoSuchTransactionError 200145cc-59b8-4a88-8a8e-0db0321ff4e7))
(It has to be automergetohead
three times back and forth. I guess that is becuase 1st time is fast-forward. 2nd time is a real merge. And 3rd time hits the error. And the missing transaction is still root transaction this time.)
And yes, I am at * (HEAD detached at upstream/364_graph_merge_error_because_trimmed)
.
And yes, it's a simlar bug.
And yes, it's reproducible to me.
It happens after both tutds quits a merged database. So not empty database.
The same part is missing root transaction. The difference is that it happens only after both tutds quit a merged database. Not one.
And If no merge happens at first-time connections, then the error is not reproducible.
So I guess I can start looking into what's in reconnection that makes the fix fail.
Both of tutds :showgraph
has root transaction. So it may not be about saving and reloading.
Ah, thanks for the additional details. I am now able to reproduce this.
I suspect the bug is in the same subGraphOfFirstCommonAncestor
function, though I don't yet understand how. I'll get this fixed ASAP.
subGraphOfFirstCommonAncestor_debug_msg
I use trace-function-call and stacked-dag to get some info about subGraphOfFirstCommonAncestor. I guess you might be interested.
The reproduction steps are the same. Although I don't have the info step by step. But as you can see, at some point, subgraph indeed lost some transactions including root. (I'll leave the question for my tomorrow...)
I am not familiar with the algorithm here yet. So I am not sure my error message is faithfully showing that, like
"why there is a merge at the start in argument graph
"
"why there is a no name root",
"why travservedSet is not empty in the beginning"
(traceFunction
in trace-function-call reports function arguments and return value in tail recursion way. So what I see first is the function inside the recursion most.),
"why the returned subgraph can be disconnected."
So might just take this error message as a reference.
Ok, it turned out I missed properly recursing transactions necessary to reconstruct the subgraph, so, after multiple merges, some transactions were not included in the subgraph. Oops! It should be fixed now. Please try again.
Confirmed. The error is not reproducible anymore. Thank you!
I would like to use two connections to represent two users in my app to update data. But automergeothead seems not what I expected. I expect every conn can automergetohead directly.
two tutd connections (1st tutd)
(2nd tutd)
Does that mean for every conn that has lost its head state, I need to jumphead master first then automergetohead?:jumphead
then:automergetohead
is a fast-forward commit.When it comes to merging a detached transaction and a head, according to docs merge transactions in project-m36, Union Merge will union two relvars if names and types are the same. But merging a := true and a := false gets the same error.
And I just found out that, in my case, union strategy might not be the right strategy. tuple insertion will always be saved while tuple deletion will always be ignored.
"restricted access sub-schema" might be another way to maintain consistency in multiple connections.