apache / incubator-seata

:fire: Seata is an easy-to-use, high-performance, open source distributed transaction solution.
https://seata.apache.org/
Apache License 2.0
25.08k stars 8.73k forks source link

bugfix: fix readonly branch commit errors in Oracle XA transactions. #6629

Open gongycn opened 1 week ago

gongycn commented 1 week ago

Ⅰ. Describe what this PR did

fix the issue of readonly branch commit errors in Oracle XA transactions, which leads to long-running transactions being unable to commit and causing unreadable states. Solution Logic: When the database is Oracle and the prepare result is XA_RDONLY, notify TC that the current branch status is BranchStatus.PhaseOne_RDONLY (a newly added state to address this issue). During global commit, for branches with this status type, delete the branch log directly and ignore it.

Ⅱ. Does this pull request fix one issue?

fixes #6512

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Based on the problematic local project instance, the verification includes:

Whether the Seata TC frontend displays normally; In Seata TM managing multiple RMs, whether it functions correctly when there is one or more readonly branches; whether it functions correctly when there are no readonly branches; and whether it functions correctly when there are only one or more readonly branches.

Ⅴ. Special notes for reviews

gongycn commented 1 week ago

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作? Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

Simple branch logic composed of basic SQL can be implemented this way, but consideration is needed if the transaction involves more complex scenarios such as calling stored procedures under XA branch transactions. The business logic within stored procedures may not be straightforward to determine.

Moreover, in Oracle, the prepare phase can explicitly indicate whether the branch is read-only, so it may feel unnecessary to handle it in this way.

funky-eyes commented 1 week ago

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作? Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

Simple branch logic composed of basic SQL can be implemented this way, but consideration is needed if the transaction involves more complex scenarios such as calling stored procedures under XA branch transactions. The business logic within stored procedures may not be straightforward to determine.

Moreover, in Oracle, the prepare phase can explicitly indicate whether the branch is read-only, so it may feel unnecessary to handle it in this way.

We currently refuse to use stored procedures directly, so I would prefer to place the XA START after the first DML action is recognized.

gongycn commented 1 week ago

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作? Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

Simple branch logic composed of basic SQL can be implemented this way, but consideration is needed if the transaction involves more complex scenarios such as calling stored procedures under XA branch transactions. The business logic within stored procedures may not be straightforward to determine. Moreover, in Oracle, the prepare phase can explicitly indicate whether the branch is read-only, so it may feel unnecessary to handle it in this way.

We currently refuse to use stored procedures directly, so I would prefer to place the XA START after the first DML action is recognized.

  1. The official documentation mentions for XA use cases: Suitable for migrating old applications to the Seata platform based on the XA protocol.. Therefore, if stored procedures are not supported, it will not be possible to achieve this. Currently, some of our applications are using a mix of XA and TCC.
  2. If the XA transaction is started only after the first DML operation, in Oracle, this can be achieved by leveraging the Promotable XA feature to ensure that the repeatable read characteristic is maintained under the Repeatable Read isolation level. Otherwise, the repeatable read cannot be guaranteed. This feature is not necessarily supported in other databases.
funky-eyes commented 1 week ago
  1. I don't understand the relevance between supporting stored procedures and migrating to Seata XA mode that you mentioned.
  2. I believe Oracle commonly uses the RC isolation level, so it seems there's no need to use the feature you mentioned.
gongycn commented 1 week ago
  1. I don't understand the relevance between supporting stored procedures and migrating to Seata XA mode that you mentioned.
  2. I believe Oracle commonly uses the RC isolation level, so it seems there's no need to use the feature you mentioned.
  1. Currently, we have some legacy services that process business logic through stored procedures and we want to integrate them with Seata transactions.
  2. We are using the RR (Repeatable Read) isolation level, and since Seata is a general framework, XA as a two-phase commit protocol should support various scenarios.
  3. The basic logical template structure of XA transactions is consistent, with only minor implementation differences across different databases. However, all of them can handle the basic logical template well. For the current issue, the desired result can be achieved during the prepare phase, and appropriate processing can be performed. There is no need to introduce other processing logic outside of the XA process.

Thanks for your question, which has led me to further analyze this solution in depth.

gongycn commented 1 week ago
  1. The current issue of read-only transaction branch commit errors can be efficiently and elegantly resolved within the internal handling logic of XA transactions.
  2. Since SQL parsing cannot adapt to all database syntaxes, introducing SQL parsing to solve this issue would undoubtedly reduce the generality of XA transactions and increase performance overhead due to SQL parsing.

Based on the above points, it is not recommended to introduce SQL parsing to address this issue.

funky-eyes commented 1 week ago

I understand your point, but I believe that in a read-only transaction, why isn't it considered a local transaction instead of being included in a global transaction? If a global transaction includes a read-only local transaction, it will inevitably affect throughput due to this additional branch. However, if it is just a local transaction, there will be significantly fewer interactions with the DB and the TC. Therefore, I believe that the overhead caused by SQL parsing is much lower than the overhead of starting an XA transaction and registering an XA branch to the TC. Moreover, we only need to identify whether the SQL is a SELECT statement rather than other DML statements. This has little to do with the number or variety of SQL types supported by SQL parsing.

gongycn commented 1 week ago

I understand your point, but I believe that in a read-only transaction, why isn't it considered a local transaction instead of being included in a global transaction? If a global transaction includes a read-only local transaction, it will inevitably affect throughput due to this additional branch. However, if it is just a local transaction, there will be significantly fewer interactions with the DB and the TC. Therefore, I believe that the overhead caused by SQL parsing is much lower than the overhead of starting an XA transaction and registering an XA branch to the TC. Moreover, we only need to identify whether the SQL is a SELECT statement rather than other DML statements. This has little to do with the number or variety of SQL types supported by SQL parsing.

The branch transactions in this scenario are typically executed based on the business state, which determines whether data insertions, deletions, or updates are necessary. Different business data states lead to inconsistencies in this processing logic—sometimes it may be read-only, sometimes not.

While it can be seen as an imperfect organization and implementation of business logic, due to resource constraints, it is often difficult to handle every detail thoroughly.

The solution I am currently proposing addresses this issue by adding an additional interaction with the Transaction Coordinator (TC) when such a situation arises, to avoid errors in the commit of read-only branch transactions. This can be understood as a fault tolerance mechanism. Of course, in a well-organized and clear business logic process, as you mentioned, such situations would not occur, and thus this extra interaction with the TC would not be necessary.

gongycn commented 1 week ago

Of course, the proposed solution can later be combined with the SQL parsing logic you mentioned. If the SQL parsing perfectly handles all operations that are read-only and avoids initiating XA-related transaction actions, it would also prevent this extra interaction with the TC. However, if there are any omissions, this approach can improve robustness.

Furthermore, this proposed solution can effectively and immediately resolve the current issue with Oracle XA branch transactions. It can prevent the problem where, after the global transaction is committed, other XA branches' data remain unreadable for an extended period due to the failure of committing read-only transaction branches.

funky-eyes commented 1 week ago
  1. Regarding read-only branches in relation to SQL parsing recognition, there are additional actions such as XA end and XA prepare, all of which involve I/O interactions with the database. Then there's the step of reporting the branch status to the transaction coordinator (TC), which adds some extra overhead.

  2. I'm not quite clear on what you're saying. Why would a failed commit of an XA read-only branch affect other transactions' ability to read data? It shouldn't have acquired any data locks; a snapshot read shouldn't hold any exclusive database locks.

  3. I understand the intention behind your pull request, which indeed optimizes behavior under Oracle's XA read-only transactions. However, I would prefer a more universal approach applicable to all compatible databases, aiming for standardized handling. Therefore, I propose that after SQL parsing, we should skip initiating XA transaction-related actions for non-DML operations. We might need input from other community members on this point. @slievrly @PeppaO

gongycn commented 1 week ago
  1. Regarding read-only branches in relation to SQL parsing recognition, there are additional actions such as XA end and XA prepare, all of which involve I/O interactions with the database. Then there's the step of reporting the branch status to the transaction coordinator (TC), which adds some extra overhead.
  2. I'm not quite clear on what you're saying. Why would a failed commit of an XA read-only branch affect other transactions' ability to read data? It shouldn't have acquired any data locks; a snapshot read shouldn't hold any exclusive database locks.
  3. I understand the intention behind your pull request, which indeed optimizes behavior under Oracle's XA read-only transactions. However, I would prefer a more universal approach applicable to all compatible databases, aiming for standardized handling. Therefore, I propose that after SQL parsing, we should skip initiating XA transaction-related actions for non-DML operations. We might need input from other community members on this point. @slievrly @PeppaO

The issue in point 2 is that when an XA transaction commits globally, the multiple branch transactions in the TC are committed synchronously, one after the other. If any of these commits fail, the commit logic exits, causing the global commit to get stuck on that failed branch, continuously retrying. This prevents subsequent RM branches from actually receiving the TC's commit request, making the transactions of these RM branches invisible to other transactions.

funky-eyes commented 1 week ago
  1. Regarding read-only branches in relation to SQL parsing recognition, there are additional actions such as XA end and XA prepare, all of which involve I/O interactions with the database. Then there's the step of reporting the branch status to the transaction coordinator (TC), which adds some extra overhead.
  2. I'm not quite clear on what you're saying. Why would a failed commit of an XA read-only branch affect other transactions' ability to read data? It shouldn't have acquired any data locks; a snapshot read shouldn't hold any exclusive database locks.
  3. I understand the intention behind your pull request, which indeed optimizes behavior under Oracle's XA read-only transactions. However, I would prefer a more universal approach applicable to all compatible databases, aiming for standardized handling. Therefore, I propose that after SQL parsing, we should skip initiating XA transaction-related actions for non-DML operations. We might need input from other community members on this point. @slievrly @PeppaO

The issue in point 2 is that when an XA transaction commits globally, the multiple branch transactions in the TC are committed synchronously, one after the other. If any of these commits fail, the commit logic exits, causing the global commit to get stuck on that failed branch, continuously retrying. This prevents subsequent RM branches from actually receiving the TC's commit request, making the transactions of these RM branches invisible to other transactions.

I understand. So that's why you've implemented the skip operation for this branch. I see this PR as a temporary solution, but ultimately, we should move towards a unified processing model

gongycn commented 1 week ago

Of course, the proposed solution can later be combined with the SQL parsing logic you mentioned. If the SQL parsing perfectly handles all operations that are read-only and avoids initiating XA-related transaction actions, it would also prevent this extra interaction with the TC. However, if there are any omissions, this approach can improve robustness.

I understand. So that's why you've implemented the skip operation for this branch. I see this PR as a temporary solution, but ultimately, we should move towards a unified processing model

Yes, this PR can be seen as a temporary solution, or as mentioned above, it can serve as a fallback solution to ensure robustness.

gongycn commented 1 week ago

One of the reasons for using XA transactions in Seata is that XA transactions allow developers to:

  1. Fully utilize the database's unique features: For example, special SQL syntax, stored procedures, functions, etc.
  2. Standardized interface: XA transactions are implemented based on a standard protocol with a first phase (start, execute, end, prepare) and a second phase (commit/rollback).

I believe that the implementation of XA transactions in Seata should maintain these basic characteristics of XA transactions. From this perspective, the approach in this PR:

  1. Does not compromise the support for the database's unique features.
  2. Does not exceed the requirements of the XA standard protocol.
  3. Indeed solves the issue of read-only branch commit errors.

Therefore, I hope this PR can be accepted.

gongycn commented 1 week ago

@funky-eyes fix the incorrect code style: '{' is not preceded with whitespace.