camunda / camunda-bpm-platform

Flexible framework for workflow and decision automation with BPMN and DMN. Integration with Quarkus, Spring, Spring Boot, CDI.
https://camunda.com/
Apache License 2.0
4.11k stars 1.55k forks source link

Different behaviours of case sensitivity query for variable name between runtime and history #3006

Open tonny1983 opened 1 year ago

tonny1983 commented 1 year ago

Environment (Required on creation)

Steps to reproduce (Required on creation)

Just call those two APIs with the same variableNameLike value

Observed Behavior (Required on creation)

API Case Sensitive DB Case Insensitive DB Conclusion
Runtime API case-sensitive case-insensitive depends on DB configuration
Historic API with variableNamesIgnoreCase case-insensitive case-insensitive depends on the query condition
Historic API w/o variableNamesIgnoreCase case-sensitive case-sensitive depends on the query condition

Expected behavior (Required on creation)

API Case Sensitive DB Case Insensitive DB Conclusion
Runtime API case-sensitive case-insensitive depends on DB configuration
Historic API with variableNamesIgnoreCase case-insensitive case-insensitive depends on the query condition
Historic API w/o variableNamesIgnoreCase case-sensitive case-insensitive depends on DB configuration

Root Cause (Required on prioritization)

For MS SQL Server the historic API is configured to use a special collation that makes the DB case-sensitive for this query only. See https://github.com/camunda/camunda-bpm-platform/blob/f6f49080bdc28eb5d5ce903c0cab7ebba058eaaa/engine/src/main/resources/org/camunda/bpm/engine/impl/mapping/entity/HistoricVariableInstance.xml#L354-L378

The collation is not only applied when the property is used but in any case.

Solution Ideas

Only apply the collation (used as a placeholder via ${collationForCaseSensitivity}) for both properties when the respective property is set to true.

mboskamp commented 1 year ago

Hi @tonny1983, Thanks for opening an issue. Let me start by saying that this is not a bug. There are two different APIs here that have their own documentation and behavior. I don't see any evidence that either of them behaves in any unexpected way. Let me go through the points you make in the issue.

Why the collation placeholder? Databases are different. In fact, MS SQL server is the only DB we are supporting that is case insensitive. This is also the reason why there is a ${collationForCaseSensitivity} placeholder in the API that can be configured to ignore the case. If you have a look at this class you will see that the ${collationForCaseSensitivity} is only used to make SQL server compatible with this feature.

Why don't we let the DB decide? User can configure their DB in many ways that are not in the control of Camunda. We have to make some assumptions to make sure a feature works on all platforms. This is why we needed to tame MS SQL server here to make the feature work cross-platform.

Why do the two APIs work differently? This feature was initially introduced for filtering Task names. It was later rolled out for other queries but not all. We would need to adjust every query API individually and did not receive enough evidence that users were missing the case-insensitive search in other queries.

What now? We will not modify the HistoricVariableInstance because it works as intended and we can not roll back API for backward compatibility reasons. For consistency, we might introduce the variableNamesIgnoreCase flag in the VariableInstance API. However, it is very unlikely that someone from the team will work on this soon. If you are interested to introduce it and make the two APIs more consistent we would happily accept a contribution.

Some open questions from my side:

Next steps: I will close the issue for now. We can discuss the matter further here. For now, there is nothing we will do from our side here.

Let me know what you think. Best, Miklas

tonny1983 commented 1 year ago

Hi @mboskamp ,

Thanksvery much for your kind reply.

Let me answer the problem of the document first, because I'm afraid it is so simple. For the description of variableNamesIgnoreCase field of the POST /variable-instance API in current doc, it is said that

Match all variable names provided in variableValues case-insensitively. If set to true variableName and variablename are treated as equal.

Obviously, the variableValues there should be variableNames.

Next, let's talk about the differeces about the two APIs. Actually, the variableNamesIgnoreCase fields works properly just as the same as what the document said. However, you mentioned that

MS SQL server is the only DB we are supporting that is case insensitive

which is confusing me. Would you mind explaining it more detail please?

Above all, I think I can make a summary about the case sensitivity for name-like query as the following table:

API Case Sensitive DB Case Insensitive DB Conclusion
Runtime API C/S C/I depends on DB
Historic API with variableNamesIgnoreCase C/I C/I depends on the query condition
Historic API w/o variableNamesIgnoreCase C/S C/I depends DB

Would you please also verify it?

Thank you for your time again.

Best, Tonny

mboskamp commented 1 year ago

Hey @tonny1983, Sorry for the wait. I think I understand now what the issue is. The table was very helpful to see the differences.

What is the problem? The historic variable API POST /history/variable-instance supports a properties variableNamesIgnoreCase and variableValuesIgnoreCase. MS SQL Server by default treats strings case-insensitively so when setting one of the properties to false variables are still treated case-insensitively (which is expected, because the DB is configured that way). When setting one of the properties to true the query is configured to be case-sensitive (by using collation) and returns a different result set.

What is the solution? The historic API should only configure the statement to be case-sensitive when one of the properties variableNamesIgnoreCase or variableValuesIgnoreCase is set to true. See code. If not, then the collation should not be used and it should be up to the DB to decide.

What is not part of the solution? The runtime API is not affected by this bug. As explained, the ignoreCase properties are part of a feature that was not rolled out to the runtime API. That means it behaves as expected. We can also not roll back the ignoreCase properties from the historic API for compatibility reasons.

What will we do now? I will re-open the ticket and adjust the description. It will then go into our backlog. It might be prioritized by product management but that is not a given since our backlog is very big. If you are interested in fixing the issue in the historic API yourself, we would appreciate a contribution via PR. In that case, I would be available for giving input on solution ideas and code review. Please let me know if you like to work on this.

Cheers, Miklas

tonny1983 commented 1 year ago

Hello @mboskamp , Thanks a lot for your review.

May I confirm that a reasonable solution should be:

  1. make case sensitivity of runtime API keep status quo, because it works as expected;
  2. for historic API, setting the variableNamesIgnoreCase property to be true should override the case sensitivity of DB configuration to make the query be case insensitive, and if the property is false, the case sensitivity should depend on the DB configuration.

If those are correct, with all due respect, I'm afraid there is something wrong about the problem you described.

The problem is not when the variableNamesIgnoreCase property setting to be true but when it is false or missing (default as false). As the code shows, the ${collationForCaseSensitivity} placeholder will be added to the end of the query statement regardless of the value of the property. It works fine for true case because two sides of the equal check have already be upper case and will not be affected by the collation. On the other side, for false case, the statement will be forced to be case sensitive due the collation.

Thus, I'm afraid the correct solution is just to delete the ${collationForCaseSensitivity} placeholder.

Another reason I continue to question the existence of the placeholder is its value. For SQL Server, the value is Latin1_General_CS_ASwhich could invalidate other collation options of SQL Server (such as accent-sensitive_AS, kana-sensitive_KS and so on). The placeholder would limitate the query for other collation such as Japanese.

To fix the code of the problem may be simple. However, I'm afraid it would affect some features of Camunda Production. If these doubts and limitations clarify, I'm glad to create a PR for it.

Regards, Tonny

mboskamp commented 1 year ago

Hi @tonny1983,

Thanks again for clarifying. I don't recall the reason for the collation from the top of my head but I believe it was implemented like this with intent. I agree though that the behavior we are seeing is not expected. Right now, it is not clear to me what the best solution would be.

I will create a test PR to run your proposal of removing the collation entirely against our CI. I'll let you know what I find out.

Miklas

mboskamp commented 1 year ago

Hi @tonny1983, sorry for the long wait. I did not have the time to investigate further. I'll let you know once I had time to look into the CI results.

mboskamp commented 1 year ago

Hi @tonny1983,

I am sorry to inform you that we currently lack the capacity to bring this issue to a close. I will close the related PR. If you like to work on this bug yourself, we would very much appreciate a pull request. The issue will remain open, but I will remove it from our backlog. Like this, anybody may choose to work on it in the future.

Best, Miklas