TabularEditor / BestPracticeRules

An official collection of standard Rules for use with Tabular Editor's Best Practice Analyzer.
120 stars 53 forks source link

Avoid division (use DIVIDE function instead) #20

Closed DaniilMaslyuk closed 4 years ago

DaniilMaslyuk commented 6 years ago

The current rule does not take into account the following situations where a slash is, in my opinion, legitimate:

  1. Comments (especially multi-line ones): My Measure := [$] /* This is just for illustration purposes */
  2. Slashes inside text strings: Net Profit := CALCULATE ( [$], 'Measure Hierarchy'[Name] = "Net gain/(loss)" )
  3. Slashes in field names (although issues #8 and #17 suggest that this should be avoided): Only Yes := CALCULATE ( [$], Dim[Yes/No] = "Yes" )
  4. Division by constants Monthly Revenue := Sales[Yearly Revenue] / 12

I managed to address the first situation by adding one more condition: and (not Expression.Contains("/*"))

Unfortunately, my Dynamic LINQ skills are close to nil, and I don't know how to address the other situations. Also, I understand that my solution to the first situation is not ideal because it excludes cases where there is both a multi-line comment and a genuine division that should be avoided.

otykier commented 6 years ago

Thanks for posting. While Tabular Editor is actually doing some simple parsing of the DAX expressions, that allows it to internally distinguish between slashes used for division and slashes within comments / object names / strings, this information is currently not available to the Best Practice Analyzer. So the best we can do for now, is to search the Expression string.

A naïve search like what the current rule does, is bound to return false positives as per your cases above - it could be improved slightly by using RegEx instead. However, I have a backlog item to expose the parsed DAX expression to the Best Practice Analyzer, which would allow us to properly detect the use of slash for division (and lots of other interesting stuff!)

otykier commented 5 years ago

Just an update on this. Next version of Tabular Editor is going to feature a new method, that can be accessed through Advanced Scripting as well as through the Best Practice Analyzer: Tokenize(). This method applies to any object that holds a DAX expression, and returns a list of all tokens that make up the DAX expression.

For example, consider a measure that has the following nonsensical DAX expression:

// Calculates reseller / internet total
[Reseller/Internet Total] / 
CALCULATE([Total Sales], 'Product'[Color] = "Green/Blue")

Selecting this measure in the tree and running the following script:

Selected.Measure.Tokenize().Where(t => !t.CommentOrWhitespace).Output();

Will produce this output: image

Notice how there's just a single DaxToken of type "DIV", even though the DAX expression contains multiple slashes. This allows us to create much better Best Practice Rules to inspect DAX expressions. For example, the following rule will report all objects having an expression that contains a DIV token, like the measure above:

Tokenize().Any(Type = DIV)

This takes care of situations 1-3 as you pointed out.

DaxToken objects have a "Prev" and "Next" property, which allows us to access the previous or next non-whitespace/comment token in the expression, allowing us to handle situation 4 with a rule that looks like this:

Tokenize().Any(
    Type = DIV and
    Next.Type <> INTEGER_LITERAL and
    Next.Type <> REAL_LITERAL
)

I realise this is pretty advanced stuff that many Tabular developers probably won't care about, which is why I'm also making an update to the Best Practice Analyzer UI, to make it easy to import Best Practice Rules from this GitHub repo with a single click.

If you have any other ideas of Best Practice Rules that could benefit from using the new Tokenize() method, please let me know!

DaniilMaslyuk commented 5 years ago

This is excellent -- thanks, Daniel!

otykier commented 5 years ago

The rule file has now been updated with the release of Tabular Editor 2.8.1 supporting the use of the new Tokenize() method: https://github.com/TabularEditor/BestPracticeRules/blob/master/src/standard/DAX%20Expressions/DAX_DIVISION_COLUMNS.json

Daandamhuis commented 5 years ago

Mmm I do get some false positives with the lastest revision of the DIV and Comments. See this screenshot for example

afbeelding

otykier commented 5 years ago

The BPARules-standard.json file was only just updated now (my automated build pipeline was disabled for some reason). Are you sure it's using the updated version of the rule expression? If so, could you try to copy the full DAX code of this measure into this thread? Thanks

Daandamhuis commented 5 years ago

Sure! :)

`SUMX ( SUMMARIZE ( Customer, Customer[Customer No], "DaysSales", IF ( MIN ( 'Customer'[Total Days with Sales] ) >= 21, 21, MIN ( 'Customer'[Total Days with Sales] ) ) ), DIVIDE ( CALCULATE ( [Quantity Sold], FILTER ( ALL ( 'Posting Date' ), 'Posting Date'[RankByDay]

MAX ( 'Posting Date'[RankByDay] ) - [DaysSales] && 'Posting Date'[RankByDay] <= MAX ( 'Posting Date'[RankByDay] ) ) ), [DaysSales], BLANK () ) * 7 )`

BVMather commented 5 years ago

I'm also seeing false positives on dates "12/02/2019", and comments / abcd / although not sure whether that is by design (i.e should we ignore the item).

otykier commented 5 years ago

Darn! Turns out that there's a bug in 2.8.1, where the token labels (DIV, MINUS, FILTER, etc...) that you can use in the Best Practice Analyzer do not line up with the actual tokens returned from the Tokenize() function. As a temporary workaround, try to change the rule into:

Tokenize().Any(
    Type = 285
)

I'll provide an update to Tabular Editor 2.8.1 later today, that fixes this issue.

otykier commented 4 years ago

Fixed (some time ago).