Can we detect "random" ids to avoid them?

amenk commented 2 years ago

I tried the extension with a Magento 2 checkout flow (Luma based theme).

This is part of what I got:

  # in the cart

  // Click on <input> #cart-51378-qty
  cy.get('#cart-51378-qty').click()

  // Click on <input> #cart-51378-qty
  cy.get('#cart-51378-qty').click()

  # in the checkout

  // Click on <input> #ECIHNWG
  cy.get('#ECIHNWG').click()

  // Click on <input> #SERNC0C
  cy.get('#SERNC0C').click()

This shows that using IDs is not always a good idea.

The ID of the cart-qty input contains the cart ID, which changes on each run, so the selector is useless.

On the checkout page, where I fill in the name, address and so on, the inputs are generated by Knockout JS and always have random IDs. There it would be better to use input name selectors.
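To illustrate the two cases above, here is a minimal sketch of a selector fallback. It is not the recorder's actual logic: `buildSelector` is a hypothetical helper, `el` is a plain object standing in for a DOM element, and the `firstname` name used in the usage note is an assumed example value.

```javascript
// Hypothetical helper, not part of the recorder or of Cypress.
// `el` is a plain object with optional `tag`, `id` and `name` fields.
function buildSelector(el) {
  const tag = el.tag || 'input';

  // Case 2: prefer the name attribute when present, since it comes from
  // the form definition rather than from a generated id.
  if (el.name) {
    return `${tag}[name="${el.name}"]`;
  }

  // Case 1: an id with a number between separators (cart-51378-qty)
  // probably embeds a record id, so match only the parts around it.
  const m = /^(.*?[-_])\d+([-_].*)$/.exec(el.id || '');
  if (m) {
    return `${tag}[id^="${m[1]}"][id$="${m[2]}"]`;
  }

  // Otherwise fall back to the plain id, or to the bare tag.
  return el.id ? `#${el.id}` : tag;
}
```

With these assumptions, `buildSelector({ id: 'ECIHNWG', name: 'firstname' })` gives `input[name="firstname"]`, and `buildSelector({ id: 'cart-51378-qty' })` gives `input[id^="cart-"][id$="-qty"]`. The prefix/suffix selector can match several cart rows, so a recorded test would probably still need something like `.first()` or `.eq(n)` to pick one.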

I know that especially the second case is hard to detect.

If the string "looks" very random, it might be autogenerated and not meaningful.

Maybe an algorithm like this one could be used: https://github.com/adobe/stringlifier

amenk commented 2 years ago

I tried https://github.com/Webfit-project/random_string_detection but did not get very promising results...

$ node index.js 
name => false
on detecte
email => 0.25
on detecte
cart-qty => 0.42857142857142855
on detecte
cart-1-qty => 0.3333333333333333
on detecte
SERNC0C => 0.3333333333333333
on detecte
cart-51378-qty => 0.5384615384615384

MikeShi42 commented 2 years ago

@amenk totally missed this issue, apologies. Yes, I've looked into this a bit in the past. Unfortunately, randomness or cardinality is really tricky to define for arbitrary strings, even though it seems quite simple (after all, isn't it just cardinality?). I think the better long-term approach is to let the user edit selectors post-action, but that is likely a heavier lift.

Otherwise, I think I've seen approaches that statistically calculate the likelihood of 2- or 3-character sequences occurring next to each other relative to an English dictionary, for example, but that also sounded incredibly complicated to implement. Another idea may be to pick some decent heuristic to flag patterns like multi-digit selectors, a high ratio of mixed-case characters, and a high ratio of number -> letter or letter -> number transitions. That should catch selectors like #1241, RaNdOm or 4f2d1, but maybe there are edge cases I haven't thought of that would make it return false positives/negatives.
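A rough sketch of what such a heuristic could look like, covering the three signals mentioned above. `looksAutoGenerated` is a hypothetical name and every threshold in it is a guess, not a tuned value.

```javascript
// Hypothetical heuristic; all thresholds below are untuned guesses.
function looksAutoGenerated(id) {
  // Signal 1: a run of three or more digits (#1241, cart-51378-qty).
  const digitRuns = id.match(/\d+/g) || [];
  if (digitRuns.some((run) => run.length >= 3)) return true;

  // Signal 2: two or more letter -> digit or digit -> letter transitions (4f2d1).
  const transitions = (id.match(/[A-Za-z](?=\d)|\d(?=[A-Za-z])/g) || []).length;
  if (transitions >= 2) return true;

  // Signal 3: high ratio of case switches between neighbouring letters (RaNdOm).
  const letters = id.replace(/[^A-Za-z]/g, '');
  let caseSwitches = 0;
  for (let i = 1; i < letters.length; i++) {
    const prevIsLower = letters[i - 1] === letters[i - 1].toLowerCase();
    const currIsLower = letters[i] === letters[i].toLowerCase();
    if (prevIsLower !== currIsLower) caseSwitches++;
  }
  if (letters.length >= 4 && caseSwitches / (letters.length - 1) >= 0.5) return true;

  return false;
}
```

Under these thresholds, `1241`, `RaNdOm`, `4f2d1`, `SERNC0C` and `cart-51378-qty` are flagged, while `name`, `email`, `cart-qty` and `firstName` are not. It also shows the edge cases: an all-uppercase id such as `ECIHNWG` slips through as a false negative, and a meaningful id such as `col2row3` would be a false positive.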

Curious on your thoughts.

DeploySentinel / Recorder

Can we detect "random" ids to avoid them? #27