cypress-io / cypress

Fast, easy and reliable testing for anything that runs in a browser.
https://cypress.io
MIT License

Selectors in `cy.get()` and `cy.contains()` cannot find invalid table HTML elements in some situations #25893

Open gvaatstra opened 1 year ago

gvaatstra commented 1 year ago

Current behavior

When I use `cy.contains('start-html', regex)`, it does not seem to traverse further into the DOM to find a match.

Given this simplified HTML:

<div data-cy='table_aandachtspunten'>
    <table>
        <tbody>
            <tr data-cy='rowX'>
                <a href='abc.com'>"DBU.VR.1"</a>
            </tr>
        </tbody>
    </table>
</div>

It finds the `a` element with both of the `cy.contains()` calls below (one searching the full DOM, one using a specific selector):

    cy.contains(/^DBU\.VR\.1$/).should("be.visible");
    cy.contains('[data-cy=tableA] tr a', theRegex).should("be.visible");

But it does not find it with a selector for the parent element (the `tr` that contains the needed `a` element):

    cy.contains('[data-cy=tableA] tr', theRegex).should("be.visible");

The relevant code I use:

    const theRegex = new RegExp(`^${Cypress._.escapeRegExp("DBU.VR.1")}$`);
    cy.contains(theRegex).should("be.visible");
    cy.contains('[data-cy=table_aandachtspunten] tr a', theRegex).should("be.visible");
    cy.contains('[data-cy=table_aandachtspunten] tr', theRegex).should("be.visible");


I would expect it to also find a match when the text is not on the exact element.

PS: I also noticed that the log (both in the runner and in `cy.log` output) doesn't show the escape characters, even though they are present: when I print the regex with `console.log` from a `cy.task`, I see them.
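One way to confirm that the escapes really are part of the pattern, independent of how the Command Log renders it, is to inspect the regex's `source` in plain JavaScript. This is a minimal sketch runnable in Node; `escapeRegExp` here is a hypothetical stand-in for `Cypress._.escapeRegExp` (lodash) so the snippet runs outside Cypress.

```javascript
// Hypothetical stand-in for Cypress._.escapeRegExp (lodash) so this runs in plain Node.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const theRegex = new RegExp(`^${escapeRegExp("DBU.VR.1")}$`);

// `source` holds the pattern text exactly as the regex engine sees it.
console.log(theRegex.source); // ^DBU\.VR\.1$
console.log(String(theRegex)); // /^DBU\.VR\.1$/

// The escaped dots only match literal dots, so "DBUxVRx1" is rejected.
console.log(theRegex.test("DBU.VR.1")); // true
console.log(theRegex.test("DBUxVRx1")); // false
```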

Desired behavior

Cypress should find the `tr` element whose text matches the regex, just as it does with `cy.contains('parent-element', 'text-to-find')`.

Test code to reproduce

See Current behavior above.

Cypress Version

12.5.1

Node version

16.19.0

Operating System

MacOS

Debug Logs

No response

Other

No response

marktnoonan commented 1 year ago

Thanks for this report @gvaatstra.

I had to tweak the example a bit (as posted, even `cy.contains(theRegex).should("be.visible")` was not passing for me), but I've reproduced this in a fork of cypress-test-tiny that you can check out if you like.

This doesn't seem to be about regex specifically. The table in your example is invalid markup, and matching fails with a regex or a plain string alike: `<tr>` elements may only have `<th>` or `<td>` children.

We may be able to do something about this on the Cypress side after investigating further. At a glance, other invalid markup that isn't part of a table works fine, for example `cy.contains('ul figcaption', 'hello')`. This may be an underlying limitation in jQuery, since the selector is really a jQuery selector under the hood.

I will route this to the team.

In the meantime, the workaround is to make the table conform to a valid structure, if you control it, or to avoid selectors that cross an invalid parent-child element boundary.

marktnoonan commented 1 year ago

Discovered that this also applies to cy.get() for invalid table markup, which makes sense: https://github.com/marktnoonan/contains-regex/blob/master/cypress/e2e/spec.cy.js

I'm going to change the title of this issue to better reflect the underlying problem.

gvaatstra commented 1 year ago

My bad. I tried to write a simple example and typed some HTML from memory, but I should have started from cypress-test-tiny to make it realistic. At first I couldn't reproduce the problem by simply copying the table. When I copy the full page I can reproduce it, but apparently the cause isn't as obvious as I thought. I can send you a reproduction by email if you like.

marktnoonan commented 1 year ago

@gvaatstra based on what you emailed I think there's a pretty clear explanation, which also explains why I needed to modify the regex in my example to remove the ^ and $.

Take this code as an example:

    const text = "DBU.VR.1";

    const regex = new RegExp(`^${Cypress._.escapeRegExp(text)}$`);

When matching the specific target element, the text content is a 1:1 match for this regular expression. But when locating an ancestor element, the text content of that ancestor fails the regex because it would include some whitespace at the start and end.
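To illustrate the difference outside the browser, here is a minimal sketch that applies the anchored regex to two hypothetical text values: the target element's own text, and an ancestor's text that also carries the newlines and indentation that formatted HTML typically contributes. The exact whitespace is an assumption; a real `tr`'s text depends on the page markup. `escapeRegExp` is again a hypothetical stand-in for `Cypress._.escapeRegExp`.

```javascript
// Hypothetical stand-in for Cypress._.escapeRegExp (lodash).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const text = "DBU.VR.1";
const regex = new RegExp(`^${escapeRegExp(text)}$`);

// Hypothetical text contents: the target element vs. an ancestor whose
// text also includes the indentation/newlines around the child element.
const targetText = "DBU.VR.1";
const ancestorText = "\n                DBU.VR.1\n            ";

console.log(regex.test(targetText)); // true: 1:1 match
console.log(regex.test(ancestorText)); // false: ^ and $ run into the whitespace
```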

If we use the following regular expression, we can ignore any whitespace before and after the target contents:

    const regex = /^\s*DBU\.VR\.1\s*$/;

And I'm sure there are multiple other ways to get around this.
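A small variation on the regex above, sketched under the assumption that you want to reuse it for many values: build the same whitespace-tolerant pattern from literal text, so each value doesn't have to be escaped by hand. `exactTextIgnoringOuterWhitespace` is a hypothetical helper name, not a Cypress API, and `escapeRegExp` stands in for `Cypress._.escapeRegExp`.

```javascript
// Hypothetical stand-in for Cypress._.escapeRegExp (lodash).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: matches `text` exactly, allowing any leading or
// trailing whitespace (spaces, tabs, newlines) around it.
function exactTextIgnoringOuterWhitespace(text) {
  return new RegExp(`^\\s*${escapeRegExp(text)}\\s*$`);
}

const regex = exactTextIgnoringOuterWhitespace("DBU.VR.1");
console.log(regex.source); // ^\s*DBU\.VR\.1\s*$
console.log(regex.test("DBU.VR.1")); // true
console.log(regex.test("\n    DBU.VR.1\n  ")); // true
console.log(regex.test("\n    DBU.VR.10\n  ")); // false: extra "0" is not whitespace
```

In a spec, the result could be passed wherever `theRegex` was used, e.g. `cy.contains('[data-cy=table_aandachtspunten] tr', exactTextIgnoringOuterWhitespace('DBU.VR.1'))`. This only addresses the whitespace part; it does not help with the invalid table markup.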

So it seems that this issue represents two things:

  1. Your first example code stumbled onto an edge case in which invalid markup can't be tested by Cypress.
  2. The private example you sent by email showed that cy.contains() handles whitespace differently when a regex is used instead of a plain text string. That may actually be expected behavior in many regex situations, but I can see how it's confusing here.
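A simplified model of why the two can behave differently (an illustration only, not Cypress's actual matching code): a plain string is looked for as a substring of the element's text, so surrounding whitespace never gets in the way, while an anchored regex has to account for every character between `^` and `$`. `matchesText` and the sample text are hypothetical.

```javascript
// Simplified model only: a plain string is treated as a substring search,
// while a regex is tested against the whole text exactly as-is.
function matchesText(elementText, matcher) {
  if (matcher instanceof RegExp) {
    return matcher.test(elementText);
  }
  return elementText.includes(matcher);
}

// Hypothetical ancestor text with surrounding newlines and indentation.
const ancestorText = "\n    DBU.VR.1\n  ";

console.log(matchesText(ancestorText, "DBU.VR.1")); // true
console.log(matchesText(ancestorText, /^DBU\.VR\.1$/)); // false
console.log(matchesText(ancestorText, /^\s*DBU\.VR\.1\s*$/)); // true
```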

This links to a few other issues around whitespace: https://github.com/cypress-io/cypress/issues/6405 and https://github.com/cypress-io/cypress/issues/3887 at least.

gvaatstra commented 1 year ago

Thanks for looking into it! Now that I know this, I'll take whitespace into account.

marktnoonan commented 1 year ago

Thanks @gvaatstra, I'm going to re-open this since it does capture unexpected behavior that we may address in the future, either with fixes in the app or documentation changes to call out the exceptions.