Dialect specifier breakage

kylebgorman commented 1 year ago

    @pytest.mark.skipif(not can_connect_to_wiktionary(), reason="need Internet")
    def test_american_english_dialect_selection():
        # Pick a word for which Wiktionary has dialect-specified pronunciations
        # for both US and non-US English.
        word = "mocha"
        html_session = requests_html.HTMLSession()
        response = html_session.get(
            _PAGE_TEMPLATE.format(word=word), headers=HTTP_HEADERS
        )
        # Construct two configs to demonstrate the US dialect (non-)selection.
        config_only_us = config_factory(key="en", dialect="US | American English")
        config_any_dialect = config_factory(key="en")
        # Apply each config's XPath selector.
        results_only_us = response.html.xpath(config_only_us.pron_xpath_selector)
        results_any_dialect = response.html.xpath(
            config_any_dialect.pron_xpath_selector
        )
>       assert (
            len(results_any_dialect)  # containing both US and non-US results
            > len(results_only_us)  # containing only the US result
            > 0
        )
E       AssertionError: assert 2 > 2
E        +  where 2 = len([<Element 'li' >, <Element 'li' >])
E        +  and   2 = len([<Element 'li' >, <Element 'li' >])

tests/test_wikipron/test_config.py:202: AssertionError

kylebgorman commented 1 year ago

The breakage indicates that even with dialect selection set to "US | American English", you actually obtain all pronunciations. For example, on the page used in the test (mocha), we grab both li elements under the Pronunciation header even though the second one does not match the dialect specification.

kylebgorman commented 1 year ago

This is currently blocking #509.

kylebgorman commented 1 year ago

Hi @jacksonllee, sorry to bother you. Any intuitions about what's going on here? I suspect the failure to grab anything for Latin in #509 is related too.

kylebgorman commented 9 months ago

The issue seems to be that the dialect selector expects @class = "ib-content qualifier-content", but on the current pages it's just @class = "ib-content". I'll try updating the selector to match and report back in a few days.
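A minimal sketch of how a class-name mismatch could produce exactly the symptom in the failing test. It assumes (this is not confirmed above) that the dialect filter keeps an li either when its qualifier span names the requested dialect or when it has no qualifier span at all. Under that assumption, once no span matches the expected class string exactly, every li looks unqualified and passes the filter, so the "US only" selection returns everything. The markup, the select_prons helper, and the split of "US | American English" into a set are hypothetical simplifications; the real selector is an XPath expression, not Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified Pronunciation list: one US-qualified entry and one
# UK-qualified entry. Real Wiktionary HTML is considerably more complex.
PAGE = """
<ul>
  <li><span class="ib-content"><a>US</a></span> <span class="IPA">pron-US</span></li>
  <li><span class="ib-content"><a>UK</a></span> <span class="IPA">pron-UK</span></li>
</ul>
"""


def select_prons(root, qualifier_class, dialects):
    """Hypothetical mimic of a dialect filter: keep an <li> whose qualifier
    names one of `dialects`, or that has no qualifier span at all."""
    selected = []
    for li in root.iter("li"):
        # Exact string comparison, like XPath's @class = "...".
        qualifiers = [s for s in li.findall("span") if s.get("class") == qualifier_class]
        if not qualifiers:
            # No qualifier found, so the entry is treated as dialect-neutral.
            selected.append(li)
        elif any(a.text in dialects for q in qualifiers for a in q.findall("a")):
            selected.append(li)
    return selected


root = ET.fromstring(PAGE.strip())
dialects = {"US", "American English"}  # assumed split of "US | American English"

# Old expected class: nothing matches, so both entries pass the filter.
print(len(select_prons(root, "ib-content qualifier-content", dialects)))  # 2
# Class as it now appears: only the US entry passes.
print(len(select_prons(root, "ib-content", dialects)))  # 1
```

Matching the full class string is brittle because any added or removed class token breaks it. In XPath 1.0, a token-based test such as contains(concat(' ', normalize-space(@class), ' '), ' ib-content ') would tolerate extra classes, at the cost of a longer selector.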

CUNY-CL / wikipron

Dialect specifier breakage #511