Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/

Sentences containing `&nbsp;` are not highlighted in Consecutive sentences and Keyphrase distribution assessments #19776

Open FAMarfuaty opened 1 year ago

FAMarfuaty commented 1 year ago

Please give us a description of what happened

Sentences that contain `&nbsp;` are not highlighted in the Consecutive sentences assessment. Note, however, that the same sentence is highlighted in other assessments, for example in the Passive voice assessment if it is a passive sentence.

To Reproduce

Step-by-step reproduction instructions

  1. Install and activate Yoast SEO
  2. Create a post
  3. Add a text of at least 300 words to the post
  4. Add the following sentences:
Apples have been grown for thousands of years in Asia and Europe and were brought to North America by European colonists. Apples have religious and mythological significance in many cultures, including Norse, Greek, and European Christian tradition. Apples grown from seed tend to be very different from those of their parents, and the resultant fruit frequently lacks desired characteristics.

NOTE: add the sentences above in HTML/Code editor mode

  5. Go to the Consecutive sentences assessment and confirm that it recognises an occurrence of consecutive sentences beginning with the same word
  6. Click on the eye icon
  7. Confirm that the sentences that contain `&nbsp;` are not highlighted

Expected results

  1. The sentences that contain `&nbsp;` are highlighted in the Consecutive sentences assessment

Actual results

  1. The sentences that contain `&nbsp;` are NOT highlighted in the Consecutive sentences assessment

Screenshots, screen recording, code snippet

If possible, please provide a screenshot, a screen recording or a code snippet which demonstrates the bug.

https://user-images.githubusercontent.com/48715883/215789020-0587852d-4476-449e-a271-d371bf0acbf0.mov

Technical info

Used versions

hannaw93 commented 1 year ago

I ran out of ideas for now. These are the results of the investigation:

  1. The problem happens both in the Classic and Block editor
  2. It also occurs with the passive voice and sentence length assessments, but not with the keyphrase assessments, paragraph length, or transition words. This made me focus on the code in and around getSentences, but I could not find any pointers to the cause of the bug. I also tried different texts with non-breaking spaces in getSentencesSpec (non-breaking spaces between sentences, between words, and both, each with and without normal spaces), and everything there works as expected.
  3. Just in case, I tried pasting the text using both ways of editing HTML; it makes no difference.
  4. There was no test for the code in normalizeHTML.js that is responsible for turning `&nbsp;` into actual non-breaking spaces (U+00A0). I ran some tests for it in normalizeHTMLSpec.js (with different variations of non-breaking spaces) and found no visible issues there either.
  5. When adding non-breaking spaces to the input text in the tests for getFieldsToMarkSpec, the marked version returns a text without a non-breaking space. This shouldn't be an issue, though, since the markers seem to work/apply (screenshot). The tests for MarkSpec return "marked" as expected whether or not the original input has a non-breaking space.

Image
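
To make point 5 concrete, here is a minimal sketch of how a sentence stored with regular spaces can fail to match a text that still contains U+00A0. All three helpers (`decodeNbsp`, `nbspToSpace`, `findSentence`) are hypothetical and written only for illustration; they are not the code in normalizeHTML.js or in the marker, and the investigation above does not confirm that this is the actual cause.

```javascript
// Hypothetical helper (NOT the code in normalizeHTML.js):
// replaces the &nbsp; entity with the U+00A0 character.
function decodeNbsp( html ) {
	return html.replace( /&nbsp;/g, "\u00a0" );
}

// Hypothetical helper: replaces U+00A0 with a regular space.
function nbspToSpace( text ) {
	return text.replace( /\u00a0/g, " " );
}

// Hypothetical lookup: finds a sentence in a text by exact substring search.
function findSentence( text, sentence ) {
	return text.indexOf( sentence );
}

const html = "Apples have been grown&nbsp;for thousands of years.";
const decoded = decodeNbsp( html );
const sentenceWithSpaces = nbspToSpace( decoded );

// The sentence with regular spaces is not found in the decoded text...
console.log( findSentence( decoded, sentenceWithSpaces ) ); // -1
// ...but it is found once both sides use the same kind of space.
console.log( findSentence( nbspToSpace( decoded ), sentenceWithSpaces ) ); // 0
```

If something like this is happening, comparing both strings after the same whitespace normalization would be one possible way to narrow it down.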

hannaw93 commented 1 year ago

Slack conversation: https://yoast.slack.com/archives/C03PRESAXMJ/p1678719135469269

hannaw93 commented 1 year ago

Update: somehow, when I re-test this now, the problem no longer occurs for passive voice and sentence length. On my side it now occurs only for Consecutive sentences.

hannaw93 commented 1 year ago

This issue is on hold to be potentially addressed when we’re editing this assessment for the HTML parser project.

mhkuu commented 1 year ago

As discussed, we will hopefully solve this problem with the HTML parser.

FAMarfuaty commented 1 year ago

This issue can also be reproduced in the Keyphrase distribution assessment, where the keyphrase `giant panda` doesn't match `giant&nbsp;panda`.
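
The keyphrase mismatch can be illustrated with a small sketch, assuming the text holds a non-breaking space (U+00A0) between the two words while the keyphrase uses a regular space. `normalizeSpaces` and `containsKeyphrase` are hypothetical names used only here; this is not how the Keyphrase distribution assessment is implemented.

```javascript
// Hypothetical helper: treats U+00A0 as a regular space.
function normalizeSpaces( text ) {
	return text.replace( /\u00a0/g, " " );
}

// Hypothetical matcher: case-insensitive substring match after
// normalizing non-breaking spaces on both sides.
function containsKeyphrase( text, keyphrase ) {
	const haystack = normalizeSpaces( text ).toLowerCase();
	const needle = normalizeSpaces( keyphrase ).toLowerCase();
	return haystack.includes( needle );
}

const sentence = "The giant\u00a0panda is native to China.";

// A plain substring check fails because U+00A0 is not U+0020.
console.log( sentence.includes( "giant panda" ) ); // false
// After normalizing both sides, the keyphrase is found.
console.log( containsKeyphrase( sentence, "giant panda" ) ); // true
```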