boonebgorges / buddypress-group-email-subscription

Fine-grained email subscription for activity in BuddyPress groups

Charset or Umlaut Problem with äöüß #245

Closed christianlersch closed 1 year ago

christianlersch commented 1 year ago

Hi,

Since 4.0.1 it seems that I have a problem with umlauts or the charset. In 4.0.0 everything works fine.

Do you have an idea what could be the problem?

Cheers Chris

[Screenshot attached: Bildschirmfoto 2023-10-31 um 10 43 10]

[File attached: rawmail.txt]

boonebgorges commented 1 year ago

Thanks for the detailed report. It's worth noting that there are several different types of text that are being rendered incorrectly:

However, diacritics in text like ...für dise Gruppe zu ändern... are rendered properly.

Looking at the rawmail.txt file, it appears that all of the characters with umlauts are being converted to HTML entities. So this doesn't appear to be a database encoding issue, but instead is probably some incorrect escaping. Interestingly, the same kind of escaping seems to be taking place for both the correctly and the incorrectly rendered bits of text:

I'm unsure why the escaping is happening in the first place, or why it's resulting in the proper HTML codes in some cases but not in others. I'm also unsure why this would have arisen specifically in the 4.0.1 release. I'll do some tests and see if I can reproduce the issue.

boonebgorges commented 1 year ago

I'm unable to reproduce the issue, but the evidence does suggest that there's a character-encoding issue. It looks like, in some cases, strings that are UTF-8 are being read one byte at a time as if they were ISO-8859-1, then escaped as HTML entities. Here's a brief script that shows how ä gets mangled in this way:

<?php

// Original UTF-8 string
$string = 'ä';

// PHP strings are byte sequences, so this holds the UTF-8 encoding of ä: 0xC3 0xA4
$bytes = (string) $string;

// Take each byte on its own, as a single-byte charset such as ISO-8859-1 would
$char1 = chr(ord($bytes[0]));
$char2 = chr(ord($bytes[1]));

// Convert each byte to an HTML entity, treating it as an ISO-8859-1 character
$htmlEntity1 = htmlentities($char1, ENT_QUOTES, 'ISO-8859-1');
$htmlEntity2 = htmlentities($char2, ENT_QUOTES, 'ISO-8859-1');

echo "Original UTF-8 character: " . $string . "\n";
echo "Misinterpreted characters: " . $char1 . $char2 . "\n";
echo "HTML entities: " . $htmlEntity1 . $htmlEntity2 . "\n";

which gives us:

Original UTF-8 character: ä
Misinterpreted characters: ä
HTML entities: &Atilde;&curren;

Those HTML entities are what we're seeing in your email text in place of ä.
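
For comparison, here is a minimal sketch that runs the same two bytes through `htmlentities()` under both charsets. It is an illustration only and is not taken from the plugin's code. It also shows that the mangling is reversible when the entities are decoded as ISO-8859-1.

```php
<?php

// 'ä' encoded as UTF-8 is the two bytes 0xC3 0xA4.
$string = "\xC3\xA4";

// Correct: tell htmlentities() that the input is UTF-8.
$correct = htmlentities($string, ENT_QUOTES, 'UTF-8');

// Wrong: treat the same bytes as ISO-8859-1, one character per byte.
$mangled = htmlentities($string, ENT_QUOTES, 'ISO-8859-1');

echo "Read as UTF-8:      " . $correct . "\n"; // &auml;
echo "Read as ISO-8859-1: " . $mangled . "\n"; // &Atilde;&curren;

// Decoding the mangled entities as ISO-8859-1 restores the original bytes.
$recovered = html_entity_decode($mangled, ENT_QUOTES, 'ISO-8859-1');
var_dump($recovered === $string); // bool(true)
```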

So there remains a bunch of questions about how this is happening in the application, why it's only affecting some strings, and why it seems to be linked to BPGES 4.0.1. Could you provide a bit more info:

  1. What is your database encoding? I guess wp_bp_activity is the relevant table for us: https://wincent.com/wiki/Finding_out_the_encoding_of_a_MySQL_database
  2. Can you confirm that your server is running the mbstring PHP extension?
  3. Can you share a list of other plugins you're running, especially any that are related to the way that emails are sent?
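
For the first two questions, a quick check could look something like the sketch below. The helper name `example_is_valid_utf8` is hypothetical and not part of BPGES. For the database side, running `SHOW TABLE STATUS LIKE 'wp_bp_activity'` in a MySQL client lists the table's collation, assuming the default `wp_` table prefix.

```php
<?php

// Question 2: is the mbstring extension loaded in this PHP runtime?
echo 'mbstring loaded: ' . (extension_loaded('mbstring') ? 'yes' : 'no') . "\n";

// Hypothetical helper (not part of BPGES): true if $value is valid UTF-8.
// preg_match() with the /u modifier returns false for invalid UTF-8 subjects.
function example_is_valid_utf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

// 'ä' as UTF-8 (0xC3 0xA4) versus 'ä' as ISO-8859-1 (0xE4).
var_dump(example_is_valid_utf8("\xC3\xA4")); // bool(true)
var_dump(example_is_valid_utf8("\xE4"));     // bool(false)
```

Running the helper on a string copied from an affected activity item would show whether the stored bytes are valid UTF-8 in the first place.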

christianlersch commented 1 year ago

Hi. Thank you very much for your help. I cloned the project to find exactly the version where it broke.

I figured out that it broke in revision 9fbeb83a8c7acf9bbe2ee6ab066bd533b31e9550, in the file bp-activity-subscription-functions.php

[Screenshot attached: Bildschirmfoto 2023-11-07 um 10 57 23]

If I change it from return wp_mail( $to, $args['subject'], $plaintext_content, $headers );

to return wp_mail( $to, $args['subject'], $args['content'], $headers );

the umlauts are displayed correctly. So I think the problem could be in ass_email_convert_html_to_plaintext.

We also use the plugins https://de.wordpress.org/plugins/wp-html-mail/ and https://de.wordpress.org/plugins/wp-html-mail-buddypress/.

Is there a chance you could fix the problem for me?

Thank you very much!

boonebgorges commented 1 year ago

Thanks for your debugging, @christianlersch!

I had a closer look and I think I might have identified what's happening. Are you able to apply this change to your installation to see if it resolves the issue? https://github.com/boonebgorges/buddypress-group-email-subscription/commit/1063628f719619cf387eaf8015705d4eaab2b17d

christianlersch commented 1 year ago

Thank you very much! For the umlauts it works perfectly! Very good.

But there is another problem. There is already an A tag in the text, and it is transformed into: &lt;a href=&quot;<a href="https://kora21preview.orangenkiste.eu/mitglieder/chris/&quot;&gt;chris&lt;/a&amp;gt" style="color: #777777; font-style: normal; font-weight: normal; text-decoration: underline; text-transform: none;">https://kora21preview.orangenkiste.eu/mitglieder/chris/&quot;&gt;chris&lt;/a&amp;gt</a>;

Is there also a solution for that?

[Screenshots attached: Bildschirmfoto 2023-11-08 um 09 47 00, Bildschirmfoto 2023-11-08 um 09 46 29]

Sorry, but we need this second email template plugin because all mails sent from the WordPress site should have the same look. :-/

boonebgorges commented 1 year ago

But there is another problem. There is already an A tag in the text, and it is transformed into:

Were you seeing this behavior in the HTML email even before applying https://github.com/boonebgorges/buddypress-group-email-subscription/commit/1063628f719619cf387eaf8015705d4eaab2b17d? I can't reproduce this in my initial tests, and I'm wondering if it's unrelated.

christianlersch commented 1 year ago

So sorry! It was my fault! I had not changed the file bp-activity-subscription-functions.php back to return wp_mail( $to, $args['subject'], $plaintext_content, $headers );
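
The escaped A tag reported earlier is consistent with HTML being handed over where a plain-text body is expected and then being escaped further down the line. The sketch below only illustrates that general effect. It assumes, without having verified it, that a template plugin escapes plain-text bodies. The example URL is a placeholder, and `strip_tags()` is a crude stand-in for a real HTML-to-plaintext conversion, not what BPGES does.

```php
<?php

// HTML content that already contains a link (placeholder URL).
$html = '<a href="https://example.org/members/chris/">chris</a>';

// If this HTML is treated as plain text and escaped for display,
// the markup itself becomes visible in the rendered email.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
echo $escaped . "\n";
// &lt;a href=&quot;https://example.org/members/chris/&quot;&gt;chris&lt;/a&gt;

// A plain-text version has no tags left to escape.
$plaintext = strip_tags($html);
echo htmlspecialchars($plaintext, ENT_QUOTES, 'UTF-8') . "\n"; // chris
```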

Now everything works fine! Thank you very much for your help! You are the best!

I have another question about a translation problem, but I will open a new issue for it.

boonebgorges commented 1 year ago

Great! Thank you for confirming. This fix will be part of the 4.2.2 release.