Stuck in an unsaved changes loop

afbora commented 2 years ago

Related https://github.com/getkirby/kirby/pull/3758 PR and still continue unicode characters issue.

You can use following content to reproduce the issue:

[{"attrs":{"width":"full-width","mbottom":"true","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"slides":[{"title":"\u0160ildymo sistemas pri\u017ei\u016brin\u010di\u0173 profesional\u0173 ir savo \u0161ildymo sistema besir\u016bpinan\u010di\u0173 vartotoj\u0173 pam\u0117gti cheminiai priedai","bg":["slide-1.png"],"text":"<ul><li><p>Skland\u017eiam ir ekonomi\u0161kam sistemos veikimui<\/p><\/li><li><p>Efektyviam \u0161ildymo problem\u0173 sprendimui<\/p><\/li><\/ul>","button":{"type":"page","value":"contacts","text":"Daugiau informacijos"}},{"title":"Priedai sistem\u0173 apsaugai nuo korozijos, purvo, nuovir\u0173","bg":["slide-2.png"],"text":"<ul><li><p>Tausoja katil\u0105, \u0161ilumos siurbl\u012f, kitus \u0161ildymo prietaisus ir \u012frang\u0105.<\/p><\/li><li><p>I\u0161lieka veiksmingi vis\u0105 sistemos eksploatavimo laik\u0105.<\/p><\/li><\/ul>","button":{"type":"page","value":"contacts","text":"Daugiau informacijos"}},{"title":"Magnetiniai \u2013 cikloniniai \u0161ildymo sistemos filtrai","bg":["slide-3.png"],"text":"<ul><li><p>Padeda pasi\u0161alinti \u012fvairiems ter\u0161alams ir apsaugo \u0161ildymo prietaisus nuo pirmalaikio gedimo.<\/p><\/li><li><p>Modeliai kompakti\u0161kai ir erdviai katilinei.<\/p><\/li><li><p>10 \u2013 25 met\u0173 garantija.<\/p><\/li><\/ul>","button":{"type":"page","value":"products","text":"Daugiau informacijos"}},{"title":"\u0160ildymo ir \u0161aldymo sistem\u0173 plovikliai","bg":["slide-4.png"],"text":"<ul><li><p>I\u0161tirpdo \u012fvairias nuos\u0117das.<\/p><\/li><li><p>R\u016bg\u0161tiniai ir ner\u016bg\u0161tiniai.<\/p><\/li><li><p>Tinkami visiems praplovimo \u012frenginiams.<\/p><\/li><\/ul>","button":{"type":"page","text":"Daugiau informacijos","value":"products\/plovikliai"}},{"title":"\u0160ildymo sistemos prot\u0117ki\u0173 hermetizavimo priemon\u0117s","bg":["slide-5.png"],"text":"<ul><li><p>Efektyvi pagalba atsiradus mikro prot\u0117kiams.<\/p><\/li><li><p>Suveikia per kelias valandas, nereikia i\u0161leisti sistemos.<\/p><\/li><\/ul>","button":{"type":"page","value":"products\/protekiu-hermetikai","text":"Per\u017ei\u016br\u0117ti produktus"}}],"height":"h-medium","align":"align-left","valign":"valign-middle","paddingleft":null,"paddingright":52,"overlaytoggle":"false","overlaytop":"#FFFFFF","overlaybottom":"#FFFFFF","overlaymobiletoggle":"true","overlaymobiletop":"#FFFFFF82","overlaymobilebottom":"#FFFFFF82","bgmobileposition":null,"titlestyle":"hstyle2","titlecolor":"#0F4C87","tagtitle":"h2","textstyle":"tstyle2","textcolor":"#0F4C87","fontsizea":"","fontweighta":""},"id":"38e799f7-99b2-41e9-8f9c-a046b8f807f6","isHidden":false,"type":"slides"}],"id":"3f93abad-add6-4267-bc58-268f7ee5f18b","width":"1\/1"}],"id":"c35e77bb-e370-4c26-b104-8be6c1184d04"},{"attrs":{"width":"container","mbottom":"false","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"level":"h2","text":"FERNOX \u2014\u00a0jau 57 metus \u2014\u00a0patikimi, profesional\u016bs, aplink\u0105 tausojantys sprendimai j\u016bs\u0173 \u0161ildymo sistemai","halign":"","styleheading":"hstyle2"},"id":"7733aede-1e08-444e-93d7-0b42b0a04924","isHidden":false,"type":"heading"},{"content":{"features":[{"icon":["network.svg"],"title":"Fernox pasaulyje","text":"<p>Fernox - pasaulyje gerai \u017einomas vandens prie\u017ei\u016bros chemikal\u0173 ir \u00a0\u012frangos gamintojas. \u012emon\u0117 prad\u0117jo veikl\u0105 1964 metais Jungtin\u0117je Karalyst\u0117je, o \u0161iandien Fernox produktais prekiaujama daugiau nei 40-yje \u0161ali\u0173 Europoje, Amerikoje ir Azijoje.<\/p>"},{"icon":["science-molecule-strucutre.svg"],"title":"Produkt\u0173 panaudojimas","text":"<p>Fernox produktai skirti tiek priva\u010di\u0173, tiek pramonini\u0173 \u0161ildymo ir \u0161aldymo sistem\u0173 vandens paruo\u0161imui, vamzdyn\u0173 ir \u012frengini\u0173 praplovimui ir nukalkinimui, prot\u0117ki\u0173 hermetizavimui, apsaugai nuo u\u017e\u0161alimo. Platus magnetini\u0173\/ciklonini\u0173 \u0161ildymo sistemos filtr\u0173 pasirinkimas. Sukurta atskira produkt\u0173 grup\u0117 atsinaujinan\u010di\u0173 i\u0161tekli\u0173 \u0161ildymo sistem\u0173 prie\u017ei\u016brai.<\/p>"},{"icon":["file-iso.svg"],"title":"Kokyb\u0117","text":"<p>\u012emon\u0117 nuolat investuoja \u012f\u00a0mokslinius tyrimus ir ie\u0161ko novatori\u0161k\u0173 bei\u00a0aplink\u0105 tausojan\u010di\u0173 sprendim\u0173. Gymybos procese \u012fdiegti ISO 9001, ISO 14001, ISO\/TS 16949 ir OHSAS 18001 standartai. Produktai sertifikuoti \u012fvairiose nepriklausomose ES laboratorijose.<\/p>"}],"cols":4},"id":"1a1daaff-8bf8-4d04-9c13-6210f20843ba","isHidden":false,"type":"features"}],"id":"3a59ae10-4149-48a9-bd38-16bbfd0179b2","width":"1\/1"}],"id":"12fbd5d0-4d90-48f6-8ce4-c221e29e2f79"},{"attrs":{"width":"container","mbottom":"false","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"level":"h2","text":"Norite prad\u0117ti naudotis m\u016bs\u0173 produktais?","halign":"left","styleheading":""},"id":"8134a98d-16d6-4ce5-ac11-3f42f901c194","isHidden":false,"type":"heading"}],"id":"a47bdbd3-2ab3-4793-9e11-4a2eb60e08ef","width":"1\/1"}],"id":"05e639e9-3dde-4387-a197-128c0578e341"},{"attrs":{"width":"container","mbottom":"false","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"level":"h3","text":"Skambinkite:\u00a0+370 685 06 811","halign":"left","styleheading":""},"id":"fe43f682-998e-4359-a9a3-e60a931437a7","isHidden":false,"type":"heading"},{"content":{"text":"<p><strong>Atstovas Lietuvoje UAB \"Sandomus\"<\/strong><\/p><p>\u012emon\u0117s kodas: 302594355<\/p><p>PVM kodas: LT100005964216<\/p><p>Mobilus: +370 685 06 811<\/p><p>El. pa\u0161tas: info@fernox.lt<\/p><p><\/p><p>Adresas: S. \u017dukausko g. 32-7, Vilnius<\/p>"},"id":"d89ce57d-1adf-4f4a-b082-1abebdc99b7d","isHidden":false,"type":"text"}],"id":"183a9a54-8723-4a29-8dec-4fac60f1d96b","width":"1\/2"},{"blocks":[{"content":{"title":"","name":"Vardas \/ \u012emon\u0117","email":"El. pa\u0161tas","phone":"Telefonas","message":"Papildoma informacija","buttontext":"Si\u0173sti"},"id":"2e8c1a2e-51de-4c5b-bcd1-2c50ece48df8","isHidden":false,"type":"contactform"}],"id":"99b084c3-8c9b-4999-a444-f7a46820ffb5","width":"1\/2"}],"id":"90df7d47-2761-4c74-a36e-4fcb0a641f6e"}]

Kirby 3.6.0-beta.3

afbora commented 2 years ago

I noticed that the Sane class convert spaces in unicode 🤷‍♂️ My brain is burned with Sane/Dom/Parsley classes 🤯

bastianallgeier commented 2 years ago

@lukasbestle do you have an idea why spaces are converted?

lukasbestle commented 2 years ago

The conversion most likely happens inside libxml (PHP DOM classes) between parsing and conversion back to a string.

Is the converted result incorrect? Or is it just about the unsaved changes loop?

distantnative commented 2 years ago

Sounds to me like our setup (e.g. the Dom class) has issues with encodings - like it receives A in encoding X, processes it and spits out B in encoding Y. This gets passed to the Panel and browser, where automatically it converts somehow to A in X again which is compared to B in Y - it differs, so we have the changes bar. And starting from top endlessly.

I don't know the Dom class well, but I see some transformations and assumptions about very specific encodings: https://github.com/getkirby/kirby/blob/develop/src/Toolkit/Dom.php#L73-L75

E.g. what irritates me there is that the comment says ISO-8859-1 is need, but the conversion is actually to HTML-ENTITIES.

Maybe the cause is located somewhere around there?

lukasbestle commented 2 years ago

E.g. what irritates me there is that the comment says ISO-8859-1 is need, but the conversion is actually to HTML-ENTITIES.

libxml parses the HTML as ISO-8859-1 unless there is a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> tag in the input HTML (the new shorter HTML5 variant of this is not supported). So unless we inject such a meta tag and remove it manually again, we need to input code in ISO-8859-1 to the loadHTML() method. And that is done by converting everything to entities. As the entity names only use ASCII characters, the entities are valid in ISO-8859-1 and therefore parsed correctly. If we converted directly to ISO-8859-1 instead of to HTML-ENTITIES, every UTF-8 char without an equivalent would be converted to a question mark, so content would get lost.

The "inject a meta tag" route would probably be the "better" solution, but also be a hack and I'm not sure if it will cause other issues, e.g. if there's already a meta tag in the input. So I decided for entity escaping as it's also suggested in the comments in the PHP docs.

@afbora I don't fully understand which part of your test case causes which issue. Could you convert your test case into a test that directly uses the Dom class? I.e. an HTML string that is as short as possible but exhibits the wrong behavior when parsed and printed by the Dom class.

nilshoerrmann commented 2 years ago

If you want to load UTF-8 content, this has worked for us pretty well over the years:

$document->loadHTML('<?xml encoding="UTF-8">' . $html);

This is how we use it: https://github.com/hananils/kirby-tree-methods/blob/09d25aae7fadb403984c02282349b387299b4ea9/lib/tree.php#L39

lukasbestle commented 2 years ago

Oh, that looks pretty nice. Thanks, I'll try it out!

afbora commented 2 years ago

@lukasbestle You can test following text on writer field:

FERNOX — jau 57 metus — patikimi, profesionalūs, aplinką tausojantys sprendimai jūsų šildymo sistemai

afbora commented 2 years ago

@lukasbestle FYI, also when you remove the Sane::sanitize(), works great!

https://github.com/getkirby/kirby/blob/develop/config/fields/writer.php#L32

afbora commented 2 years ago

If I was able to get it right, @nilshoerrmann's suggestion didn't work :((

afbora commented 2 years ago

Saved without sane sanitize and no stucks ✅

<p>FERNOX —&nbsp;jau 57 metus —&nbsp;patikimi, profesionalūs, aplinką tausojantys sprendimai jūsų šildymo sistemai</p>

Saved with sane sanitize and stuck 💥

<p>FERNOX — jau 57 metus — patikimi, profesionalūs, aplinką tausojantys sprendimai jūsų šildymo sistemai</p>

lukasbestle commented 2 years ago

I tried several options with the following result (simple test with just new Dom($string)->toString(); input = expected result):

$this->doc->loadHTML($code)

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<html><body><p>TEST &acirc;&#128;&#148;&Acirc;&nbsp;j&Aring;&laquo;s&Aring;&sup3; &Aring;&iexcl;ildymo sistemai</p></body></html>

❌ Input is parsed as ISO-8859-1, which completely destroys the Unicode multibyte sequences.

$this->doc->loadHTML(mb_convert_encoding($code, 'HTML-ENTITIES', 'UTF-8'))

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<html><body><p>TEST &mdash;&nbsp;j&#363;s&#371; &scaron;ildymo sistemai</p></body></html>

❌ As the input is encoded as entities, the output also is encoded as entities.

$this->doc->loadHTML('<?xml encoding="UTF-8">' . $code)

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<?xml encoding="UTF-8"><html><body><p>TEST &mdash;&nbsp;j&#363;s&#371; &scaron;ildymo sistemai</p></body></html>

❌ The input doesn't contain entities, but weirdly libxml still converts everything to entities on output. Seems like it exports as ISO-8859-1 even though the XML declaration says the document is UTF-8.

$this->doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $code)

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><p>TEST — jūsų šildymo sistemai</p></body></html>

✅ Promising, but we need to remove the added meta tag without removing any other tag that was already in the input.

lukasbestle commented 2 years ago

I'm onto something here. Let's hope it will work. :)

afbora commented 2 years ago

I think the issue is that the writer field's handling of the value/text conflicts with that of the sane class. But I couldn't figure out.

bastianallgeier commented 2 years ago

Ok, I'll try to explain the underlying problem a bit to keep track of it here.

Form changes

As soon as content gets loaded from the server to fill the form, a copy of it is stored in Vuex as original form values.

Whenever a form input updates, the updated value is also stored in Vuex in a second "changes" object. The original values object and the changes object are then compared against each other to find out of something has changed. In this case, the orange bar shows up.

This is the basic logic behind the changes bar.

Writer

The Writer initializes ProseMirror under the hood. ProseMirror receives the HTML from the server and turns it into its own JSON document format. In order to create this format, it has to be fed with all kinds of rules about allowed block level elements and inline elements. It will automatically do all the hard work and filter out unwanted shit in the HTML – kill attributes, styles, unwanted tags, etc. It is very strict when performing this task.

That JSON document format can then be turned back into clean HTML again. But that HTML might differ from the original HTML the Writer received. For example, when you give the Writer this:

<p>Some&nbsp;<b>Text</b></p>

The output after the clean-up would be:

<p>Some <strong>Text</strong></p>

This is actually really nice, but it also leads to a big problem.

When the form is loaded, the original value in Vuex would be the string from the server. We already clean up the HTML there with our Sane class and the clean up process is already very very close to what ProseMirror is doing, but it will never be 100% the same unless we'd be able to let ProseMirror run on the server, which is unfortunately not possible.

So the original already differs from the initial value once ProseMirror has been initalized. ProseMirror therefor already sends an input event and the changes object contains a value that's no longer the same than the original. That's why the changes bar appears immediately and can also never be fully removed.

bastianallgeier commented 2 years ago

After writing this all down, I might have found a solution in the PR above.

bastianallgeier commented 2 years ago

✅

getkirby / kirby

Stuck in an unsaved changes loop #3798

Form changes

Writer