Open vikasrungta92 opened 3 years ago
Any update on this issue, please?
Any update on this issue?
I created a workaround that may only work in certain cases. In my case it is for preserving leading spaces, specifically for Draftail by @thibaudcolas, an integration of Draft.js to Wagtail.
The workaround is to modify the HTML on the backend side:
- normalize whitespace (tabs, newlines, and runs of breaking spaces)
- convert successions of breaking and non-breaking spaces into non-breaking spaces only
- insert a zero-width space before each group of non-breaking spaces

In my case I wanted to only preserve leading whitespace that contains at least one NBSP, so I created this function that I use every time I save HTML. It has to be adjusted to fit other use cases, but the idea of normalizing spaces + inserting a zero-width space could work for you too.
import re
BREAKING_SPACES_AFTER_NEWLINE_RE = re.compile(r'\n([ \t]+)')
SUCCESSIVE_BREAKING_SPACES_RE = re.compile(r' +(<[^>]+?>)? +')
BREAKING_SPACE_AFTER_BR = re.compile(r'(<br\s*/>) ')
def normalize_whitespaces(html: str) -> str:
    # Rules taken from https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Whitespace
    html = BREAKING_SPACES_AFTER_NEWLINE_RE.sub(r'', html)
    html = html.replace('\t', ' ').replace('\n', ' ')
    html = SUCCESSIVE_BREAKING_SPACES_RE.sub(r' \1', html)
    html = BREAKING_SPACE_AFTER_BR.sub(r'\1', html)
    return html
# Breaking spaces directly before (leading) or directly after (trailing)
# a non-breaking space, and groups of non-breaking spaces.
LEADING_BREAKING_SPACE_RE = re.compile(r' (?=[\u00A0\u202F])')
TRAILING_OR_NESTED_BREAKING_SPACE_RE = re.compile(r'(?<=[\u00A0\u202F]) ')
NBSP_GROUP_RE = re.compile(r'(?<![\u200B\u00A0\u202F])([\u00A0\u202F]+)')
def fix_draftail_leading_trailing_whitespaces(html: str) -> str:
    html = normalize_whitespaces(html)
    # We force successions of breaking & non-breaking spaces to be converted
    # into non-breaking spaces only, otherwise Draftail (maybe even Draft.js)
    # ignores all spaces.
    html = LEADING_BREAKING_SPACE_RE.sub('\u00A0', html)
    html = TRAILING_OR_NESTED_BREAKING_SPACE_RE.sub('\u00A0', html)
    # We add a zero-width space before groups of non-breaking spaces.
    # This is a workaround because Draftail (maybe even Draft.js) ignores
    # leading non-breaking spaces.
    # Zero-width spaces are not considered as spaces by most libraries.
    # This is why this trick works for preserving leading non-breaking spaces.
    # Those zero-width spaces can be removed from the data at any point,
    # they do not mean anything in terms of data.
    zero_width_space = '\u200B'
    return NBSP_GROUP_RE.sub(fr'{zero_width_space}\1', html)
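The comments above say that the zero-width spaces carry no meaning and can be removed at any point. Here is a minimal sketch of that round trip. `mark_nbsp_groups` and `strip_zero_width_spaces` are hypothetical names introduced for this example, not part of the function above. The `str.isspace()` checks only show how Python itself classifies the two characters; whether Draftail or Draft.js ignores them for the same reason is an assumption.

```python
import re

ZERO_WIDTH_SPACE = '\u200B'
NBSP = '\u00A0'

# Same pattern as NBSP_GROUP_RE above: a run of non-breaking spaces that is
# not already preceded by a zero-width space or another non-breaking space.
NBSP_GROUP_RE = re.compile(r'(?<![\u200B\u00A0\u202F])([\u00A0\u202F]+)')


def mark_nbsp_groups(html: str) -> str:
    """Prefix each group of non-breaking spaces with a zero-width space."""
    return NBSP_GROUP_RE.sub(fr'{ZERO_WIDTH_SPACE}\1', html)


def strip_zero_width_spaces(html: str) -> str:
    """Hypothetical helper: drop every zero-width space from the data.

    This also removes zero-width spaces that were in the content before
    marking, so it only suits data where they have no other purpose.
    """
    return html.replace(ZERO_WIDTH_SPACE, '')


original = f'<p>{NBSP}{NBSP}Indented</p>'
marked = mark_nbsp_groups(original)

# Python does not classify U+200B as whitespace, but it does classify U+00A0.
assert not ZERO_WIDTH_SPACE.isspace()
assert NBSP.isspace()

# Marking inserts one zero-width space before the group...
assert marked == f'<p>{ZERO_WIDTH_SPACE}{NBSP}{NBSP}Indented</p>'
# ...marking again changes nothing, thanks to the negative lookbehind...
assert mark_nbsp_groups(marked) == marked
# ...and stripping restores the original markup.
assert strip_zero_width_spaces(marked) == original
```

Because of the lookbehind, the marking step can run on every save without piling up extra characters.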
I could finally find a use for my knowledge of weird Unicode characters :laughing:
Do you want to request a feature or report a bug?
Bug/Feature
What is the current behaviour?
My application depends heavily on DraftJS. I generate the editorState with the convertFromHTML API, passing in HTML text that has whitespace at both ends of the text.
The following behaviour is observed (read each '-' as a whitespace character):
When I pass the HTML: ---This is Amazing.--- the text value within the content blocks is "---This is Amazing.---", as expected. Demo: link
When I pass the HTML: ---This is Amazing.--- ("Amazing" is wrapped in an em tag), the text value within the content blocks is "---This is Amazing.---", as expected. Demo: link
But when I pass the HTML: ---This is Amazing.--- ("This" is wrapped in an em tag), the text in the content block is "This is Amazing.", which is incorrect. Demo: link
That is, when an HTML tag comes immediately after the whitespace in the HTML, all of the whitespace is dropped.
Note that in the demo, one whitespace character is dropped each time.
I am not sure why the whitespaces are getting dropped from HTML.
Scenarios Tested but no luck:
Can anyone please tell me whether I am missing anything here? It is important for me to retain this whitespace, because the HTML being passed in is critical content that we cannot drop.
Or, is there any workaround I can use to make it work?
Versions used: Chrome 88.0.4324.150, DraftJS 0.10.x, Mac/Windows latest
Thanks in advance.