ebeshero closed this issue 10 months ago
We see this on C08_app78 with the regex replacement around
['i', 'feel']
We recognized that there is a problem with the ECMAScript regex search-and-replace pattern in the seg.tsx file. We propose that the regex pattern is too complicated because it relies on negative lookahead and negative lookbehind. Instead, we should search on these simple positive patterns only:
- Search: `['`
  Replace with: `["`
- Search: `',\s'`
  Replace with: `" "`
- Search: `']`
  Replace with: `"]`
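As a quick sketch (not the actual seg.tsx code), the three proposed positive patterns could be chained like this; note that, as literally written, the `',\s'` → `" "` replacement also drops the comma between tokens:

```typescript
// Sketch of the proposed simple positive replacements (not the seg.tsx source).
// Note: replacing ',\s' with " " drops the comma between tokens as written.
const simpleNormalize = (n: string): string =>
  n.replace(/\['/g, '["')      // [' -> ["
   .replace(/',\s'/g, '" "')   // ', ' -> " "  (comma is lost)
   .replace(/'\]/g, '"]');     // '] -> "]

console.log(simpleNormalize("['i', 'feel']")); // -> ["i" "feel"]
```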
@Yuying-Jin solved this with:
`n?.replace(/%q%/g, '\\"').replace(/([\[\]\s,<>])'/g, '$1"').replace(/'([\[\]\s<>,])/g, '"$1')`
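For reference, here is that chained replacement applied to the failing token from C08_app78 (a minimal sketch, not the seg.tsx source; `normalizeQuotes` is a hypothetical wrapper name):

```typescript
// Hypothetical wrapper around the chained replacement described above.
const normalizeQuotes = (n?: string): string | undefined =>
  n?.replace(/%q%/g, '\\"')             // restore escaped double quotes
    .replace(/([\[\]\s,<>])'/g, '$1"')  // opening ' after [, ], whitespace, comma, < or >
    .replace(/'([\[\]\s<>,])/g, '"$1'); // closing ' before those same characters

console.log(normalizeQuotes("['i', 'feel']")); // -> ["i", "feel"]
```

Because the surrounding characters are captured and re-emitted via `$1`, the commas and brackets survive, unlike the simpler drop-the-comma patterns proposed above.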
We think we fixed lots of these now, but we're concerned about double-quote replacement not happening properly in the normalized tokens representing `<longToken>` passages, such as Coleridge's "Ancient Mariner" in MS C10:
```xml
<longToken><metamark>*</metamark> <anchor xml:id="c56-0049.01"/>
<zone type="left_margin" sID="c56-0049__left_margin"/>
<metamark>_______________</metamark>
<milestone spanTo="#c56-0049.03" unit="tei:note"/>
<metamark>*</metamark>Coleridge's "Ancient Mariner."
<anchor xml:id="c56-0049.03"/>
<zone eID="c56-0049__left_margin"/></longToken>
```
The Python script's replacement of double quotes currently is: `normalized = re.sub(r'(“|”|")', '%q%', normalized)`
NOTE: inside `<longToken>`, when we have `<note resp="MWS">`, the attribute-value quotation marks are properly replaced by `%q%`. So why is it NOT working when the quotation marks are in the flattened text node of a `<longToken>` passage?
The problem comes from the post-processing XSLT file in the collationWorkspace repo. We had tried replacing `&quot;` with `"` to fix the problem with `&`. Now we replace `&quot;` with `%q%` instead.
https://github.com/FrankensteinVariorum/collationWorkspace/blob/dab9a224cd2f86fcab29e3836429b93b0a5fc6e6/xslt/postProcessing.xsl#L67