Open panaC opened 3 weeks ago
Not supported yet; we need to find a robust library to handle this.
I ported the relevant parts of the official polyfill from JavaScript to TypeScript, adapted the logic to meet our needs, and fixed some bugs (which I also reported upstream):
...unfortunately, at this stage my work remains in a branch because the DOMRange-to-TextFragment logic fails reproducibly with my go-to test ebook (AccessibleEPUB3), and I ran out of time to troubleshoot further.
https://github.com/readium/r2-navigator-js/compare/develop..feat/text-fragments
refined by a textPositionSelector inside the node container
{
"type": "CssSelector",
"value": "#original-content",
"refinedBy": {
"type": "TextPositionSelector",
"start": 58,
"end": 138
}
}
This doesn't make sense. A CssSelector references a DOM Element, whereas the start/end integers reference character positions inside a DOM TextNode. A DOM element can contain no children; it can contain multiple sibling child text nodes (although these are typically normalised into a single TextNode unless the CDATA is marked as such); and it can contain mixed / interspersed TextNodes and Elements at the first children layer, and so on recursively deeper in the descendants.
Following an internal discussion with the team, I will try to explain how DOM Range serialisation works in r2-navigator-js.
A DOM Range represents the span between a start boundary and an end boundary. The two boundary containers can be TEXT_NODEs, with the start and end offsets counted at character level, or ELEMENT_NODEs, with the start and end offsets counted as child indexes.
In r2-navigator-js, a DOM Range is serialised to an object of 6 values:
{
startContainerElementCssSelector: string;
startContainerChildTextNodeIndex: number;
startOffset: number;
endContainerElementCssSelector: string;
endContainerChildTextNodeIndex: number;
endOffset: number;
}
As Daniel said, a CSS selector cannot reference a TEXT_NODE, so we have to specify the index of the TEXT_NODE relative to its parent element. That way the full expressiveness of a DOM Range is preserved and the range can be fully recreated.
When a boundary container is an ELEMENT_NODE, the childTextNodeIndex value is set to -1 and ignored; the cssSelector value and the offset are enough to get the Range back. When the container is something other than an ELEMENT_NODE (a TEXT_NODE most of the time), the cssSelector targets the parent of the TEXT_NODE and childTextNodeIndex is the position of the TEXT_NODE within that parent, so we get a complete serialisation of the DOM Range.
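The boundary serialisation described above can be sketched roughly as follows. This is not the actual r2-navigator-js code: the `NodeLike` types and the `serialiseBoundary` / `resolveBoundary` helpers are hypothetical stand-ins so the example runs without a DOM, and it assumes that childTextNodeIndex is the index of the text node among all child nodes of its parent (if the real index counts text nodes only, the lookup would differ).

```typescript
// Minimal DOM stand-ins so the sketch runs outside a browser (hypothetical types).
interface TextLike { kind: "text"; data: string; }
interface ElementLike { kind: "element"; cssSelector: string; children: NodeLike[]; }
type NodeLike = TextLike | ElementLike;

// One half (start or end) of the 6-value serialised range.
interface BoundaryInfo {
  containerElementCssSelector: string;
  containerChildTextNodeIndex: number; // -1 when the container is an element
  offset: number;
}

// Serialise one range boundary. For a text container, the selector targets the
// parent element and the index locates the text node among the parent's children
// (assumption: index over ALL child nodes, not only text nodes).
function serialiseBoundary(
  container: NodeLike,
  parentElement: ElementLike | undefined,
  offset: number,
): BoundaryInfo | undefined {
  if (container.kind === "element") {
    return {
      containerElementCssSelector: container.cssSelector,
      containerChildTextNodeIndex: -1,
      offset,
    };
  }
  if (!parentElement) { return undefined; }
  const index = parentElement.children.indexOf(container);
  if (index < 0) { return undefined; }
  return {
    containerElementCssSelector: parentElement.cssSelector,
    containerChildTextNodeIndex: index,
    offset,
  };
}

// Inverse operation: `lookup` stands in for document.querySelector.
function resolveBoundary(
  info: BoundaryInfo,
  lookup: (cssSelector: string) => ElementLike | undefined,
): { container: NodeLike; offset: number } | undefined {
  const element = lookup(info.containerElementCssSelector);
  if (!element) { return undefined; }
  if (info.containerChildTextNodeIndex === -1) {
    return { container: element, offset: info.offset };
  }
  const child = element.children[info.containerChildTextNodeIndex];
  if (!child || child.kind !== "text") { return undefined; }
  return { container: child, offset: info.offset };
}
```

Because the text node index is stored explicitly, resolving a boundary is a direct lookup and needs no text traversal.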
This is why we lose information with a cssSelector refined by a textPositionSelector: the structure inside the DOM Element has to be reconstructed from only a start and an end character position, without knowing which TEXT_NODE index is targeted. Resolving a textPositionSelector means traversing the subtree and accumulating every text length recursively until the position of the wanted TEXT_NODE is reached.
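To illustrate that traversal, here is a minimal sketch that resolves a character position inside an element to a (text node, offset) pair. The `PNode` types and the `locate` function are hypothetical stand-ins for DOM nodes, and the sketch assumes positions are counted in UTF-16 code units over the concatenated, non-normalised text of the element.

```typescript
// Hypothetical stand-ins for DOM nodes so the sketch runs without a browser.
interface PTextNode { kind: "text"; data: string; }
interface PElement { kind: "element"; children: PNode[]; }
type PNode = PTextNode | PElement;

interface TextPoint { node: PTextNode; offset: number; }

// Collect descendant text nodes in document order (depth-first).
function collectTextNodes(root: PNode, out: PTextNode[] = []): PTextNode[] {
  if (root.kind === "text") {
    out.push(root);
  } else {
    for (const child of root.children) { collectTextNodes(child, out); }
  }
  return out;
}

// Walk the text nodes, accumulating lengths until `position` falls inside one.
function locate(root: PElement, position: number): TextPoint | undefined {
  if (position < 0) { return undefined; }
  let consumed = 0;
  let last: PTextNode | undefined;
  for (const node of collectTextNodes(root)) {
    const end = consumed + node.data.length;
    if (position < end) {
      return { node, offset: position - consumed };
    }
    consumed = end;
    last = node;
  }
  // A position equal to the total length maps to the end of the last text node.
  if (last && position === consumed) {
    return { node: last, offset: last.data.length };
  }
  return undefined; // position is past the end of the text
}
```

Note that a position falling exactly between two text nodes is ambiguous: this sketch picks the start of the later node, whereas the original Range might have ended in the earlier one. That is the information a plain start/end pair cannot carry.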
XPath doesn't have this issue, since it can address any kind of node, including a TEXT_NODE with text()[index].
For example, with a valid XPath refinedBy:
{
"type": "RangeSelector",
"startSelector": {
"type": "CssSelector",
"value": "p:nth-child(24)",
"refinedBy": {
"type": "XPathSelector",
"value": "/text()[2]",
"refinedBy": {
"type": "TextPositionSelector",
"start": 28,
"end": 32
}
}
},
"endSelector": {
"type": "CssSelector",
"value": "p:nth-child(24)",
"refinedBy": {
"type": "XPathSelector",
"value": "/text()[2]",
"refinedBy": {
"type": "TextPositionSelector",
"start": 32,
"end": 88
}
}
}
}
I hope this is clearer; it is, at least, for me.
The question now is whether we should trust textPositionSelector.
Another question is how to import annotation selectors that need to be converted to IRangeInfo.
Currently we can import a Readium annotation set (aka .annotation) from both the library and reader windows, and it is processed in the main process. If a selector cannot be mapped to IRangeInfo "offline" (without a mounted DOM), it is not imported into the publication's annotation list saved in the Thorium database. So we need an adapter that imports any selector and converts it to a DOM Range and then to an r2-navigator-js IRangeInfo.
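As a rough sketch of the current "offline" import behaviour described above: every name here (`AnnotationSketch`, `OfflineAdapter`, `importAnnotations`) is hypothetical and not Thorium's actual API; only the six range fields come from this thread. Each adapter tries to map a selector to range info without a DOM, and annotations with no mappable selector are skipped.

```typescript
// The six serialised range fields discussed in this thread.
interface RangeInfoSketch {
  startContainerElementCssSelector: string;
  startContainerChildTextNodeIndex: number;
  startOffset: number;
  endContainerElementCssSelector: string;
  endContainerChildTextNodeIndex: number;
  endOffset: number;
}

// Hypothetical shapes for imported data.
interface SelectorSketch { type: string; }
interface AnnotationSketch { id: string; selectors: SelectorSketch[]; }

// An adapter returns range info when it can map the selector without a DOM.
type OfflineAdapter = (selector: SelectorSketch) => RangeInfoSketch | undefined;

interface ImportResult {
  imported: { id: string; rangeInfo: RangeInfoSketch }[];
  skipped: string[]; // ids of annotations that would need a mounted DOM
}

function importAnnotations(
  annotations: AnnotationSketch[],
  adapters: OfflineAdapter[],
): ImportResult {
  const result: ImportResult = { imported: [], skipped: [] };
  for (const annotation of annotations) {
    let rangeInfo: RangeInfoSketch | undefined;
    // Try every selector with every adapter; keep the first successful mapping.
    for (const selector of annotation.selectors) {
      for (const adapter of adapters) {
        rangeInfo = adapter(selector);
        if (rangeInfo) { break; }
      }
      if (rangeInfo) { break; }
    }
    if (rangeInfo) {
      result.imported.push({ id: annotation.id, rangeInfo });
    } else {
      result.skipped.push(annotation.id);
    }
  }
  return result;
}
```

Supporting a new selector type would then mean registering one more adapter, and the skipped list makes visible which annotations still need a DOM-based conversion.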
There are some constraints:
The use case for importing an annotation set in Thorium can be this:
Currently 1, 10, 11, 12 and even 13 are not implemented in the develop branch.
The top priority is the “convert Selector to Range” routine.
Selector highlighting demonstration: https://github.com/edrlab/w3c-annotation-selector-demo https://edrlab.github.io/w3c-annotation-selector-demo/web/
I propose a selector that can be mapped to IRangeInfo without any DOM context:
{
"type": "RangeSelector",
"startSelector": {
"type": "CssSelector",
"value": "#intro > p:nth-child(2)",
"refinedBy": {
"type": "TextNodeIndexSelector",
"value": 0,
"refinedBy": {
"type": "CodeUnitSelector",
"value": 4
}
}
},
"endSelector": {
"type": "CssSelector",
"value": "#intro > p:nth-child(3)",
"refinedBy": {
"type": "TextNodeIndexSelector",
"value": 2,
"refinedBy": {
"type": "CodeUnitSelector",
"value": 11
}
}
}
}
This is a RangeSelector with a CssSelector and 2 new selectors: one carries the textNodeIndex (taken from a normalized range) and the other carries the character position as a code unit index.
It can easily be mapped to IRangeInfo:
{
"rangeInfo": {
"endContainerChildTextNodeIndex": 2,
"endContainerElementCssSelector": "#intro > p:nth-child(3)",
"endOffset": 11,
"startContainerChildTextNodeIndex": 0,
"startContainerElementCssSelector": "#intro > p:nth-child(2)",
"startOffset": 4
},
"cleanBefore": " Some text. The ",
"cleanText": "quick brown fox jumps over the lazy dog. The lazy white dog sleeps",
"cleanAfter": " with the crazy fox. Image wit",
"rawBefore": " Some text.\n The ",
"rawText": "quick brown fox jumps over the lazy dog.\n The lazy white dog sleeps",
"rawAfter": " with the crazy fox.\n Image wit"
}
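The mapping from the proposed selector to the six rangeInfo fields can be sketched as below. This is a sketch under stated assumptions, not an existing API: `rangeSelectorToRangeInfo` and the helper names are hypothetical, `TextNodeIndexSelector` and `CodeUnitSelector` are the two selector types proposed in this comment (they are not part of the W3C model), and the input is validated because it comes from an imported file.

```typescript
interface RangeInfo {
  startContainerElementCssSelector: string;
  startContainerChildTextNodeIndex: number;
  startOffset: number;
  endContainerElementCssSelector: string;
  endContainerChildTextNodeIndex: number;
  endOffset: number;
}

interface Boundary { css: string; textNodeIndex: number; offset: number; }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Non-negative finite integer.
function isIndex(v: unknown): v is number {
  return typeof v === "number" && isFinite(v) && v >= 0 && Math.floor(v) === v;
}

// Read one boundary: CssSelector -> TextNodeIndexSelector -> CodeUnitSelector.
function readBoundary(v: unknown): Boundary | undefined {
  if (!isRecord(v) || v["type"] !== "CssSelector") { return undefined; }
  const css = v["value"];
  const nodeSel = v["refinedBy"];
  if (typeof css !== "string" || !isRecord(nodeSel)
    || nodeSel["type"] !== "TextNodeIndexSelector") { return undefined; }
  const textNodeIndex = nodeSel["value"];
  const unitSel = nodeSel["refinedBy"];
  if (!isIndex(textNodeIndex) || !isRecord(unitSel)
    || unitSel["type"] !== "CodeUnitSelector") { return undefined; }
  const offset = unitSel["value"];
  if (!isIndex(offset)) { return undefined; }
  return { css, textNodeIndex, offset };
}

// Pure data transformation: no DOM access is needed at any point.
function rangeSelectorToRangeInfo(v: unknown): RangeInfo | undefined {
  if (!isRecord(v) || v["type"] !== "RangeSelector") { return undefined; }
  const start = readBoundary(v["startSelector"]);
  const end = readBoundary(v["endSelector"]);
  if (!start || !end) { return undefined; }
  return {
    startContainerElementCssSelector: start.css,
    startContainerChildTextNodeIndex: start.textNodeIndex,
    startOffset: start.offset,
    endContainerElementCssSelector: end.css,
    endContainerChildTextNodeIndex: end.textNodeIndex,
    endOffset: end.offset,
  };
}
```

Since the function only copies values between two JSON shapes, it could run in the main process at import time; the text fields (cleanText, rawText, and so on) are not derivable this way and would still need the document content.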
Currently we can export and import an annotation set with the Readium annotation spec, but annotation selector matching is locked to the r2-navigator-js IRangeInfo model. We need an interface that accepts and parses any annotation selector from the W3C annotation spec.
Support for the W3C annotation data model selectors https://www.w3.org/TR/2017/REC-annotation-model-20170223/#selectors :
We need to update the Readium annotations spec https://github.com/readium/annotations?tab=readme-ov-file#111-selector to fully support the W3C annotation selector model.
FragmentSelector :
TextFragment :
conformsTo
application/xhtml+xml
Fragment Identifier spec and Scroll to Text Fragment spec: https://wicg.github.io/scroll-to-text-fragment/
Not supported yet; we need to find a robust library to handle this.
audiobook media flags:
CssSelector :
example :
refined by a textPositionSelector inside the node container
Supported on apache-annotator
xPathSelector
ex:
Not supported in either apache-annotator or r2-navigator-js.
We need to think about how to deal with this selector, and whether it will be parsed.
Note: used with the hypothesis client https://github.com/hypothesis/client/blob/main/src/annotator/anchoring/xpath.ts
TextQuoteSelector
ex:
Supported on apache-annotator. Not to be generated for LCP-protected publications. Note from the W3C spec:
Implementation with Apache-annotator : https://annotator.apache.org/docs/api/modules/selector.html#textquoteselectormatcher
TextPositionSelector
ex:
apache annotator implementation: https://annotator.apache.org/docs/api/modules/selector.html#textpositionselectormatcher
RangeSelector
ex:
supported on apache-annotator
range to RangeSelector :
Just a POC example; it still needs to be tested!
A rangeSelector is parsable without DOM content loaded in memory, with just a mapping to the r2-navigator-js IRangeInfo.
rangeSelector matching is implemented here with apache-annotator and is usable like any other selector.
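For the TextQuoteSelector and TextPositionSelector entries listed above, here is a minimal DOM-free sketch of how a quote (exact, with optional prefix and suffix, as defined in the W3C model) can be resolved to character positions in a plain string. It is not the apache-annotator implementation; `matchTextQuote` and the `*Like` types are hypothetical, and the sketch assumes the text has already been extracted and is compared without any normalisation.

```typescript
interface TextQuoteSelectorLike {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface TextPositionSelectorLike {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

// Return every occurrence of `exact` whose surrounding text matches the
// optional prefix and suffix. Several matches mean the quote is ambiguous.
function matchTextQuote(
  text: string,
  selector: TextQuoteSelectorLike,
): TextPositionSelectorLike[] {
  const matches: TextPositionSelectorLike[] = [];
  const exact = selector.exact;
  if (exact.length === 0) { return matches; }
  const prefix = selector.prefix === undefined ? "" : selector.prefix;
  const suffix = selector.suffix === undefined ? "" : selector.suffix;
  let from = 0;
  while (from <= text.length) {
    const start = text.indexOf(exact, from);
    if (start === -1) { break; }
    const end = start + exact.length;
    // The prefix must sit immediately before the match, the suffix right after.
    const prefixOk = start >= prefix.length
      && text.slice(start - prefix.length, start) === prefix;
    const suffixOk = text.slice(end, end + suffix.length) === suffix;
    if (prefixOk && suffixOk) {
      matches.push({ type: "TextPositionSelector", start, end });
    }
    from = start + 1; // continue the search, allowing overlapping matches
  }
  return matches;
}
```

With the sample sentence used earlier in this thread, searching for "lazy" alone yields two candidates, while adding the suffix " white" narrows it down to one; this is the disambiguation role the prefix and suffix play.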