Closed appledora closed 2 years ago
In GitLab by @geohci on Aug 24, 2022, 01:13
@appledora I think we can close this with the introduction of the depth-first search approach? Or do you want to leave it open for now?
We can close this, I think. Right now we are not facing any trouble with transclusion detection.
Some transclusion links don't have any marker for us to identify them, and have the following format (the same as a standard WikiLink):
<a href="./Dictionary_of_National_Biography" rel="mw:WikiLink" title="Dictionary of National Biography"> Dictionary of National Biography </a>
Corresponding article: William Clark
In the same article,
<a class="mw-disambig" href="./William_Clark_(disambiguation)" rel="mw:WikiLink" title="William Clark (disambiguation)"> William Clark (disambiguation) </a>
- is both a disambiguation and a transclusion. The class attribute mw-disambig helps us identify the disambiguation, but not the transclusion.
On closer inspection, it seems we need to look at the context in which the link is placed, i.e.:
For the same element, if we consider its parent div tag, we see that it has role="note" and class="hatnote". This is preceded by a style tag, which likely performs the actual transclusion of the item.
For reference, check the thread: https://gitlab.wikimedia.org/repos/research/html-dumps/-/merge_requests/8#note_9522
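To make the "look at the context" idea concrete, here is a minimal sketch using only Python's standard-library html.parser. It is not the project's implementation: the class name WikiLinkContextParser, the helper find_wikilinks, the result keys, and the sample markup are all hypothetical, and the sample assumes the hatnote div wraps the link directly, as described above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class WikiLinkContextParser(HTMLParser):
    """Hypothetical helper: collect mw:WikiLink anchors plus hints from their ancestors."""

    def __init__(self):
        super().__init__()
        self._stack = []  # (tag, attrs dict) for every currently open element
        self.links = []   # one dict per wikilink found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "mw:WikiLink" in (attrs.get("rel") or "").split():
            classes = (attrs.get("class") or "").split()
            self.links.append({
                "href": attrs.get("href"),
                # The class on the link itself only tells us about disambiguation.
                "is_disambig": "mw-disambig" in classes,
                # The surrounding hatnote div is the contextual hint.
                "in_hatnote": any(self._is_hatnote(t, a) for t, a in self._stack),
            })
        if tag not in VOID_TAGS:
            self._stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    @staticmethod
    def _is_hatnote(tag, attrs):
        classes = (attrs.get("class") or "").split()
        return tag == "div" and attrs.get("role") == "note" and "hatnote" in classes


def find_wikilinks(html):
    parser = WikiLinkContextParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up fragment modelled on the two links quoted in this issue.
sample = (
    '<div role="note" class="hatnote">'
    '<a class="mw-disambig" href="./William_Clark_(disambiguation)" rel="mw:WikiLink" '
    'title="William Clark (disambiguation)">William Clark (disambiguation)</a></div>'
    '<p><a href="./Dictionary_of_National_Biography" rel="mw:WikiLink" '
    'title="Dictionary of National Biography">Dictionary of National Biography</a></p>'
)

for link in find_wikilinks(sample):
    print(link)
```

Note that this only flags links sitting inside a hatnote div. It does not inspect the preceding style tag, and a link such as the Dictionary of National Biography one, which has no distinguishing context in this fragment, would still look like a plain WikiLink.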