tetron432 opened this issue 5 years ago
I don't know if this is still a concern, but I had some difficulties with some peculiar texts as well. I think I pinpointed it to this line in `iteration` within `TextRank`:

```swift
let score: Float = links.reduce(0.0) { $0 + nodes[$1] / outlinks[$1] * weights[$1, node] }
```

In my case it yielded NaN for some parts. I'm guessing this is because `outlinks[$1]` and/or `weights[$1, node]` is 0 (I haven't debugged that deeply yet).
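For context on that guess: with IEEE 754 `Float` arithmetic, `0 / 0` is NaN, a nonzero value divided by 0 is infinite, and infinity times 0 is NaN. Once any term is NaN, the running sum inside `reduce` stays NaN. The standalone sketch below uses made-up values; `term` is a hypothetical helper that only mirrors the shape of the expression, not a library function.

```swift
// Hypothetical helper with the same shape as
// nodes[$1] / outlinks[$1] * weights[$1, node]; values are made up.
func term(score: Float, outlinks: Float, weight: Float) -> Float {
    return score / outlinks * weight
}

let normal = term(score: 0.5, outlinks: 2, weight: 1)        // 0.5 / 2 * 1 = 0.25
let zeroOverZero = term(score: 0, outlinks: 0, weight: 1)    // 0 / 0 = NaN
let infTimesZero = term(score: 0.5, outlinks: 0, weight: 0)  // (0.5 / 0) * 0 = inf * 0 = NaN
let infinite = term(score: 0.5, outlinks: 0, weight: 1)      // 0.5 / 0 = +inf

// A single NaN term poisons the whole sum, because NaN + x is NaN.
let total = [normal, zeroOverZero].reduce(Float(0)) { $0 + $1 }

print(normal, zeroOverZero.isNaN, infTimesZero.isNaN, infinite.isInfinite, total.isNaN)
// prints "0.25 true true true true"
```

Note that a zero weight on its own only produces NaN when it multiplies an infinity, i.e. when the matching `outlinks` value is also 0.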
My current workaround is to check for NaN and, if it occurs, replace the score with 0:
```swift
private func iteration(_ nodes: Node) -> Node {
    var vertex = Node()
    for (node, links) in graph {
        let score: Float = links.reduce(0.0) { $0 + nodes[$1] / outlinks[$1] * weights[$1, node] }
        if score.isNaN {
            // A NaN sum (e.g. from a 0 / 0 term) is treated as a score of 0.
            vertex[node] = (1-damping/nodes.count) + damping * 0.0
        } else {
            vertex[node] = (1-damping/nodes.count) + damping * score
        }
    }
    return vertex
}
```
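An alternative that might be worth considering is to guard each term before dividing, instead of checking the finished sum. The NaN check above replaces a node's entire sum with 0 as soon as one term is NaN, discarding the valid contributions from its other neighbours; a per-term guard skips only the neighbour with zero outgoing weight. The sketch below is hypothetical: `Edge`, the dictionary-based graph, and all parameter names are stand-ins for illustration, not the library's actual `Node`, `graph`, `outlinks`, or `weights` types. It keeps the base term `(1 - damping / n)` exactly as written in the workaround.

```swift
// Hypothetical stand-in for the library's weights[from, to] lookup.
struct Edge: Hashable {
    let from: String
    let to: String
}

// Hypothetical, simplified iteration over dictionary-based graph data.
func iteration(
    scores: [String: Float],        // current score of each node
    graph: [String: Set<String>],   // node -> neighbours contributing to it
    outlinks: [String: Float],      // node -> total outgoing weight
    weights: [Edge: Float],         // weight of the edge from -> to
    damping: Float
) -> [String: Float] {
    var updated: [String: Float] = [:]
    let n = Float(scores.count)
    for (node, links) in graph {
        let sum: Float = links.reduce(Float(0)) { (acc, link) -> Float in
            let out = outlinks[link] ?? 0
            // Skip neighbours with no outgoing weight rather than dividing
            // by 0, so one bad term cannot turn the whole sum into NaN.
            guard out > 0 else { return acc }
            let linkScore = scores[link] ?? 0
            let weight = weights[Edge(from: link, to: node)] ?? 0
            return acc + linkScore / out * weight
        }
        // Same base term as in the workaround.
        updated[node] = (1 - damping / n) + damping * sum
    }
    return updated
}

// "c" has zero outgoing weight and a zero-weight edge into "a";
// without the guard its term would be (1 / 0) * 0 = NaN.
let result = iteration(
    scores: ["a": 1, "b": 1, "c": 1],
    graph: ["a": ["b", "c"], "b": ["a"], "c": []],
    outlinks: ["a": 2, "b": 1, "c": 0],
    weights: [Edge(from: "b", to: "a"): 1, Edge(from: "c", to: "a"): 0, Edge(from: "a", to: "b"): 1],
    damping: 0.85
)
print(result.values.allSatisfy { $0.isFinite })  // true
```

Whether skipping such neighbours is the right semantics depends on why `outlinks` is 0 in the first place, which hasn't been debugged yet.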
It is not the most elegant solution, but it seems to do the trick in my initial testing. If it holds up in other cases as well, I could submit a pull request. It may also help with the other related issues.
The library works fine unless it's trying to summarize very long articles, such as this one: https://www.popsci.com/can-ai-destroy-humanity?utm_source=pocket-newtab