Open vpeil opened 3 years ago
Hi!
You mean that you want to restrict a search to concepts that are descendants of a certain top concept? I think we had other features in mind that have the same premise (looking at a particular subtree only, e.g. gbv/jskos-metrics#9), so it's certainly worth looking into. I'm wondering how we could implement this efficiently. Maybe @nichtich has an idea?
We could generate and index the ancestors field and allow filtering with a query parameter ancestor={uri}. This only makes sense for mono-hierarchical vocabularies, or for cases where the selected ancestor is reachable via all broader paths, but we don't need to check this. Adding ancestors to the database could be tricky for arbitrary concept updates, because an updated concept might modify ancestor chains anywhere.
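A minimal sketch of the ancestors idea for the mono-hierarchical case, not the project's actual implementation. The helper names computeAncestors and buildAncestorFilter are hypothetical, and the sketch assumes that ancestors is stored as an array of { uri } objects (mirroring broader), so that a multikey index on "ancestors.uri" could serve the filter.

```javascript
// Hypothetical helper: walk up the hierarchy via broader[0].uri and collect
// the ancestor chain (nearest ancestor first). Assumes a mono-hierarchy.
function computeAncestors(concept, conceptsByUri) {
  const ancestors = [];
  const seen = new Set([concept.uri]);
  let current = concept;
  while (current && Array.isArray(current.broader) && current.broader.length > 0) {
    const parentUri = current.broader[0].uri;
    if (seen.has(parentUri)) break; // guard against accidental cycles
    seen.add(parentUri);
    ancestors.push({ uri: parentUri });
    current = conceptsByUri.get(parentUri); // undefined ends the walk
  }
  return ancestors;
}

// Hypothetical helper: extend an existing MongoDB query object so that it only
// matches concepts below the given ancestor. An index for it could be created
// with db.concepts.createIndex({ "ancestors.uri": 1 }) (assumed field layout).
function buildAncestorFilter(baseQuery, ancestorUri) {
  return { ...baseQuery, "ancestors.uri": ancestorUri };
}
```

The tricky part mentioned above remains: when a concept's broader link changes, the stored ancestors of all its descendants would have to be recomputed as well.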
Maybe MongoDB graphLookup can help. The field to build the graph from is broader[0].uri.
Yes, my use case is mono-hierarchical.
I will have a look at the graphLookup of MongoDB. I will post my findings here in any case, but this will take some time....
$graphLookup can definitely be used to implement this, but I'm not sure if it's possible to do it efficiently, i.e. without having to go through the whole Concepts collection.
I played around with $graphLookup a little bit (also because it might be useful for a different issue) and found something that could work, however only in a restricted fashion:
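db.getCollection('concepts').aggregate([
  // Start from the desired parent concept
  {
    $match: { uri: "http://rvk.uni-regensburg.de/nt/A" }
  },
  // Recursively collect all concepts whose broader.uri points into the subtree
  {
    $graphLookup: {
      from: "concepts",
      startWith: "$uri",
      connectFromField: "uri",
      connectToField: "broader.uri",
      as: "descendant",
      // Search condition applied during the lookup
      restrictSearchWithMatch: {
        _keywordsLabels: { $regex: "^BIB" }
      }
    }
  },
  // Turn the collected array into one document per descendant
  {
    $unwind: "$descendant"
  },
  {
    $replaceRoot: { newRoot: "$descendant" }
  }
])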
So we match only the desired parent concept (doesn't have to be a top concept), then we do a graph lookup like @nichtich described, but in reverse (matching from uri to broader.uri), and use restrictSearchWithMatch to specify the search conditions. Then we unwind and replace the root.
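The same pipeline can be sketched as a small builder function so that the parent URI and the search condition become parameters. The function name buildDescendantPipeline and its options are hypothetical; maxDepth is an existing $graphLookup option, and the trailing $limit is an addition that was not part of the query above.

```javascript
// Hypothetical builder for the descendant pipeline described above.
// The options object (match, maxDepth, limit) is an assumption for illustration.
function buildDescendantPipeline(parentUri, { match = {}, maxDepth, limit } = {}) {
  const graphLookup = {
    from: "concepts",
    startWith: "$uri",
    connectFromField: "uri",
    connectToField: "broader.uri", // reverse direction: parent -> children
    as: "descendant",
  };
  // Only add the optional settings when they were actually requested
  if (Object.keys(match).length > 0) graphLookup.restrictSearchWithMatch = match;
  if (Number.isInteger(maxDepth)) graphLookup.maxDepth = maxDepth;

  const pipeline = [
    { $match: { uri: parentUri } },
    { $graphLookup: graphLookup },
    { $unwind: "$descendant" },
    { $replaceRoot: { newRoot: "$descendant" } },
  ];
  if (Number.isInteger(limit)) pipeline.push({ $limit: limit });
  return pipeline;
}
```

It would be used as db.getCollection('concepts').aggregate(buildDescendantPipeline("http://rvk.uni-regensburg.de/nt/A", { match: { _keywordsLabels: { $regex: "^BIB" } } })). One caveat, if the documentation is read correctly: restrictSearchWithMatch is applied at every step of the recursion, so a concept that does not match the condition also hides everything below it.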
Why did I say "restricted fashion"? The problem is that restrictSearchWithMatch doesn't seem to work with text indexes, and the query needs to be restrictive enough that the results fit in memory. For reasons I don't fully understand, MongoDB has to load ALL results into memory first, even if we only want a subset (e.g. the first 100). So the above example without restrictSearchWithMatch will not fit into memory, for example. I don't see a technical reason for this. Either this use case is not common enough for MongoDB to support it, or I'm missing something.
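One possible explanation is that $graphLookup writes all matches into a single array field (descendant) on the start document, so the whole array has to exist before $unwind or any later stage can cut it down. A different technique that avoids this is to walk the tree level by level in the application and stop as soon as enough results are collected. The following is a rough sketch of that alternative, not something tested against the Concepts collection; findDescendants and the findChildren callback are hypothetical names.

```javascript
// Hypothetical sketch: breadth-first traversal with early stop.
// findChildren(uris) is an assumed async callback returning all concepts whose
// broader.uri is one of `uris`, e.g. backed by
//   collection.find({ "broader.uri": { $in: uris } }).toArray()
// which could use a regular index on "broader.uri".
async function findDescendants(rootUri, findChildren, { limit = 100, filter = () => true } = {}) {
  const results = [];
  const seen = new Set([rootUri]); // protects against cycles and duplicates
  let frontier = [rootUri];

  while (frontier.length > 0 && results.length < limit) {
    const children = await findChildren(frontier);
    frontier = [];
    for (const child of children) {
      if (seen.has(child.uri)) continue;
      seen.add(child.uri);
      // Non-matching concepts are still traversed, just not returned
      frontier.push(child.uri);
      if (results.length < limit && filter(child)) results.push(child);
    }
  }
  return results;
}
```

The trade-off is one query per hierarchy level instead of a single aggregation, but only one level at a time has to be held in memory, and the traversal stops once the limit is reached.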
I'm mostly writing this down to document my findings. I still haven't fully grasped $lookup and $graphLookup and keep expecting them to do things they apparently cannot do. As mentioned somewhere else, sometimes I think a relational database would have been a better choice.
(@nichtich's first solution, i.e. generating and indexing an ancestors field, would still work and be very performant because it could use an index. The downside is, as always with these things, storage space. Having ancestors in the database for every concept takes up quite a lot of space.)
This may be beyond the scope of this project, but would be very useful. I would like to filter search results by top concepts.
Any idea how this could be achieved?