Closed kubukoz closed 3 months ago
I think the key is gonna be to precompute a set of recursive shapes from the ClosureMap.
If you can optimise the Map[ShapeId, Set[ShapeId]] => Set[ShapeID]
function, you've essentially won.
You can use the fact that if you traverse several shapes before finding a recursion chain, ALL the shapes of the chain are recursive.
You can also use the fact that once a ShapeId is proven non-recursive, there's no need to traverse it ever again.
So I think an implementation using two mutable sets, one for proven recursive, one for proven-non recursive, would be decent.
Something like :
import scala.collection.mutable.{Set => MSet}
import scala.collection.immutable.ListSet
def recursiveShapes[A](map: Map[A, Set[A]]): Set[A] = {
val provenRecursive: MSet[A] = MSet.empty
val provenNotRecursive: MSet[A] = MSet.empty
def crawl(key: A, seen: ListSet[A]): Unit = {
if (provenNotRecursive(key)) ()
else if (provenRecursive(key)) {
// here we may have found a new "recursive path" between
// elements that we know are recursive
// ps : I'm not sure about this branch, it should probably be dropped
provenRecursive ++= seen.dropWhile(s => !provenRecursive(s))
} else if (seen(key)) {
// dropping elements that don't belong to the "recursive path"
provenRecursive ++= seen.dropWhile(_ != key)
} else
map.get(key) match {
case None => ()
case Some(values) =>
values.foreach(crawl(_, seen + key))
if (!provenRecursive(key)) provenNotRecursive += key
}
}
map.keySet.foreach(crawl(_, ListSet.empty))
provenRecursive.toSet
}
@main()
def test = {
val map = Map[Int, Set[Int]](
1 -> Set(2),
2 -> Set(3, 4, 5),
3 -> Set(4),
5 -> Set(6, 7),
6 -> Set(2),
7 -> Set(2)
)
println(recursiveShapes(map))
}
PS : this need to be tested a lot more thoroughly, obviously
If you want to go ultra optimal, Tarjan is probably the best algorithm : https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
For the AWS example I tried earlier, the implementation you provided was enough to cut it down by 10x ;)
We could try Tarjan too. I'll see when I can get to this, if nobody picks it up in the meantime
thanks for the research!
I think the key is gonna be to precompute a set of recursive shapes from the ClosureMap.
If you can optimise the
Map[ShapeId, Set[ShapeId]] => Set[ShapeID]
function, you've essentially won.You can use the fact that if you traverse several shapes before finding a recursion chain, ALL the shapes of the chain are recursive.
You can also use the fact that once a ShapeId is proven non-recursive, there's no need to traverse it ever again.
So I think an implementation using two mutable sets, one for proven recursive, one for proven-non recursive, would be decent.
that's impressive, and also the pseudo algorithm that you just dropped out of thin air wow
On my machine (M1 Max MBP), loading a relatively large model (15k+ shapes) takes less than a second on the Smithy (aws) side but over 11 seconds on the
DynamicSchemaIndex
side.You can try yourself (with granular steps) with the following scala-cli script:
Here are the timings:
Ideas for improvement
NOTE: some models are not like the others
The characteristics of a model, e.g. how deep vs how wide the shapes are, may have an impact on the timings here. The profiles described below come from distinct models that may or may not have similar depths.
Optimizing recursiveness checks
First, I wanted to show a profiler run on a confidential model with, apparently, deep shape closures.
I ran a profiler during
load dsi
and it turns out that the biggest culprit, taking over 70% of the entireload
call, are recursion checks:Perhaps there's a more efficient way of performing such checks. I haven't seen the code, but
transitiveClosure
may be trying to do too much (i.e. collecting all shapes into a large set, rather than recursing until a knownShapeId
is found).Now here's the flame graph for the AWS model made from the specs in the snippet above:
Zoomed in:
isRecursive
takes up 86% of the whole thing, again. I think if we can optimize that part, we can save a lot of time on loading dynamic models in general.Extra info
I checked
isRecursive
for any outliers and it appears that only a dozen of shapes (out of thousands) take over 500ms to check:which suggests that they're extra deep/wide, but I haven't checked that yet.
I also checked what happens if we rewrite
isRecursive
to be more lazy and return as soon as the current shape is seen (note: didn't check for correctness):It didn't help much:
and neither did caching or a combination of the two. I'm hoping that it's just a mistake on my side and the closure check can still be optimized.