Closed TrevorHansen closed 10 months ago
Oh interesting. I think this is implying that there are a couple heavy hitters in terms of time spent in extraction. The geometric mean reports that this code is about as fast, but in absolute time it's much faster. Any idea what's going on there?
Yes, Max, you're right about that. With the 211 inputs, these 4 problems provide about 95% of the time improvement:
| file | faster greedy dag extractor (ms) | old extractor with std hashmap (ms) |
|---|---|---|
| data/tensat/nasrnn_acyclic.json | 312 | 2,560 |
| data/tensat/nasneta.json | 1,392 | 15,290 |
| data/tensat/nasneta_acyclic.json | 2,602 | 17,613 |
| data/tensat/nasrnn.json | 3,163 | 8,904 |
| total | 7,469 | 44,367 |
The other 207 problems take 1.3 seconds in total with the new extractor and 2.8 seconds with the old extractor. So there are a lot more easy problems than hard problems, which hides the improvement in the geometric mean.
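To make the effect concrete, here is a minimal sketch of how a geometric mean of per-file ratios can sit close to 1 while the total runtime drops sharply. The four heavy-hitter timings are the ones from the table above. The 207 easy entries are placeholders (1 ms for both extractors), since only their totals are given here, not per-file times. The function names are made up for this illustration and are not part of the benchmark harness.

```rust
/// Geometric mean of the per-file ratios new/old.
/// Each element of `pairs` is (new_ms, old_ms).
fn geometric_mean_ratio(pairs: &[(f64, f64)]) -> f64 {
    let log_sum: f64 = pairs.iter().map(|(new, old)| (new / old).ln()).sum();
    (log_sum / pairs.len() as f64).exp()
}

/// Ratio of total runtimes: sum(new) / sum(old).
fn total_ratio(pairs: &[(f64, f64)]) -> f64 {
    let new_total: f64 = pairs.iter().map(|(new, _)| new).sum();
    let old_total: f64 = pairs.iter().map(|(_, old)| old).sum();
    new_total / old_total
}

/// Four measured heavy hitters plus 207 placeholder easy problems.
fn sample_timings() -> Vec<(f64, f64)> {
    // Measured values from the table above, in milliseconds.
    let mut pairs = vec![
        (312.0, 2560.0),   // nasrnn_acyclic
        (1392.0, 15290.0), // nasneta
        (2602.0, 17613.0), // nasneta_acyclic
        (3163.0, 8904.0),  // nasrnn
    ];
    // ASSUMPTION: placeholder timings for the easy problems, identical for
    // both extractors. These are not measurements.
    pairs.extend(std::iter::repeat((1.0, 1.0)).take(207));
    pairs
}

fn main() {
    let pairs = sample_timings();
    println!("geometric mean of ratios: {:.3}", geometric_mean_ratio(&pairs));
    println!("ratio of totals:          {:.3}", total_ratio(&pairs));
}
```

With these placeholder values the geometric mean stays above 0.9 while the ratio of totals falls below 0.2, because each of the 211 files contributes equally to the geometric mean regardless of how long it takes.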
Awesome, thanks for the PR! Do you happen to have additional benchmarking data as well? As it is, it's quite hard to actually see the benefit of cost-sharing extraction, since few of the benchmarks seem to have it (or at least not that we have found).
Yes, it makes sense that you'd like more benchmarks that show the benefit of sharing-aware extraction, given that it's so expensive to undertake. I'll make some, but I'll be slow: it will probably be 6 weeks before I can get them ready.
This: (a) speeds up the current greedy extraction, and (b) adds a faster extraction based on the code of @Bastacyclop & @mwillsey
In particular, switching the hashmap library to the one in the Rust standard library reduces the runtime of the greedy extraction from 195 seconds to 47 seconds on my machine, using all the test cases in main along with those in the open PRs: https://github.com/TrevorHansen/extraction-gym/tree/main/data
This PR also adds a new extractor that takes about 9 seconds. It's based on the greedy extractor in #8 combined with the simple extractor in main. It doesn't produce the same results as the current greedy extractor, though:
I think you get different answers depending on the order in which you visit the nodes.
I'm happy to clean this up as required.