Open vaivaswatha opened 1 year ago
When I worked on #4592, the test `u256_ops_test` wouldn't compile without `local_copy_prop_prememcpy`. That was because the new optimization inserted instructions at the beginning of the block when the use was actually much later, thus increasing register pressure. With #4628, however, this issue is solved: `u256_ops_test` compiles fine now when `local_copy_prop_prememcpy` is removed. So we just need to benchmark and, if the change is acceptable, remove it.
@IGI-111 I experimented with removing `local_copy_prop_prememcpy` and here are the numbers.
My suggestion is to still go ahead and remove that optimization pass (because it is badly written; it was put together at the time to meet an urgent need). We can have another tracker issue to catch what patterns of memory copy propagation we're missing and try to incorporate them into the current pass (i.e., the one added in #4592).
Perhaps it would make sense to run benchmarks on the projects from https://github.com/FuelLabs/sway-applications, instead of the Sway test suite?
> We can have another tracker issue to catch what patterns of memory copy propagation we're missing and try to incorporate that into the current pass (i.e., the one added in #4592).
I created one (https://github.com/FuelLabs/sway/issues/4879), adding a pattern that I already know we aren't optimizing. More can be added as we find them.
> Perhaps it would make sense to run benchmarks on the projects from https://github.com/FuelLabs/sway-applications, instead of the Sway test suite?
Ah yes. Let me do that.
> Perhaps it would make sense to run benchmarks on the projects from https://github.com/FuelLabs/sway-applications, instead of the Sway test suite?
I gave this a try, but I have trouble building most of the contracts on my local `master` build of Sway. There is a version mismatch between `std`, `core` and `sway-libs`.
The optimization pass was written during the IR refactor of #4336, and uses an ad-hoc algorithm. A much more systematic algorithm (data-flow based, but not a full data-flow analysis) was later implemented in #4592, but it does not seem to cover everything that the ad-hoc algorithm did. This needs to be investigated and the function then removed, while ensuring correctness and no penalty in generated code size.
The algorithm implemented in #4592, to optimize a sequence of `memcpy`s, does so by optimizing pairs in a loop. This is inefficient, and can probably be done in one go.
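To illustrate what "in one go" could look like, here is a rough sketch (not the actual pass from #4592, and not using any `sway-ir` types). It assumes a straight-line list of whole-object copies between named locals, with no aliasing, partial overlap, or intervening stores. The `Memcpy` struct and `forward_sources` function are made-up names for illustration. The idea is to keep a map from each destination to the original source of its chain, so a single forward walk rewrites every copy, instead of repeatedly collapsing adjacent pairs.

```rust
use std::collections::HashMap;

/// Toy stand-in for `memcpy dst, src` between named locals.
/// Hypothetical type for illustration only; not part of sway-ir.
#[derive(Debug, Clone, PartialEq)]
struct Memcpy {
    dst: &'static str,
    src: &'static str,
}

/// Rewrites the source of every copy to the root of its copy chain,
/// visiting each instruction exactly once.
fn forward_sources(copies: &[Memcpy]) -> Vec<Memcpy> {
    // Maps a local to the oldest local known to hold the same bytes.
    let mut root: HashMap<&'static str, &'static str> = HashMap::new();
    let mut out = Vec::with_capacity(copies.len());

    for c in copies {
        // Values in `root` are already roots, so one lookup is enough.
        let src = root.get(c.src).copied().unwrap_or(c.src);

        // `dst` is about to be overwritten, so anything recorded as a
        // copy of `dst` can no longer be forwarded to it.
        root.retain(|_, r| *r != c.dst);

        // Record the new fact, unless this is a self-copy.
        if src != c.dst {
            root.insert(c.dst, src);
        }

        out.push(Memcpy { dst: c.dst, src });
    }
    out
}
```

For a chain `a -> b`, `b -> c`, `c -> d`, this rewrites all three copies to read from `a` in one walk, after which the intermediate copies become candidates for dead-code elimination if `b` and `c` are otherwise unused. A real pass would also need to account for sizes, escapes, and control flow, which this sketch deliberately ignores.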