cpc / openasip

Open Application-Specific Instruction Set processor tools (OpenASIP)
http://openasip.org
Other
138 stars 41 forks source link

Use unordered_map for source/sink distances. #255

Closed NoamK-CR closed 4 months ago

NoamK-CR commented 4 months ago

In my use-case the source/sink distance update is a bottleneck. I think this will improve performance for other workloads as well, but haven't benchmarked extensively.

pjaaskel commented 4 months ago

The reason why we've used map here instead of an unodered map is that we aim for the compilation results to be deterministic. I'm not sure if they are nowadays still, but we should aim for that. IIRC, if the graph nodes during instruction scheduling can be ordered semi-randomly, it leads to different instruction schedules for each compiler run even with the same inputs. Can you make the container a template parameter which defaults to the ordered map and instantiate with unordered_map in your case? I'm curious: Can you tell something about your use case?

NoamK-CR commented 4 months ago

Unfortunately, I can't say much about my use case besides that we inline and unroll loops a lot, leading to very large basic blocks (each function is actually a single basic block).

I absolutely agree that compilation should be deterministic (and haven't seen any undeterminism myself). These maps are never iterated over, though, they are only used to track distances. I don't think this change can cause any undeterminism.

Specifically, I didn't modify the definition of NodeSet on line 86, which is part of the public API.

pjaaskel commented 4 months ago

OK, cool. If you checked that it's not interated over then this should fine. I cannot remember anymore which was the cause of undetermism which was fixed with ordered maps as it was quite long time ago. To help me sleep better, could you add a simple test for deterministic results in the scheduler systemtest suite, for example just compile a simple hello world twice and compare the returned TPEF?

karihepola commented 4 months ago

Unfortunately, I can't say much about my use case besides that we inline and unroll loops a lot, leading to very large basic blocks (each function is actually a single basic block).

I absolutely agree that compilation should be deterministic (and haven't seen any undeterminism myself). These maps are never iterated over, though, they are only used to track distances. I don't think this change can cause any undeterminism.

Specifically, I didn't modify the definition of NodeSet on line 86, which is part of the public API.

The scheduler has some regions with asymptotic complexity of at least n². Basic block splitting/reduction of the scheduling window can reduce the compilation time significantly. This can, however, make the compiled program slower, but often the speedup is negligible after increasing the basic block size beyond a certain point, especially for narrow architectures.

NoamK-CR commented 4 months ago

... could you add a simple test for deterministic results in the scheduler systemtest suite, for example just compile a simple hello world twice and compare the returned TPEF?

Sure, I'm getting familiar with the testing infrastructure, and should be able to push a test in the next few days.

The scheduler has some regions with asymptotic complexity of at least n². Basic block splitting/reduction of the scheduling window can reduce the compilation time significantly. This can, however, make the compiled program slower, but often the speedup is negligible after increasing the basic block size beyond a certain point, especially for narrow architectures.

I noticed those complexity issues, and may have a workaround for my use case. I'm also looking into splitting the basic blocks, but I'm afraid it will prevent register bypasses that cross block boundaries. Thanks for the comment, though.

pjaaskel commented 4 months ago

Thanks!