Initial placement-aware scan stitching

gatecat commented 1 week ago

This isn't quite complete as I haven't updated tests yet, but I'd appreciate feedback on the direction here before I do that.

This PR makes two changes to the DFT code:

insert_dft is now only performing scan cell replacement, a new command connect_dft is used to perform scan architect and stitch. This is because replacement needs to be done before placement (as it affects cell areas) but stitching needs to be done after placement (where it can use placement information to minimise scan wirelength)
A basic wirelength minimising sort is used on the scan chain before stitching. This is a simple greedy nearest neighbour approach using Boost rtrees - I also briefly explored using the or-tools TSP solver a bit but all the modes I tried were considerably too slow (and the only way to improve this seemed to be by setting a timeout, which is obviously a no-go due to nondeterminism).

After this PR, I plan to work on the scan chain connectivity, so it's easier to create scan ports ahead of time or connect the scan chain to existing signals (like top level IO pad cells, if not using a macro based flow). Any feedback on what might be wanted here is appreciated, too.

maliberty commented 1 week ago

@fgaray would you please comment

fgaray commented 1 week ago

Thanks for working on this! Some high level comments:

insert_dft is now only performing scan cell replacement, a new command connect_dft is used to perform scan architect and stitch. This is because replacement needs to be done before placement (as it affects cell areas) but stitching needs to be done after placement (where it can use placement information to minimise scan wirelength)

I think this is a good idea but I would propose a change to be more "standard". I would create a standalone command called "scan_replace" to perform the scan replacement and leave architect and stitching to insert_dft.

Similar to other tools, scan_replace is usually performed in a command like "compile -scan" and insert_dft performs architect and stitching.

Other advantage is of extracting scan replace out of insert_dft/preview_dft is that we can simplify the preview_dft implementation, that is, we don't need to perform a temporal scan replace in preview_dft.

A basic wirelength minimising sort is used on the scan chain before stitching. This is a simple greedy nearest neighbour approach using Boost rtrees - I also briefly explored using the or-tools TSP solver a bit but all the modes I tried were considerably too slow (and the only way to improve this seemed to be by setting a timeout, which is obviously a no-go due to nondeterminism).

That's my experience too. Besides what you tried, I would see if we can get some kind of generic heuristic inside OpenROAD, a genetic algorithm would be a good thing to try. Also, I would not try to optimize that much the result of insert_dft as we can always do a second optimization stage later in the flow once we have more info about where the cells are really going to be placed.

I will review the code itself now.

fgaray commented 1 week ago

I guess if you go with my suggestion the code will change. Let me know once this is ready for another look or if my suggestions do not make sense. Thanks!

github-actions[bot] commented 1 week ago

clang-tidy review says "All clean, LGTM! :+1:"

gatecat commented 1 week ago

@fgaray Thanks for your feedback, I've implemented your recommendations and I think it's ready for another review.

github-actions[bot] commented 1 week ago

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] commented 1 week ago

clang-tidy review says "All clean, LGTM! :+1:"

gatecat commented 6 days ago

I'll get back to your other comments in a bit, but just as an update - I've been looking at the benchmarking and the good news is that with these changes, there does appear to be a noticeable drop in routed wirelength - from 2260771um (HEAD) to 1881677um (scan). The reordering doesn't seem to be a significant part of runtime, as far as I can see.

However, weirdly, which is what I'm currently trying to investigate (it could be that I messed something up with my checks), is that the with the reordering, antenna diode insertion takes a ridiculously long amount of time, increasing the overall runtime by more than double - even though with the reordered chains (and therefore less wirelength), there are fewer diodes to insert.

gatecat commented 6 days ago

Ah, found why, I was running DFT insertion in the reordering case after all placement for the reordering, which meant that the newly inserted scan enable signal wasn't getting buffered by repair. Moving it to before repair means the runtime is essentially the same as before, maybe 30 seconds faster (10 minutes +/- noise in each case). The wirelength comparison above doesn't change significantly as a result either.

github-actions[bot] commented 6 days ago

clang-tidy review says "All clean, LGTM! :+1:"

maliberty commented 1 day ago

@fgaray @gatecat any more concerns before final review/merge?

fgaray commented 1 day ago

The code LG Thanks for working on it!, I will leave this bit of argument to you @maliberty :

@gatecat says that in other parts of the code we return a copy (and it is only two ints) but I think we should return Point as a reference. I will leave the final decision to you (@maliberty ) if you think we should return a reference here or use the semantics of the rest of the code.

One last bit, could it be possible to add a test case with physical information?, the current test cases are not really testing your code as they are not doing global placement, is it possible to include such case? (or modify one or all the test cases that we have in dft/test).

That's all from my side!, thanks!

maliberty commented 1 day ago

godbolt says it doesn't matter whether you return by reference or by value so the current code is fine (with #5279)

maliberty commented 1 day ago

I agree about a physical test cast. I don't think it is necessary to run global placement but I do think something that shows the scan ordering being driven by WL would be good.

gatecat commented 1 day ago

Thanks for the review comments. I've addressed them, and added a test that manually places 10 cells and checks they are sorted according to the placement.

github-actions[bot] commented 1 day ago

clang-tidy review says "All clean, LGTM! :+1:"

maliberty commented 1 day ago

In the future please avoid using a force push after review as it prevents use of the GH feature to see changes since the last review.

maliberty commented 1 day ago

Thanks @gatecat !

The-OpenROAD-Project / OpenROAD

Initial placement-aware scan stitching #5246