Closed vineetgarc closed 1 year ago
Yeah, that's my speech about custom vs. mainline. The store merging is done with the help of a mod I made to upstream code, and it probably won't get in until I find another architecture which will benefit from it. LATE EDIT: I'll check if I can make it work without that hack ;)
The autovectorizer should take care of it.
LMBench memory bandwidth tests frd() and fwr() access 512 consecutive bytes at a time to compute memory subsystem bandwidth.
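To make the access pattern concrete, here is a rough sketch of an fwr()-style write kernel. It is not the LMBench source: `write_chunks`, `CHUNK_BYTES` and `WORDS_PER_CHUNK` are hypothetical names, and the only thing assumed from the description above is that each step writes 512 consecutive bytes as word stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 512u
#define WORDS_PER_CHUNK (CHUNK_BYTES / sizeof(uint32_t))

/* Hypothetical stand-in for an fwr()-style kernel (not LMBench code):
 * fill `nbytes` of `buf` with `value`, one 512-byte chunk of
 * consecutive 32-bit word stores at a time. */
static void write_chunks(uint32_t *buf, size_t nbytes, uint32_t value)
{
    /* This sketch only handles whole 512-byte chunks. */
    assert(nbytes % CHUNK_BYTES == 0);

    size_t nwords = nbytes / sizeof(uint32_t);
    for (size_t off = 0; off < nwords; off += WORDS_PER_CHUNK) {
        uint32_t *p = buf + off;
        /* Adjacent word stores: each pair of neighbours is a
         * candidate for being merged into one wider store. */
        for (size_t i = 0; i < WORDS_PER_CHUNK; i++)
            p[i] = value;
    }
}
```

Every store in the inner loop lands right next to the previous one, which is why the choice between word stores and double stores shows up directly in the measured write bandwidth.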
At -O2, the normal (boring) generated code uses regular ST instructions (both upstream gcc and GNU 2020.03)
At -Os, gcc from the github fork enables store merging, coalescing 2 consecutive word stores (ST) into a single double store (STD)
This improves Memory Write Bandwidth by over 20%
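As a smaller illustration of what is being coalesced, here is a minimal sketch of the kind of source pattern involved: two adjacent 32-bit stores into one 8-byte object. The names `struct pair` and `set_pair` are made up for this example, and whether a given compiler build turns the two stores into one STD depends on the toolchain and options discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example type: two adjacent 32-bit words, 8 bytes total. */
struct pair {
    uint32_t lo;
    uint32_t hi;
};

static_assert(sizeof(struct pair) == 8, "two adjacent words, no padding");

/* Two consecutive word stores to neighbouring addresses. Without
 * merging this is two ST instructions; with store merging it is a
 * candidate for a single 64-bit double store. */
static void set_pair(struct pair *p, uint32_t a, uint32_t b)
{
    p->lo = a;
    p->hi = b;
}
```

Comparing the assembly for `set_pair` from the upstream compiler and from the fork is one quick way to check whether the merge is happening.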
Back in 2018, Claudiu pushed an ARC gcc patch which enabled peephole2 patterns for generating LDD/STD: [PATCH 4/6] [ARC] Add peephole rules to combine store/loads into double store/loads
However, it seems there is one more patch (in generic code), [MAINLINE][HACK] Allow store merging using 64-bit std instructions, which is not merged upstream, and without it the peephole doesn't kick in.
So to summarize