Open dibyendumajumdar opened 5 years ago
It looks like the problem is happening in the second pass of localCSE.
The Trees before localCSE are:
n2n BBStart <block_2> (freq 10000) [0x00000247BB27A220] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=0
n9n istore <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [0x00000247BB27A3E0] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n5n iconst 1 [0x00000247BB27A2E0] bci=[-1,0,-] rc=1 vc=13 vn=- li=- udi=- nc=0
n14n astore <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A520] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n6n loadaddr <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [0x00000247BB27A320] bci=[-1,0,-] rc=1 vc=13 vn=- li=-1 udi=- nc=0
n19n astore <temp slot 4>[#340 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A660] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n11n loadaddr <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A460] bci=[-1,0,-] rc=1 vc=13 vn=- li=-1 udi=- nc=0
n42n treetop [0x00000247BB27AC20] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n26n aload <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A820] bci=[-1,0,-] rc=2 vc=13 vn=- li=- udi=- nc=0
n30n istorei <array-shadow>[#232 Shadow] [flags 0x80000603 0x0 ] [0x00000247BB27A920] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=2
n26n ==>aload
n27n iconst 0 [0x00000247BB27A860] bci=[-1,0,-] rc=1 vc=13 vn=- li=- udi=- nc=0
n35n istore <temp slot 1>[#337 Auto] [flags 0x10003 0x0 ] [0x00000247BB27AA60] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n34n iload <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [0x00000247BB27AA20] bci=[-1,0,-] rc=2 vc=13 vn=- li=- udi=- nc=0
n38n ireturn [0x00000247BB27AB20] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n34n ==>iload
n3n BBEnd </block_2> [0x00000247BB27A260] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=0
and after localCSE:
n2n BBStart <block_2> (freq 10000) [0x00000247BB27A220] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=0
n9n istore <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [0x00000247BB27A3E0] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1
n5n iconst 1 [0x00000247BB27A2E0] bci=[-1,0,-] rc=3 vc=16 vn=- li=- udi=- nc=0
n14n astore <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A520] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1
n6n loadaddr <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [0x00000247BB27A320] bci=[-1,0,-] rc=3 vc=16 vn=- li=- udi=- nc=0
n19n astore <temp slot 4>[#340 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A660] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1
n11n loadaddr <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [0x00000247BB27A460] bci=[-1,0,-] rc=1 vc=16 vn=- li=- udi=- nc=0
n42n treetop [0x00000247BB27AC20] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1
n6n ==>loadaddr
n30n istorei <array-shadow>[#232 Shadow] [flags 0x80000603 0x0 ] [0x00000247BB27A920] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=2
n6n ==>loadaddr
n27n iconst 0 [0x00000247BB27A860] bci=[-1,0,-] rc=1 vc=16 vn=- li=- udi=- nc=0
n35n istore <temp slot 1>[#337 Auto] [flags 0x10003 0x0 ] [0x00000247BB27AA60] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1
n5n ==>iconst 1
n38n ireturn [0x00000247BB27AB20] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1
n5n ==>iconst 1
n3n BBEnd </block_2> [0x00000247BB27A260] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=0
@Leonardo2718 thank you for looking at this. The logs above are from Windows 10. Here are logs from RHEL 7.7 - one with optimizer at 'warm' and another at 'hot'.
rhel-trtrace.hot.7812.8508-7812.log rhel-trtrace.warm.7792.8490-7792.log
I think the rhel-trtrace.warm
log is showing roughly the same thing as the log for windows. The rhel-trtrace.hot
log also shows the problem happening during localCSE.
Before:
n1n BBStart <block_2> [ 0x77ec70] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=0
n9n istore <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [ 0x77ee70] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1
n5n iconst 1 [ 0x77ed70] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0
n14n astore <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [ 0x77efb0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1
n10n loadaddr <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [ 0x77eeb0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0
n19n astore <temp slot 4>[#340 Auto] [flags 0x10007 0x0 ] [ 0x77f0f0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1
n15n loadaddr <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [ 0x77eff0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0
n42n treetop [ 0x77f6b0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n23n aload <temp slot 4>[#340 Auto] [flags 0x10007 0x0 ] [ 0x77f1f0] bci=[-1,0,-] rc=2 vc=4 vn=- li=- udi=- nc=0
n43n treetop [ 0x77f6f0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n26n aloadi <array-shadow>[#236 Shadow] [flags 0x80000607 0x0 ] [ 0x77f2b0] bci=[-1,0,-] rc=2 vc=4 vn=- li=- udi=- nc=1
n23n ==>aload
n30n istorei <array-shadow>[#232 Shadow] [flags 0x80000603 0x0 ] [ 0x77f3b0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=2
n26n ==>aloadi
n27n iconst 0 [ 0x77f2f0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0
n35n istore <temp slot 1>[#337 Auto] [flags 0x10003 0x0 ] [ 0x77f4f0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1
n34n iload <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [ 0x77f4b0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0
n38n ireturn [ 0x77f5b0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1
n37n iload <temp slot 1>[#337 Auto] [flags 0x10003 0x0 ] [ 0x77f570] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0
n4n BBEnd </block_2> [ 0x77ed30] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=0
After localCSE:
n1n BBStart <block_2> [ 0x77ec70] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=0
n9n istore <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [ 0x77ee70] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n5n iconst 1 [ 0x77ed70] bci=[-1,0,-] rc=3 vc=7 vn=- li=- udi=- nc=0
n14n astore <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [ 0x77efb0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n10n loadaddr <temp slot 2>[#338 Auto] [flags 0x10003 0x0 ] [ 0x77eeb0] bci=[-1,0,-] rc=1 vc=7 vn=- li=- udi=- nc=0
n19n astore <temp slot 4>[#340 Auto] [flags 0x10007 0x0 ] [ 0x77f0f0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n15n loadaddr <temp slot 3>[#339 Auto] [flags 0x10007 0x0 ] [ 0x77eff0] bci=[-1,0,-] rc=3 vc=7 vn=- li=- udi=- nc=0
n42n treetop [ 0x77f6b0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n15n ==>loadaddr
n43n treetop [ 0x77f6f0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n26n aloadi <array-shadow>[#236 Shadow] [flags 0x80000607 0x0 ] [ 0x77f2b0] bci=[-1,0,-] rc=2 vc=7 vn=- li=- udi=- nc=1
n15n ==>loadaddr
n30n istorei <array-shadow>[#232 Shadow] [flags 0x80000603 0x0 ] [ 0x77f3b0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=2
n26n ==>aloadi
n27n iconst 0 [ 0x77f2f0] bci=[-1,0,-] rc=1 vc=7 vn=- li=- udi=- nc=0
n35n istore <temp slot 1>[#337 Auto] [flags 0x10003 0x0 ] [ 0x77f4f0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n5n ==>iconst 1
n38n ireturn [ 0x77f5b0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n5n ==>iconst 1
n4n BBEnd </block_2> [ 0x77ed30] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=0
The way this ought to have worked is via aliasing on the sym ref 232. The store to that sym ref would be analyzed by local CSE as part of it's normal processing and if the sym ref 338 is aliased with sym ref 232, then the local copy propagation done by local CSE would not happen.
hi @vijaysun-omr, It would be good to know if this is a bug.
Hi @dibyendumajumdar, I cannot see the aliasing information in the log file attached. Maybe generating the same log with traceAliases added to your command line options would help confirm this is the problem. Not having the right aliasing will trip up optimizations and the bug should be fixed in the aliasing infrastructure rather than in the optimizer. The OMR aliasing infrastructure may not be taking into account "address taken" autos etc. (as you probably know, it is not possible to take the address of an auto in Java which is where the OMR compiler gets it's heritage). So, the aliasing infrastructure would need to be enhanced to alias "address taken" autos in some way, e.g. a very simple technique could be track all the sym refs that are accessed using a loadaddr opcode (gets the address of the memory location represented by the sym ref) and conservatively, alias them with all shadows.
Okay thank you, I will regenerate the trace as suggested.
Here is trace generated with following flags: TR_Options=traceIlGen,traceFull,traceAliases,log=trtrace.log
I think appropriate aliasing info is not being generated. I wonder if the 'address taken' autos data introduced in https://github.com/eclipse/omr/pull/2823 needs to be also considered in other scenarios. trtrace.log.11485.12999-11485.log
I did an experiment.
I changed the following in TR_BitVector * OMR::SymbolReference::getUseDefAliasesBV(bool isDirectCall, bool includeGCSafePoint)
:
default:
//TR_ASSERT(0, "getUseDefAliasing called for non method");
if (comp->generateArraylets() && comp->getSymRefTab()->aliasBuilder.gcSafePointSymRefNumbers().get(self()->getReferenceNumber()) && includeGCSafePoint)
return &comp->getSymRefTab()->aliasBuilder.gcSafePointSymRefNumbers();
else
{
if (kind == TR::Symbol::IsAutomatic && !symRefTab->aliasBuilder.addressTakenAutos().isEmpty())
{
TR_BitVector * aliases = new (aliasRegion) TR_BitVector(bvInitialSize, aliasRegion, growability);
*aliases |= symRefTab->aliasBuilder.addressTakenAutos();
return aliases;
}
}
return 0;
}
This is probably not correct as it will return the address taken autos bitmap for any 'automatic' variable, but it seems to do the trick. Attached is the log.
trtrace.log.14161.33823-14161.log
Hence it is probably necessary to make a change of this kind.
However above experimental change does not help in a few other tests where I am getting bad code when more optimizations are enabled.
In particular, say I have a local var, and this is accessed via double indirection, then can the aliasing infrastructure handle the 'transitive' aliasing? That is, pointer 1 aliases local var, and pointer 2 takes address of pointer1 hence it also aliases local var.
I think you are on the right track in terms of the part of JIT code to change.
I feel it will be more correct if you ORed in the symRefTab->aliasBuilder.addressTakenAutos() before we return the alias set for any shadow sym ref. The converse of that is that when asked for the alias information for any "address taken" auto, you ORed in all the shadow sym refs. This can maybe be improved upon in the future to be less conservative but to get your tests passing from a functional standpoint, you could do these two changes.
For context, any indirect access would be done via a shadow sym ref and if we set up the aliasing relationship between it and address taken autos as I described, I believe it would handle the transitive aliasing case you mentioned as well.
@vijaysun-omr Okay thank you. Presumably that is under section?:
case TR::Symbol::IsShadow:
@dibyendumajumdar yes for the shadow sym refs, it is under that section that you showed. The second change I mentioned would go under the auto sym refs section that you were changing before.
To elaborate on the rationale for the earlier suggestion, the aliasing api that you are changing for "use def aliases" can be intuitively looked at as a way of asking the question : "If there was a store to the sym ref that we are requesting use def aliases for, loads of which sym refs MAY be affected by the store, i.e. which loads of which sym refs may have their value changed by the store (meaning we cannot move those loads past the store in any way)." This hopefully clarifies the need for the "symmetric aliasing" I was suggesting earlier. If you have an auto a and it's address has been taken like p = &a; then a store to auto a affects the value of a load p (will use a shadow sym ref in the OMR compiler) and equally a store to p (again, will use a shadow sym ref in the OMR compiler) affects the value of a load of auto a. So we need the 2 sym refs involved to both be returning the other sym ref in their "use def aliases" set.
@vijaysun-omr thanks for the explanation. I will revert back later after I have tried this.
Hi @vijaysun-omr,
I had a look at this but didn't have any success. I do not know if I am setting up the alias data correctly in the first place. Secondly when the method I referred to is called, all calls appear to be for kind == 0 (i.e. automatic), so the shadow case is never even executed. So not sure if the problem is even before this method gets called (i.e. in my initial alias data setup).
That is to say, I am not sure what calls to make when I am creating an array load/store instruction. I have tried various things but it is like stabbing in the dark, without knowing what I should really be doing. Intuitively a stack variable is at a certain location, and I guess that we need to be able to say that the array load/store overlaps that location. However the way to so this is not clear.
Just to give you some idea here is the bit of code where I generate the IL. As you will see I have tried various options at one time or another.
TR::SymbolReference* findOrCreateShadowSymRef(
/*TR::SymbolReference *typeSymbol, */ uint64_t typeSymbol, TR::DataType type, int32_t offset)
{
// fprintf(stderr, "Searching for type %lld offset %d\n", typeSymbol, offset);
for (ShadowSymInfo* syminfo = shadow_symbols_; syminfo != nullptr; syminfo = syminfo->next) {
if (syminfo->typeSymbol == typeSymbol && syminfo->type == type && syminfo->offset == offset) {
// fprintf(stderr, "Found shadow %p\n", syminfo->shadowSymbol);
return syminfo->shadowSymbol;
}
}
TR::Symbol* sym = TR::Symbol::createShadow(_comp->trHeapMemory(), type, TR::DataType::getSize(type));
TR::SymbolReference* symRef = new (_comp->trHeapMemory())
TR::SymbolReference(_comp->getSymRefTab(), sym, _comp->getMethodSymbol()->getResolvedMethodIndex(), -1);
symRef->setOffset(offset);
// conservative aliasing
int32_t refNum = symRef->getReferenceNumber();
if (type == TR::Address)
_comp->getSymRefTab()->aliasBuilder.addressShadowSymRefs().set(refNum);
else if (type == TR::Int32)
_comp->getSymRefTab()->aliasBuilder.intShadowSymRefs().set(refNum);
else
_comp->getSymRefTab()->aliasBuilder.nonIntPrimitiveShadowSymRefs().set(refNum);
ShadowSymInfo* info = new (_comp->trHeapMemory()) ShadowSymInfo();
info->type = type;
info->offset = offset;
info->shadowSymbol = symRef;
info->typeSymbol = typeSymbol;
info->next = shadow_symbols_;
shadow_symbols_ = info;
// fprintf(stderr, "Created shadow %p\n", info->shadowSymbol);
return info->shadowSymbol;
}
void JIT_ArrayStore(JIT_ILInjectorRef ilinjector, JIT_NodeRef basenode, JIT_NodeRef indexnode, JIT_NodeRef valuenode)
{
auto injector = unwrap_ilinjector(ilinjector);
auto base = unwrap_node(basenode);
auto index = unwrap_node(indexnode);
auto value = unwrap_node(valuenode);
auto type = value->getDataType();
TR::ILOpCodes storeOp = injector->comp()->il.opCodeForIndirectArrayStore(type);
auto array_offset = get_array_element_address(injector, type, base, index);
#if 1
TR::SymbolReference* symRef = injector->symRefTab()->findOrCreateArrayShadowSymbolRef(type, base);
TR::Node* store = TR::Node::createWithSymRef(storeOp, 2, array_offset, value, 0, symRef);
#else
TR::Symbol* sym = TR::Symbol::createShadow(injector->comp()->trHeapMemory(), type, TR::DataType::getSize(type));
TR::SymbolReference* symRef = new (injector->comp()->trHeapMemory()) TR::SymbolReference(
injector->comp()->getSymRefTab(), sym, injector->comp()->getMethodSymbol()->getResolvedMethodIndex(), -1);
symRef->setReallySharesSymbol();
// conservative aliasing
int32_t refNum = symRef->getReferenceNumber();
if (type == TR::Address)
injector->comp()->getSymRefTab()->aliasBuilder.addressShadowSymRefs().set(refNum);
else if (type == TR::Int32)
injector->comp()->getSymRefTab()->aliasBuilder.intShadowSymRefs().set(refNum);
else
injector->comp()->getSymRefTab()->aliasBuilder.nonIntPrimitiveShadowSymRefs().set(refNum);
TR::Node* store = TR::Node::createWithSymRef(storeOp, 2, array_offset, value, 0, symRef);
#endif
injector->genTreeTop(store);
}
void JIT_ArrayStoreAt(
JIT_ILInjectorRef ilinjector, uint64_t symbolId, JIT_NodeRef basenode, int64_t idx, JIT_NodeRef valuenode)
{
auto injector = unwrap_ilinjector(ilinjector);
auto base = unwrap_node(basenode);
auto index = TR::Node::lconst(idx);
auto value = unwrap_node(valuenode);
auto type = value->getDataType();
TR::SymbolReference* symRef = nullptr;
TR::ILOpCodes storeOp = injector->comp()->il.opCodeForIndirectArrayStore(type);
auto aoffset = get_array_element_address(injector, type, base, index);
#if 0
if (symbolId) {
symRef = injector->findOrCreateShadowSymRef(symbolId, type, (int32_t)idx);
}
#endif
if (!symRef)
symRef = injector->symRefTab()->findOrCreateArrayShadowSymbolRef(type, base);
TR::Node* store = TR::Node::createWithSymRef(storeOp, 2, aoffset, value, 0, symRef);
injector->genTreeTop(store);
}
One problem I have is that my front-end IR generates array based load/store instructions even for primitive types, treating the stack as an array. I do not therefore want to cause poor optimization with unnecessary aliasing information
@dibyendumajumdar the optimization that seems to cause the wrong transformation is local common subexpression elimination (from looking at one of the first logs attached to this issue):
[ 37] O^O LOCAL COMMON SUBEXPRESSION ELIMINATION: Local Common Subexpression Elimination propagating local #338 in node : 00000247BB27AA20 PARENT : 00000247BB27AA60 from node 00000247BB27A3E0
O^O LOCAL COMMON SUBEXPRESSION ELIMINATION: Rhs of store def node : 00000247BB27A2E0
The local CSE optimization incorrectly believes that the store of auto sym ref 338 can be propagated to the load of that auto sym ref later in the same block. The reason this is incorrect is that there is a store to an array shadow sym ref in between that store and load of 338 that essentially is storing to the same location (given that the sym ref 338 was one that had it's "address taken").
The way this propagation is meant to be stopped is via aliasing as we've been discussing and in particular by the code guarded by the boolean set on the following line
when this line is executed with node
being the store into the array shadow sym ref (i.e. in the log I am looking at):
n30n istorei <array-shadow>[#232 Shadow] [flags 0x80000603 0x0 ]
then hasAliases
should be true.
I thought I'd suggest checking if we are setting hasAliases
to true for the store node above at the line number I pasted a link to since you mentioned not reaching the code you were modifying for shadow sym refs at all in your last comment. hasAliases
has to be true I expect for the store node above to engage the code in local CSE to try to do the right thing.
Hi @vijaysun-omr Thank you for the analysis. I am certainly not setting hasAliases
in my code. I will have a look at that (because I work off an IR generated by the C front-end it is sometimes hard to have full control).
To be 100% clear, I meant investigating what value you see for the hasAliases
boolean in that local CSE line I mentioned for all the nodes as they are being processed by local CSE. Assuming it's false, you would then need to investigate how hasAliases
was computed incorrectly for your front end by debugging into the node API that it seems to depend on (e.g. node->mayKill
).
Okay thank you. I started investigating why the array shadow load/store do not generate any aliasing. The problem might be this:
symRefTab()->findOrCreateArrayShadowSymbolRef(type, base);
I am trying the other one which takes a size argument, and now at least I see:
Symref #232 <array-shadow>
Use Aliases: NULL
Usedef Aliases: 000000AD4AF4DA38 {232, 239}
Symref #236 <array-shadow>
Use Aliases: NULL
Usedef Aliases: 000000AD4AF4DA88 {229, 236}
I also see the shadow section being called. I have to next try ORing the address taken bitmap.
So after ORing addressTakenAutos
with arrayElementSymRefs
I get:
Symref #232 <array-shadow>
Use Aliases: NULL
Usedef Aliases: 000000ED620FDDB8 {232, 239}
Symref #236 <array-shadow>
Use Aliases: NULL
Usedef Aliases: 000000ED620FDE08 {229, 236}
Symref #337 <temp slot 1>
Use Aliases: NULL
Usedef Aliases: 000000ED620FDE58 {232, 236, 338, 339}
Symref #338 <temp slot 2>
Use Aliases: NULL
Usedef Aliases: 000000ED620FDEA8 {232, 236, 338, 339}
Symref #339 <temp slot 3>
Use Aliases: NULL
Usedef Aliases: 000000ED620FDEF8 {232, 236, 338, 339}
Symref #340 <temp slot 4>
Use Aliases: NULL
Usedef Aliases: 000000ED620FDF48 {232, 236, 338, 339}
This looks more promising. However some tests still fail.
Next version of the original test:
Symbol References with Aliases:
Symref #232 <array-shadow>
Use Aliases: NULL
Usedef Aliases: 0000006319AFD7A8 {232, 239, 338, 339}
Symref #236 <array-shadow>
Use Aliases: NULL
Usedef Aliases: 0000006319AFD7F8 {229, 236, 338, 339}
Symref #337 <temp slot 1>
Use Aliases: NULL
Usedef Aliases: 0000006319AFD848 {232, 236, 338, 339}
Symref #338 <temp slot 2>
Use Aliases: NULL
Usedef Aliases: 0000006319AFD898 {232, 236, 338, 339}
Symref #339 <temp slot 3>
Use Aliases: NULL
Usedef Aliases: 0000006319AFD8E8 {232, 236, 338, 339}
Symref #340 <temp slot 4>
Use Aliases: NULL
Usedef Aliases: 0000006319AFD938 {232, 236, 338, 339}
However even now other tests fail.
Example of failing test case:
void typepun (float *ptr2)
{
*ptr2=0;
}
int main(void)
{
int val;
int *ptr = &val;
float *ptr2 = (float*)&val;
*ptr=1;
typepun (ptr2);
if (*ptr)
return 1;
return 0;
}
This involves function call - so probably indicates my PR or method aliasing address taken vars is not yet fully effective.
@dibyendumajumdar I looked at this last log you uploaded and find the problematic transformation is:
[ 63] O^O VALUE PROPAGATION: Constant folding iload [0x000001B03B202B60] to iconst 1
To me this suggests that use-def info is not taking into account the fact that a call might be aliasing an auto. A log with traceUseDefs
added to your options would help in confirming.
I found this comment about an argument to be passed to UseDefInfo's constructor:
* when local (stack) variable has been aliased by a function call.
* A value of false is fine for Java like languages where
* local (stack) variables cannot be passed by reference to function calls
* and hence cannot be aliased. However for C like languages this flag should be
* set to true.
*/
TR_UseDefInfo::TR_UseDefInfo(TR::Compilation *comp, TR::CFG *cfg, TR::Optimizer *optimizer,
bool requiresGlobals, bool prefersGlobals, bool loadsShouldBeDefs, bool cannotOmitTrivialDefs, bool conversionRegsOnly,
bool doCompletion, bool callsShouldBeUses)
This suggests that callsShouldBeUses
should be set to true in your case but I think the real issue may well be that we need some way of expressing that calls could be defs for autos that have had their address taken. Your call aliasing does seem to be correct, so we'd have to look at UseDefInfo.cpp to check how to make it work in your case.
Hi @vijaysun-omr
I thought I was setting callsShouldBeUses
to true in the NJ Optimizer subclass, but perhaps I didn't as I don't see it committed here https://github.com/dibyendumajumdar/nj/blob/master/frontends/nj/optimizer/NJOptimizer.hpp. So it may well explain the failure. I will revert back after checking this.
Update: actually NJ doesn't have the latest changes so it is being hardcoded to 'true' here: https://github.com/dibyendumajumdar/nj/blob/master/compiler/optimizer/UseDefInfo.cpp
Here is the trace with traceUseDefs
.
@dibyendumajumdar the last trace you uploaded has different IL generated (more complex) than the prior trace; just wondering if this is intentional. The prior log was easier to work with since it had a call that was fairly simple and a use of the address taken auto later:
n33n call file:line:typepun[#342 computed-static Method] [flags 0x500 0x0 ] [0x000001B03B202960] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=5 nc=1
n16n ==>loadaddr
n43n ificmpne --> block_4 BBStart at n6n () [0x000001B03B202BE0] bci=[-1,0,-] rc=0 vc=44 vn=- li=- udi=- nc=2 flg=0x20
n41n iload <temp slot 3>[#339 Auto] [flags 0x10003 0x0 ] [0x000001B03B202B60] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=- nc=0
n42n iconst 0
whereas the new trace has the call looking like:
n56n icall file:line:f[#351 computed-static Method] [flags 0x500 0x0 ] () [0x000002604A51C1E0] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=11 nc=5 flg=0x20
n51n loadaddr <temp slot 7>[#343 Auto] [flags 0x10007 0x0 ] [0x000002604A51C0A0] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=15 nc=0
n52n loadaddr <temp slot 3>[#339 Auto] [flags 0x10003 0x0 ] [0x000002604A51C0E0] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=16 nc=0
n53n loadaddr <temp slot 4>[#340 Auto] [flags 0x10003 0x0 ] [0x000002604A51C120] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=17 nc=0
n54n loadaddr <temp slot 8>[#344 Auto] [flags 0x10007 0x0 ] [0x000002604A51C160] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=18 nc=0
n55n loadaddr <temp slot 9>[#345 Auto] [flags 0x10007 0x0 ]
with a comparison tree that is quite complex following it.
@vijaysun-omr I will check if I ran the correct test
My bad - it was another test that is also failing. Here is the correct log fle. trtrace.log.5288.43750.log
Use #5[00000232F6752B60] is defined by:
Def #1[00000232F6752820]
shows the relevant load's def information and it corresponds to the store before the call, i.e. the call node was not considered a def even though it aliases it. This could mean some changes to UseDefInfo.cpp are needed to make this test pass since your passing in of the boolean to consider calls as uses was also not enough.
@vijaysun-omr Okay thank you for the analysis.
I guess that even if we solved this particular case another one might crop up to do with this type of operation. I am wondering if it is worth pursuing - how likely is that OMR will be used for a C like language? For my C front-end, I can simply disallow taking addresses of local stack variables; I do not need this capability for the main usage which is in Ravi, where I carefully avoid this scenario.
If however, it is felt that OMR should support languages like C, then it would be worth spending the effort. I would like to work on it but it will probably take me a while to figure out. On the plus side as a by-product it may be worthwhile documenting the way aliasing data should be setup by the front-end and how that data is then used in various scenarios.
Appreciate your help so far, and interested to know what you and the wider team think about above.
I agree with your last comment. The change to use-def info itself to resolve the issue we see in your current failing test(s) may be a small to medium (trending to latter for someone who does'nt know the code) effort. It's not so large an effort to dismiss out of hand but also not so trivial as to say we should just fix it and continue your testing. Your comment suggests you would be able to make progress by disallowing taking the address in your C-front and so a fix is not necessarily blocking you.
In the end, it probably depends on the wider question you raised around OMR use for C-like languages (that others can also express an opinion on). At the very least we should probably document this "known limitation" for C-like languages and add an issue to the backlog.
Oh, and if you wanted to attempt fixing this in use-def info as an ongoing exercise, I'd be happy to help you with that effort via suggestions, looking at logs and answering questions (as I did in this issue) :)
@vijaysun-omr It is definitely not blocking me and I would not want to take up your time with fixing an issue that is not affecting the wider OMR user base. I am happy to work on it in the background. Thank you for the help, but please treat it as non-urgent.
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment on the issue or it will be closed in 60 days.
Hi,
I am trying to compile following test code using my C front-end.
I am getting incorrect results when optimizations are on. Attached are two traces - one without optimizations and other with some enabled. As far as I can tell the initial generated IL is correct.
trtrace.log.1800.13116.log trtrace.log.10724.13120.log