eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr
Other
940 stars 394 forks source link

Potential optimization bug to do with updating local variable indirectly #4487

Open dibyendumajumdar opened 4 years ago

dibyendumajumdar commented 4 years ago

Hi,

I am trying to compile following test code using my C front-end.

int
main()
{
        int   x;
        int  *p;
        int **pp;

        x = 1;
        p = &x;
        pp = &p;
        **pp = 0;
        return x;
}

I am getting incorrect results when optimizations are on. Attached are two traces - one without optimizations and other with some enabled. As far as I can tell the initial generated IL is correct.

trtrace.log.1800.13116.log trtrace.log.10724.13120.log

Leonardo2718 commented 4 years ago

It looks like the problem is happening in the second pass of localCSE.

The Trees before localCSE are:

n2n       BBStart <block_2> (freq 10000)                                                      [0x00000247BB27A220] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=0
n9n       istore  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                              [0x00000247BB27A3E0] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n5n         iconst 1                                                                          [0x00000247BB27A2E0] bci=[-1,0,-] rc=1 vc=13 vn=- li=- udi=- nc=0
n14n      astore  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                              [0x00000247BB27A520] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n6n         loadaddr  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                          [0x00000247BB27A320] bci=[-1,0,-] rc=1 vc=13 vn=- li=-1 udi=- nc=0
n19n      astore  <temp slot 4>[#340  Auto] [flags 0x10007 0x0 ]                              [0x00000247BB27A660] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n11n        loadaddr  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                          [0x00000247BB27A460] bci=[-1,0,-] rc=1 vc=13 vn=- li=-1 udi=- nc=0
n42n      treetop                                                                             [0x00000247BB27AC20] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n26n        aload  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                             [0x00000247BB27A820] bci=[-1,0,-] rc=2 vc=13 vn=- li=- udi=- nc=0
n30n      istorei  <array-shadow>[#232  Shadow] [flags 0x80000603 0x0 ]                       [0x00000247BB27A920] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=2
n26n        ==>aload
n27n        iconst 0                                                                          [0x00000247BB27A860] bci=[-1,0,-] rc=1 vc=13 vn=- li=- udi=- nc=0
n35n      istore  <temp slot 1>[#337  Auto] [flags 0x10003 0x0 ]                              [0x00000247BB27AA60] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n34n        iload  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                             [0x00000247BB27AA20] bci=[-1,0,-] rc=2 vc=13 vn=- li=- udi=- nc=0
n38n      ireturn                                                                             [0x00000247BB27AB20] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=1
n34n        ==>iload
n3n       BBEnd </block_2>                                                                    [0x00000247BB27A260] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=0

and after localCSE:

n2n       BBStart <block_2> (freq 10000)                                                      [0x00000247BB27A220] bci=[-1,0,-] rc=0 vc=13 vn=- li=- udi=- nc=0 
n9n       istore  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                              [0x00000247BB27A3E0] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1 
n5n         iconst 1                                                                          [0x00000247BB27A2E0] bci=[-1,0,-] rc=3 vc=16 vn=- li=- udi=- nc=0 
n14n      astore  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                              [0x00000247BB27A520] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1 
n6n         loadaddr  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                          [0x00000247BB27A320] bci=[-1,0,-] rc=3 vc=16 vn=- li=- udi=- nc=0 
n19n      astore  <temp slot 4>[#340  Auto] [flags 0x10007 0x0 ]                              [0x00000247BB27A660] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1 
n11n        loadaddr  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                          [0x00000247BB27A460] bci=[-1,0,-] rc=1 vc=16 vn=- li=- udi=- nc=0 
n42n      treetop                                                                             [0x00000247BB27AC20] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1 
n6n         ==>loadaddr
n30n      istorei  <array-shadow>[#232  Shadow] [flags 0x80000603 0x0 ]                       [0x00000247BB27A920] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=2 
n6n         ==>loadaddr
n27n        iconst 0                                                                          [0x00000247BB27A860] bci=[-1,0,-] rc=1 vc=16 vn=- li=- udi=- nc=0 
n35n      istore  <temp slot 1>[#337  Auto] [flags 0x10003 0x0 ]                              [0x00000247BB27AA60] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1 
n5n         ==>iconst 1
n38n      ireturn                                                                             [0x00000247BB27AB20] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=1 
n5n         ==>iconst 1
n3n       BBEnd </block_2>                                                                    [0x00000247BB27A260] bci=[-1,0,-] rc=0 vc=16 vn=- li=- udi=- nc=0 
dibyendumajumdar commented 4 years ago

@Leonardo2718 thank you for looking at this. The logs above are from Windows 10. Here are logs from RHEL 7.7 - one with optimizer at 'warm' and another at 'hot'.

rhel-trtrace.hot.7812.8508-7812.log rhel-trtrace.warm.7792.8490-7792.log

Leonardo2718 commented 4 years ago

I think the rhel-trtrace.warm log is showing roughly the same thing as the log for windows. The rhel-trtrace.hot log also shows the problem happening during localCSE.

Before:

n1n       BBStart <block_2>                                                                   [    0x77ec70] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=0 
n9n       istore  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                              [    0x77ee70] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1 
n5n         iconst 1                                                                          [    0x77ed70] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0 
n14n      astore  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                              [    0x77efb0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1 
n10n        loadaddr  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                          [    0x77eeb0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0 
n19n      astore  <temp slot 4>[#340  Auto] [flags 0x10007 0x0 ]                              [    0x77f0f0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1 
n15n        loadaddr  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                          [    0x77eff0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0 
n42n      treetop                                                                             [    0x77f6b0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1 
n23n        aload  <temp slot 4>[#340  Auto] [flags 0x10007 0x0 ]                             [    0x77f1f0] bci=[-1,0,-] rc=2 vc=4 vn=- li=- udi=- nc=0 
n43n      treetop                                                                             [    0x77f6f0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1 
n26n        aloadi  <array-shadow>[#236  Shadow] [flags 0x80000607 0x0 ]                      [    0x77f2b0] bci=[-1,0,-] rc=2 vc=4 vn=- li=- udi=- nc=1 
n23n          ==>aload
n30n      istorei  <array-shadow>[#232  Shadow] [flags 0x80000603 0x0 ]                       [    0x77f3b0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=2 
n26n        ==>aloadi
n27n        iconst 0                                                                          [    0x77f2f0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0 
n35n      istore  <temp slot 1>[#337  Auto] [flags 0x10003 0x0 ]                              [    0x77f4f0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1 
n34n        iload  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                             [    0x77f4b0] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0 
n38n      ireturn                                                                             [    0x77f5b0] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=1 
n37n        iload  <temp slot 1>[#337  Auto] [flags 0x10003 0x0 ]                             [    0x77f570] bci=[-1,0,-] rc=1 vc=4 vn=- li=- udi=- nc=0 
n4n       BBEnd </block_2>                                                                    [    0x77ed30] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=0 

After localCSE:

n1n       BBStart <block_2>                                                                   [    0x77ec70] bci=[-1,0,-] rc=0 vc=4 vn=- li=- udi=- nc=0
n9n       istore  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                              [    0x77ee70] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n5n         iconst 1                                                                          [    0x77ed70] bci=[-1,0,-] rc=3 vc=7 vn=- li=- udi=- nc=0
n14n      astore  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                              [    0x77efb0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n10n        loadaddr  <temp slot 2>[#338  Auto] [flags 0x10003 0x0 ]                          [    0x77eeb0] bci=[-1,0,-] rc=1 vc=7 vn=- li=- udi=- nc=0
n19n      astore  <temp slot 4>[#340  Auto] [flags 0x10007 0x0 ]                              [    0x77f0f0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n15n        loadaddr  <temp slot 3>[#339  Auto] [flags 0x10007 0x0 ]                          [    0x77eff0] bci=[-1,0,-] rc=3 vc=7 vn=- li=- udi=- nc=0
n42n      treetop                                                                             [    0x77f6b0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n15n        ==>loadaddr
n43n      treetop                                                                             [    0x77f6f0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n26n        aloadi  <array-shadow>[#236  Shadow] [flags 0x80000607 0x0 ]                      [    0x77f2b0] bci=[-1,0,-] rc=2 vc=7 vn=- li=- udi=- nc=1
n15n          ==>loadaddr
n30n      istorei  <array-shadow>[#232  Shadow] [flags 0x80000603 0x0 ]                       [    0x77f3b0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=2
n26n        ==>aloadi
n27n        iconst 0                                                                          [    0x77f2f0] bci=[-1,0,-] rc=1 vc=7 vn=- li=- udi=- nc=0
n35n      istore  <temp slot 1>[#337  Auto] [flags 0x10003 0x0 ]                              [    0x77f4f0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n5n         ==>iconst 1
n38n      ireturn                                                                             [    0x77f5b0] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=1
n5n         ==>iconst 1
n4n       BBEnd </block_2>                                                                    [    0x77ed30] bci=[-1,0,-] rc=0 vc=7 vn=- li=- udi=- nc=0
vijaysun-omr commented 4 years ago

The way this ought to have worked is via aliasing on the sym ref 232. The store to that sym ref would be analyzed by local CSE as part of it's normal processing and if the sym ref 338 is aliased with sym ref 232, then the local copy propagation done by local CSE would not happen.

dibyendumajumdar commented 4 years ago

hi @vijaysun-omr, It would be good to know if this is a bug.

vijaysun-omr commented 4 years ago

Hi @dibyendumajumdar, I cannot see the aliasing information in the log file attached. Maybe generating the same log with traceAliases added to your command line options would help confirm this is the problem. Not having the right aliasing will trip up optimizations and the bug should be fixed in the aliasing infrastructure rather than in the optimizer. The OMR aliasing infrastructure may not be taking into account "address taken" autos etc. (as you probably know, it is not possible to take the address of an auto in Java which is where the OMR compiler gets it's heritage). So, the aliasing infrastructure would need to be enhanced to alias "address taken" autos in some way, e.g. a very simple technique could be track all the sym refs that are accessed using a loadaddr opcode (gets the address of the memory location represented by the sym ref) and conservatively, alias them with all shadows.

dibyendumajumdar commented 4 years ago

Okay thank you, I will regenerate the trace as suggested.

dibyendumajumdar commented 4 years ago

Here is trace generated with following flags: TR_Options=traceIlGen,traceFull,traceAliases,log=trtrace.log

I think appropriate aliasing info is not being generated. I wonder if the 'address taken' autos data introduced in https://github.com/eclipse/omr/pull/2823 needs to be also considered in other scenarios. trtrace.log.11485.12999-11485.log

dibyendumajumdar commented 4 years ago

I did an experiment. I changed the following in TR_BitVector * OMR::SymbolReference::getUseDefAliasesBV(bool isDirectCall, bool includeGCSafePoint):

      default:
         //TR_ASSERT(0, "getUseDefAliasing called for non method");
         if (comp->generateArraylets() && comp->getSymRefTab()->aliasBuilder.gcSafePointSymRefNumbers().get(self()->getReferenceNumber()) && includeGCSafePoint)
            return &comp->getSymRefTab()->aliasBuilder.gcSafePointSymRefNumbers();
         else
            {
            if (kind == TR::Symbol::IsAutomatic && !symRefTab->aliasBuilder.addressTakenAutos().isEmpty())
               {
               TR_BitVector * aliases = new (aliasRegion) TR_BitVector(bvInitialSize, aliasRegion, growability);
               *aliases |= symRefTab->aliasBuilder.addressTakenAutos();
               return aliases;
               }
            }
            return 0;

      }

This is probably not correct as it will return the address taken autos bitmap for any 'automatic' variable, but it seems to do the trick. Attached is the log.

trtrace.log.14161.33823-14161.log

Hence it is probably necessary to make a change of this kind.

dibyendumajumdar commented 4 years ago

However above experimental change does not help in a few other tests where I am getting bad code when more optimizations are enabled.

In particular, say I have a local var, and this is accessed via double indirection, then can the aliasing infrastructure handle the 'transitive' aliasing? That is, pointer 1 aliases local var, and pointer 2 takes address of pointer1 hence it also aliases local var.

vijaysun-omr commented 4 years ago

I think you are on the right track in terms of the part of JIT code to change.

I feel it will be more correct if you ORed in the symRefTab->aliasBuilder.addressTakenAutos() before we return the alias set for any shadow sym ref. The converse of that is that when asked for the alias information for any "address taken" auto, you ORed in all the shadow sym refs. This can maybe be improved upon in the future to be less conservative but to get your tests passing from a functional standpoint, you could do these two changes.

For context, any indirect access would be done via a shadow sym ref and if we set up the aliasing relationship between it and address taken autos as I described, I believe it would handle the transitive aliasing case you mentioned as well.

dibyendumajumdar commented 4 years ago

@vijaysun-omr Okay thank you. Presumably that is under section?:

case TR::Symbol::IsShadow:
vijaysun-omr commented 4 years ago

@dibyendumajumdar yes for the shadow sym refs, it is under that section that you showed. The second change I mentioned would go under the auto sym refs section that you were changing before.

To elaborate on the rationale for the earlier suggestion, the aliasing api that you are changing for "use def aliases" can be intuitively looked at as a way of asking the question : "If there was a store to the sym ref that we are requesting use def aliases for, loads of which sym refs MAY be affected by the store, i.e. which loads of which sym refs may have their value changed by the store (meaning we cannot move those loads past the store in any way)." This hopefully clarifies the need for the "symmetric aliasing" I was suggesting earlier. If you have an auto a and it's address has been taken like p = &a; then a store to auto a affects the value of a load p (will use a shadow sym ref in the OMR compiler) and equally a store to p (again, will use a shadow sym ref in the OMR compiler) affects the value of a load of auto a. So we need the 2 sym refs involved to both be returning the other sym ref in their "use def aliases" set.

dibyendumajumdar commented 4 years ago

@vijaysun-omr thanks for the explanation. I will revert back later after I have tried this.

dibyendumajumdar commented 4 years ago

Hi @vijaysun-omr,

I had a look at this but didn't have any success. I do not know if I am setting up the alias data correctly in the first place. Secondly when the method I referred to is called, all calls appear to be for kind == 0 (i.e. automatic), so the shadow case is never even executed. So not sure if the problem is even before this method gets called (i.e. in my initial alias data setup).

That is to say, I am not sure what calls to make when I am creating an array load/store instruction. I have tried various things but it is like stabbing in the dark, without knowing what I should really be doing. Intuitively a stack variable is at a certain location, and I guess that we need to be able to say that the array load/store overlaps that location. However the way to so this is not clear.

Just to give you some idea here is the bit of code where I generate the IL. As you will see I have tried various options at one time or another.

    TR::SymbolReference* findOrCreateShadowSymRef(
        /*TR::SymbolReference *typeSymbol, */ uint64_t typeSymbol, TR::DataType type, int32_t offset)
    {
        // fprintf(stderr, "Searching for type %lld offset %d\n", typeSymbol, offset);
        for (ShadowSymInfo* syminfo = shadow_symbols_; syminfo != nullptr; syminfo = syminfo->next) {
            if (syminfo->typeSymbol == typeSymbol && syminfo->type == type && syminfo->offset == offset) {
                // fprintf(stderr, "Found shadow %p\n", syminfo->shadowSymbol);
                return syminfo->shadowSymbol;
            }
        }
        TR::Symbol* sym = TR::Symbol::createShadow(_comp->trHeapMemory(), type, TR::DataType::getSize(type));
        TR::SymbolReference* symRef = new (_comp->trHeapMemory())
            TR::SymbolReference(_comp->getSymRefTab(), sym, _comp->getMethodSymbol()->getResolvedMethodIndex(), -1);
        symRef->setOffset(offset);

        // conservative aliasing
        int32_t refNum = symRef->getReferenceNumber();
        if (type == TR::Address)
            _comp->getSymRefTab()->aliasBuilder.addressShadowSymRefs().set(refNum);
        else if (type == TR::Int32)
            _comp->getSymRefTab()->aliasBuilder.intShadowSymRefs().set(refNum);
        else
            _comp->getSymRefTab()->aliasBuilder.nonIntPrimitiveShadowSymRefs().set(refNum);
        ShadowSymInfo* info = new (_comp->trHeapMemory()) ShadowSymInfo();
        info->type = type;
        info->offset = offset;
        info->shadowSymbol = symRef;
        info->typeSymbol = typeSymbol;
        info->next = shadow_symbols_;
        shadow_symbols_ = info;
        // fprintf(stderr, "Created shadow %p\n", info->shadowSymbol);
        return info->shadowSymbol;
    }

void JIT_ArrayStore(JIT_ILInjectorRef ilinjector, JIT_NodeRef basenode, JIT_NodeRef indexnode, JIT_NodeRef valuenode)
{
    auto injector = unwrap_ilinjector(ilinjector);
    auto base = unwrap_node(basenode);
    auto index = unwrap_node(indexnode);
    auto value = unwrap_node(valuenode);
    auto type = value->getDataType();

    TR::ILOpCodes storeOp = injector->comp()->il.opCodeForIndirectArrayStore(type);
    auto array_offset = get_array_element_address(injector, type, base, index);
#if 1
    TR::SymbolReference* symRef = injector->symRefTab()->findOrCreateArrayShadowSymbolRef(type, base);
    TR::Node* store = TR::Node::createWithSymRef(storeOp, 2, array_offset, value, 0, symRef);
#else
    TR::Symbol* sym = TR::Symbol::createShadow(injector->comp()->trHeapMemory(), type, TR::DataType::getSize(type));
    TR::SymbolReference* symRef = new (injector->comp()->trHeapMemory()) TR::SymbolReference(
        injector->comp()->getSymRefTab(), sym, injector->comp()->getMethodSymbol()->getResolvedMethodIndex(), -1);
    symRef->setReallySharesSymbol();

    // conservative aliasing
    int32_t refNum = symRef->getReferenceNumber();
    if (type == TR::Address)
        injector->comp()->getSymRefTab()->aliasBuilder.addressShadowSymRefs().set(refNum);
    else if (type == TR::Int32)
        injector->comp()->getSymRefTab()->aliasBuilder.intShadowSymRefs().set(refNum);
    else
        injector->comp()->getSymRefTab()->aliasBuilder.nonIntPrimitiveShadowSymRefs().set(refNum);
    TR::Node* store = TR::Node::createWithSymRef(storeOp, 2, array_offset, value, 0, symRef);
#endif
    injector->genTreeTop(store);
}

void JIT_ArrayStoreAt(
    JIT_ILInjectorRef ilinjector, uint64_t symbolId, JIT_NodeRef basenode, int64_t idx, JIT_NodeRef valuenode)
{
    auto injector = unwrap_ilinjector(ilinjector);
    auto base = unwrap_node(basenode);
    auto index = TR::Node::lconst(idx);
    auto value = unwrap_node(valuenode);
    auto type = value->getDataType();
    TR::SymbolReference* symRef = nullptr;
    TR::ILOpCodes storeOp = injector->comp()->il.opCodeForIndirectArrayStore(type);
    auto aoffset = get_array_element_address(injector, type, base, index);
#if 0
  if (symbolId) {
    symRef = injector->findOrCreateShadowSymRef(symbolId, type, (int32_t)idx);
  }
#endif
    if (!symRef)
        symRef = injector->symRefTab()->findOrCreateArrayShadowSymbolRef(type, base);
    TR::Node* store = TR::Node::createWithSymRef(storeOp, 2, aoffset, value, 0, symRef);
    injector->genTreeTop(store);
}

One problem I have is that my front-end IR generates array based load/store instructions even for primitive types, treating the stack as an array. I do not therefore want to cause poor optimization with unnecessary aliasing information

vijaysun-omr commented 4 years ago

@dibyendumajumdar the optimization that seems to cause the wrong transformation is local common subexpression elimination (from looking at one of the first logs attached to this issue):

[    37] O^O LOCAL COMMON SUBEXPRESSION ELIMINATION:    Local Common Subexpression Elimination propagating local #338 in node : 00000247BB27AA20 PARENT : 00000247BB27AA60 from node 00000247BB27A3E0
         O^O LOCAL COMMON SUBEXPRESSION ELIMINATION:    Rhs of store def node : 00000247BB27A2E0

The local CSE optimization incorrectly believes that the store of auto sym ref 338 can be propagated to the load of that auto sym ref later in the same block. The reason this is incorrect is that there is a store to an array shadow sym ref in between that store and load of 338 that essentially is storing to the same location (given that the sym ref 338 was one that had it's "address taken").

The way this propagation is meant to be stopped is via aliasing as we've been discussing and in particular by the code guarded by the boolean set on the following line

https://github.com/eclipse/omr/blob/c1ba8e20485aed4c8be638ddb5ab55b20e18a435/compiler/optimizer/OMRLocalCSE.cpp#L504

when this line is executed with node being the store into the array shadow sym ref (i.e. in the log I am looking at):

n30n istorei <array-shadow>[#232 Shadow] [flags 0x80000603 0x0 ]

then hasAliases should be true.

I thought I'd suggest checking if we are setting hasAliases to true for the store node above at the line number I pasted a link to since you mentioned not reaching the code you were modifying for shadow sym refs at all in your last comment. hasAliases has to be true I expect for the store node above to engage the code in local CSE to try to do the right thing.

dibyendumajumdar commented 4 years ago

Hi @vijaysun-omr Thank you for the analysis. I am certainly not setting hasAliases in my code. I will have a look at that (because I work off an IR generated by the C front-end it is sometimes hard to have full control).

vijaysun-omr commented 4 years ago

To be 100% clear, I meant investigating what value you see for the hasAliases boolean in that local CSE line I mentioned for all the nodes as they are being processed by local CSE. Assuming it's false, you would then need to investigate how hasAliases was computed incorrectly for your front end by debugging into the node API that it seems to depend on (e.g. node->mayKill).

dibyendumajumdar commented 4 years ago

Okay thank you. I started investigating why the array shadow load/store do not generate any aliasing. The problem might be this:

symRefTab()->findOrCreateArrayShadowSymbolRef(type, base);

I am trying the other one which takes a size argument, and now at least I see:

Symref #232 <array-shadow> 
   Use Aliases: NULL 
   Usedef Aliases: 000000AD4AF4DA38 {232, 239}
Symref #236 <array-shadow> 
   Use Aliases: NULL 
   Usedef Aliases: 000000AD4AF4DA88 {229, 236}

I also see the shadow section being called. I have to next try ORing the address taken bitmap.

dibyendumajumdar commented 4 years ago

So after ORing addressTakenAutos with arrayElementSymRefs I get:

Symref #232 <array-shadow> 
   Use Aliases: NULL 
   Usedef Aliases: 000000ED620FDDB8 {232, 239}
Symref #236 <array-shadow> 
   Use Aliases: NULL 
   Usedef Aliases: 000000ED620FDE08 {229, 236}
Symref #337 <temp slot 1> 
   Use Aliases: NULL 
   Usedef Aliases: 000000ED620FDE58 {232, 236, 338, 339}
Symref #338 <temp slot 2> 
   Use Aliases: NULL 
   Usedef Aliases: 000000ED620FDEA8 {232, 236, 338, 339}
Symref #339 <temp slot 3> 
   Use Aliases: NULL 
   Usedef Aliases: 000000ED620FDEF8 {232, 236, 338, 339}
Symref #340 <temp slot 4> 
   Use Aliases: NULL 
   Usedef Aliases: 000000ED620FDF48 {232, 236, 338, 339}

This looks more promising. However some tests still fail.

dibyendumajumdar commented 4 years ago

Next version of the original test:

Symbol References with Aliases:

Symref #232 <array-shadow> 
   Use Aliases: NULL 
   Usedef Aliases: 0000006319AFD7A8 {232, 239, 338, 339}
Symref #236 <array-shadow> 
   Use Aliases: NULL 
   Usedef Aliases: 0000006319AFD7F8 {229, 236, 338, 339}
Symref #337 <temp slot 1> 
   Use Aliases: NULL 
   Usedef Aliases: 0000006319AFD848 {232, 236, 338, 339}
Symref #338 <temp slot 2> 
   Use Aliases: NULL 
   Usedef Aliases: 0000006319AFD898 {232, 236, 338, 339}
Symref #339 <temp slot 3> 
   Use Aliases: NULL 
   Usedef Aliases: 0000006319AFD8E8 {232, 236, 338, 339}
Symref #340 <temp slot 4> 
   Use Aliases: NULL 
   Usedef Aliases: 0000006319AFD938 {232, 236, 338, 339}

However even now other tests fail.

dibyendumajumdar commented 4 years ago

Example of failing test case:

void typepun (float *ptr2)
{
  *ptr2=0;
}

int main(void)
{
        int val;

        int *ptr = &val;
        float *ptr2 = (float*)&val;

        *ptr=1;
        typepun (ptr2);
        if (*ptr)
        return 1;
    return 0;
}

This involves function call - so probably indicates my PR or method aliasing address taken vars is not yet fully effective.

trtrace.log.15288.2464.log

vijaysun-omr commented 4 years ago

@dibyendumajumdar I looked at this last log you uploaded and find the problematic transformation is:

[ 63] O^O VALUE PROPAGATION: Constant folding iload [0x000001B03B202B60] to iconst 1

To me this suggests that use-def info is not taking into account the fact that a call might be aliasing an auto. A log with traceUseDefs added to your options would help in confirming.

I found this comment about an argument to be passed to UseDefInfo's constructor:

 *                                   when local (stack) variable has been aliased by a function call.
 *                                   A value of false is fine for Java like languages where
 *                                   local (stack) variables cannot be passed by reference to function calls
 *                                   and hence cannot be aliased. However for C like languages this flag should be
 *                                   set to true.
 */
TR_UseDefInfo::TR_UseDefInfo(TR::Compilation *comp, TR::CFG *cfg, TR::Optimizer *optimizer,
      bool requiresGlobals, bool prefersGlobals, bool loadsShouldBeDefs, bool cannotOmitTrivialDefs, bool conversionRegsOnly, 
      bool doCompletion, bool callsShouldBeUses) 

This suggests that callsShouldBeUses should be set to true in your case but I think the real issue may well be that we need some way of expressing that calls could be defs for autos that have had their address taken. Your call aliasing does seem to be correct, so we'd have to look at UseDefInfo.cpp to check how to make it work in your case.

dibyendumajumdar commented 4 years ago

Hi @vijaysun-omr

I thought I was setting callsShouldBeUses to true in the NJ Optimizer subclass, but perhaps I didn't as I don't see it committed here https://github.com/dibyendumajumdar/nj/blob/master/frontends/nj/optimizer/NJOptimizer.hpp. So it may well explain the failure. I will revert back after checking this.

Update: actually NJ doesn't have the latest changes so it is being hardcoded to 'true' here: https://github.com/dibyendumajumdar/nj/blob/master/compiler/optimizer/UseDefInfo.cpp

dibyendumajumdar commented 4 years ago

Here is the trace with traceUseDefs.

trtrace.log.9040.83606.log

vijaysun-omr commented 4 years ago

@dibyendumajumdar the last trace you uploaded has different IL generated (more complex) than the prior trace; just wondering if this is intentional. The prior log was easier to work with since it had a call that was fairly simple and a use of the address taken auto later:

n33n        call  file:line:typepun[#342  computed-static Method] [flags 0x500 0x0 ]          [0x000001B03B202960] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=5 nc=1
n16n          ==>loadaddr
n43n      ificmpne --> block_4 BBStart at n6n ()                                              [0x000001B03B202BE0] bci=[-1,0,-] rc=0 vc=44 vn=- li=- udi=- nc=2 flg=0x20
n41n        iload  <temp slot 3>[#339  Auto] [flags 0x10003 0x0 ]                             [0x000001B03B202B60] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=- nc=0
n42n        iconst 0          

whereas the new trace has the call looking like:


n56n        icall  file:line:f[#351  computed-static Method] [flags 0x500 0x0 ] ()            [0x000002604A51C1E0] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=11 nc=5 flg=0x20
n51n          loadaddr  <temp slot 7>[#343  Auto] [flags 0x10007 0x0 ]                        [0x000002604A51C0A0] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=15 nc=0
n52n          loadaddr  <temp slot 3>[#339  Auto] [flags 0x10003 0x0 ]                        [0x000002604A51C0E0] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=16 nc=0
n53n          loadaddr  <temp slot 4>[#340  Auto] [flags 0x10003 0x0 ]                        [0x000002604A51C120] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=17 nc=0
n54n          loadaddr  <temp slot 8>[#344  Auto] [flags 0x10007 0x0 ]                        [0x000002604A51C160] bci=[-1,0,-] rc=1 vc=44 vn=- li=- udi=18 nc=0
n55n          loadaddr  <temp slot 9>[#345  Auto] [flags 0x10007 0x0 ]   

with a comparison tree that is quite complex following it.

dibyendumajumdar commented 4 years ago

@vijaysun-omr I will check if I ran the correct test

dibyendumajumdar commented 4 years ago

My bad - it was another test that is also failing. Here is the correct log fle. trtrace.log.5288.43750.log

vijaysun-omr commented 4 years ago
 Use #5[00000232F6752B60] is defined by:
      Def #1[00000232F6752820]

shows the relevant load's def information and it corresponds to the store before the call, i.e. the call node was not considered a def even though it aliases it. This could mean some changes to UseDefInfo.cpp are needed to make this test pass since your passing in of the boolean to consider calls as uses was also not enough.

dibyendumajumdar commented 4 years ago

@vijaysun-omr Okay thank you for the analysis.

I guess that even if we solved this particular case another one might crop up to do with this type of operation. I am wondering if it is worth pursuing - how likely is that OMR will be used for a C like language? For my C front-end, I can simply disallow taking addresses of local stack variables; I do not need this capability for the main usage which is in Ravi, where I carefully avoid this scenario.

If however, it is felt that OMR should support languages like C, then it would be worth spending the effort. I would like to work on it but it will probably take me a while to figure out. On the plus side as a by-product it may be worthwhile documenting the way aliasing data should be setup by the front-end and how that data is then used in various scenarios.

Appreciate your help so far, and interested to know what you and the wider team think about above.

vijaysun-omr commented 4 years ago

I agree with your last comment. The change to use-def info itself to resolve the issue we see in your current failing test(s) may be a small to medium (trending to latter for someone who does'nt know the code) effort. It's not so large an effort to dismiss out of hand but also not so trivial as to say we should just fix it and continue your testing. Your comment suggests you would be able to make progress by disallowing taking the address in your C-front and so a fix is not necessarily blocking you.

In the end, it probably depends on the wider question you raised around OMR use for C-like languages (that others can also express an opinion on). At the very least we should probably document this "known limitation" for C-like languages and add an issue to the backlog.

vijaysun-omr commented 4 years ago

Oh, and if you wanted to attempt fixing this in use-def info as an ongoing exercise, I'd be happy to help you with that effort via suggestions, looking at logs and answering questions (as I did in this issue) :)

dibyendumajumdar commented 4 years ago

@vijaysun-omr It is definitely not blocking me and I would not want to take up your time with fixing an issue that is not affecting the wider OMR user base. I am happy to work on it in the background. Thank you for the help, but please treat it as non-urgent.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment on the issue or it will be closed in 60 days.