Samsung / ONE

On-device Neural Engine
Other
435 stars 157 forks source link

[one-optimize] Refactor RemoveDeadNodeWithQueryPass #13863

Closed icodo98 closed 3 weeks ago

icodo98 commented 2 months ago

I think code here is bug. https://github.com/Samsung/ONE/blob/aa2ff1bbf575c2081f88586f5f25da26d50ff511/compiler/logo/src/Passes/RemoveDeadNodeWithQueryPass.cpp#L45-L55

The current code has an issue where, if both normal and dead nodes are mixed in the candidates, it deletes the normal nodes but skips over checking the other candidate nodes.

seanshpark commented 2 months ago

do you have a model for this?

icodo98 commented 2 months ago

Background

I found this issue when I'm trying to support non-const paddings for pad ops in #13790 . I made a draft code and successfully made .opt.circle without any -O options. But with onecc -O1 one fails with this error below. circle2circle: circle2circle: /home/ssafy/one/compiler/luci/export/src/CircleTensorExporter.cpp:169: void {anonymous}::MultiOutputDetector::store_outputs(luci::CircleNode*, uint32_t): Assertion 'outs.size() == count' failed.

How to reproduce

Since I found this issue with code in my own repo, I made some sample model to reproduce this issue. badDead.zip
you could download and reproduce it with onecc -C badDead.cfg.

icodo98 commented 2 months ago

When run with debugger, following is result.

After passing through SubstitutePackToReshapePass, five DeadNode candidates are generated. Among them, the nodes with indices 0, 3, and 4 are optimized dead nodes, while the nodes with indices 1 and 2 are actually not dead. However, it is observed that the node with index 2 (0x5555555bbb10) still remains in the list of candidates even after passing through the for-loop.

Hanjin-Choi commented 2 months ago

@icodo98 I agree that the problem occurs when skipping after deletion. If we change the code as follows, the skipping behavior after deleting a node will disappear.

for (auto it = candidates.begin(); it != candidates.end(); ) 
{
    if (auto service = (*it)->dialect()->service<DeadNodeQueryService>()) 
    {
        if (!service->isDeadNode(*it)) 
        {
            it = candidates.erase(it);
            continue;
        }
    }
    ++it;
}
jinevening commented 2 months ago

Nice catch! Thanks for reporting. I'll make a fix soon :)

Hanjin-Choi commented 2 months ago

Nice catch! Thanks for reporting. I'll make a fix soon :)

If it's alright with you, I was working on a draft based on the code I wrote. Would you like me to continue writing it?

jinevening commented 2 months ago

Of course I'm okay with it :) One concern is that this seems to be out of the scope of ssafy project. @shs-park Is it ok @Hanjin-Choi contribute this topic?

One thing to note: Patch itself would be simple, but I guess that writing a test code would be more laborious.

Hanjin-Choi commented 2 months ago

One thing to note: Patch itself would be simple, but I guess that writing a test code would be more laborious.

I agree. But it would be an honor to have the opportunity to take on the challenge.

jinevening commented 2 months ago

I assigned @Hanjin-Choi to this issue.

shs-park commented 2 months ago

Is it ok @Hanjin-Choi contribute this topic?

@jinevening,

The SSAFY project, in this year, also includes covering issues that can be derived from the Shape inference topic. This part seems like a good opportunity. Thank you!

Hanjin-Choi commented 1 month ago

I feel sorry for dragging this Issue. I tried to write test code, but I feel lost. So, I need some advices.

I thought I need to make graph to check whether node is dead or not. I tried to make a graph two ways.

  1. First, I tried implementing the graph using loco::Push like other pass tests. https://github.com/Samsung/ONE/blob/40ee60bc468bde92018f561cccf266973246f4c4/compiler/logo/src/Passes/EmptyTestGraph.test.cpp#L24-L48 However, with this approach, it's impossible to assign a value to Node->dialect()->service(), making it difficult to determine if a node is dead using isDeadNode()

  2. Second, I tried implementing the graph using luci::CircleNode. I confirmed that if a CircleNode is created, the service is injected. However, when I attempted to create a CircleNode using luci/IR/CircleNodes.h, it seems that a build error occurs because luci is not built before logo.

jinevening commented 1 month ago

It seems that you try to implement gtest. If you take that approach, you should implement a test under luci, not logo. Although logo is changed, the impact is shown in luci.

Rather than implementing gtest, it would be easier to implement a regression test for the problematic model.

You can add a test in circle2circle-dredd-recipe-test as follows.

  1. Generate tflite recipe from the problematic model (you can use tflchef-reverse tool)

  2. Put the recipe with a proper test.rule under res/TensorFlowLiteRecipes. -> You can just check op count of pack and reshape.

  3. Add test to circle2circle-dredd-recipe-test w/ substitute_pack_to_reshape.

  4. Check the test fails without your change in the debug mode (this problem is only visible in debug mode). ./nncc test -R circle2circle_dredd_recipe_test

  5. Check the test passes with your change in the debug mode.

It would be great if you can post the result of 4 and 5 in the draft PR,

Hanjin-Choi commented 1 month ago

It would be great if you can post the result of 4 and 5 in the draft PR,

Thanks a lot! I will follow your suggestion.

Hanjin-Choi commented 1 month ago

I wrote Draft-PR PTAL =)

https://github.com/Samsung/ONE/pull/14051

shs-park commented 3 weeks ago

@Hanjin-Choi, @jinevening,

I think all the related PRs are merged, so I will close this issue. Please feel free to reopen it if you have any other related issue.