google / CFU-Playground

Want a faster ML processor? Do it yourself! -- A framework for playing with custom opcodes to accelerate TensorFlow Lite for Microcontrollers (TFLM). . . . . . Online tutorial: https://google.github.io/CFU-Playground/ For reference docs, see the link below.
http://cfu-playground.rtfd.io/
Apache License 2.0
458 stars 117 forks source link

about proj_template demo code #98

Closed akioolin closed 3 years ago

akioolin commented 3 years ago

Hi, @tcal-x :

I found this sample will stop output in my current setting, qmtech wukong board + buad rate 3000000. so I make some change to pass the stop problem.

here is the code.

static void do_exercise_cfu_op0(void) { puts("\r\nExercise CFU Op0 aka ADD\r\n"); int count = 0; int pass_count = 0; int fail_count = 0;

for (int a = -0x71234567; a < 0x68000000; a += 0x10012345)
{
    for (int b = -0x7edcba98; b < 0x68000000; b += 0x10770077)
    {
        int cfu = cfu_op0(0, a, b);
        if (cfu != a + b) 
        {
            printf("[%4d] a: %08x b:%08x a+b=%08x cfu=%08x FAIL\r\n", count, a, b, a + b, cfu);
            fail_count++;
        } else {
            pass_count++;
        }
        count++;
    }
}
printf("\r\nPerformed %d comparisons, %d pass, %d fail\r\n", count, pass_count, fail_count);

}

BTW, I also met the undefined code behaviour error, if I reduce the if (cfu != a + b){} statement. the gcc will complain the "a + b" is not defined. so I change the optimization level form O3 to O1 to avoid this error.

BR, Akio

tcal-x commented 3 years ago

Hi @akioolin , thank you for the report. Yes, I think I've seen that too, where the USB gets overwhelmed. It looks like it hangs, but then if you hit return a couple times, it will start interacting again.

But yes, I agree with the change; we shouldn't print out every successful match. I think in one of the other tests, I set it up to print out status only every N iterations...here: https://github.com/google/CFU-Playground/blob/main/common/src/functional_cfu_tests.c#L61

Your solution is fine too though -- can you submit a PR?

akioolin commented 3 years ago

Hi, @tcal-x :

Yes. What you siad is what I met. the PR had been submitted. but It seems meet the O3 vs O1 problem.

BR, Akio

akioolin commented 3 years ago

Hi, @tcal-x :

add "-fno-aggressive-loop-optimizations" in common/Makefile's SHAREFLAGS could avoid this error.

BR, Akio

tcal-x commented 3 years ago

Hi @akioolin , I think I'm beginning to understand the 2nd issue -- I don't see the "not defined" warning, but I do see the strange behavior when the original printf is removed. I was being somewhat sloppy with signed/unsigned, since the cfu() operation expects unsigned. I think the better solution is to change variables "a", "b", and "cfu' to unsigned, and adjust the "for" loop ranges accordingly.

I would prefer to not add the flag to disable optimizations, since we do in fact want the code (including the tensorflow code) to get optimized as much as possible.

Thank you for helping to figure this out and raise these issues.

Edit: I do see the warning now, and I also found the very long stack overflow article on the issue. It does seem that using unsigned variables should resolve the issue.

akioolin commented 3 years ago

Hi, @tcal-x :

Thank you. Sorry for my poor description. I will do some test under remove add-on flags, and change the variable setting from signed to unsigned. If it pass, I could do a another PR.

BR, Akio