axmolengine / axmol

Axmol Engine – A Multi-platform Engine for Desktop, XBOX (UWP) and Mobile games. (A fork of Cocos2d-x-4.0)
https://axmol.dev
MIT License
922 stars 205 forks source link

AddressSanitizer find bugs in axmol engine #1257

Closed crazyhappygame closed 1 year ago

crazyhappygame commented 1 year ago

image

Sample errors:

  1. ActionsProgressTests Address Sanitizer Error: Use of out-of-scope stack memory

    void ProgressTimer::updateColor()
    {
    if (!_sprite)
        return;
    
    if (!_vertexData.empty())
    {
        const Color4B& sc = _sprite->getQuad().tl.colors;
        for (int i = 0; i < _vertexData.size(); ++i)
        {
            _vertexData[i].colors = sc;
        }
    }
    }
  2. TextureCacheUnbindTest Address Sanitizer Error: Use of deallocated memory
        // release the asyncStruct
        delete asyncStruct;
        --_asyncRefCount;

Comments: This kind of problems means that we are in undefined behavior zone and can not reason about program correctness. This kind of issue could result in the problems seen in https://github.com/axmolengine/axmol/issues/1211

crazyhappygame commented 1 year ago

FYI. @rh101 @halx99 @DelinWorks @kiranb47

Visual studio support only AddressSanitizer https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=msvc-170

Does any one of can build and check axmol on linux against all available sanitizers [address, memory, thread, undefined]

rh101 commented 1 year ago

@crazyhappygame The ActionsProgressTests issue seems like a false positive from ASAN. It can be boiled down to this code which throws this error:

const Color4B& sc = _sprite->getQuad().tl.colors;
_vertexData[i].colors = sc;

If we change it to this, an error is still flagged by ASAN:

const auto& tl = _sprite->getQuad().tl;
_vertexData[i].colors = tl.colors;

For some reason the const reference is flagged as being out of scope, which shouldn't be the case. Unless there is something I'm not seeing, none of the code above should result in no errors.

This version of it does not throw an error:

const auto& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;
rh101 commented 1 year ago

2. TextureCacheUnbindTest Address Sanitizer Error: Use of deallocated memory

@crazyhappygame I couldn't reproduce this issue. Did you test it with the 32bit of 64bit build?

Once I did the change to fix issue 1 in ProgressTimer::updateColor(), the entire cpp_test auto test passed without any further ASAN exceptions (besides a few crashes that were fixed in #1259 that are unrelated to ASAN).

crazyhappygame commented 1 year ago

@rh101 ASAN does not produce false positive. Below line is a real bug (not a false positive)

const Color4B& sc = _sprite->getQuad().tl.colors;

Clarification: sprite returns by value V3F_C4B_T2F_Quad

V3F_C4B_T2F_Quad getQuad() const { return _quad; }

V3F_C4B_T2F_Quad temporary value is destroyed immediately and we keep reference to const Color4B& sc that points to non existing part of V3F_C4B_T2F_Quad

Possible fixes:

  1. Store copy of Color4B
    const Color4B sc = _sprite->getQuad().tl.colors;
  2. Extend lifetime of rvalue
    const auto& quad = _sprite->getQuad();

    and later use quad

    _vertexData[i].colors = quad.tl.colors;

Please check "Reference Lifetime Extension" e.g.: https://abseil.io/tips/107

Generally: This is OK. Reference Lifetime Extension.

const auto& temp = _sprite->getQuad();

This is WRONG. Reference Lifetime Extension does not work for any sub expression.

const auto& temp1 = _sprite->getQuad().tl;
const auto& temp2 = _sprite->getQuad().tl.color;
halx99 commented 1 year ago

Universal Reference is better fix it: auto&& xxx =

获取Outlook for Androidhttps://aka.ms/AAb9ysg


From: CHP @.> Sent: Sunday, July 9, 2023 3:31:36 PM To: axmolengine/axmol @.> Cc: Deal(一线灵) @.>; Mention @.> Subject: Re: [axmolengine/axmol] AddressSanitizer find bugs in axmol engine (Issue #1257)

@rh101https://github.com/rh101 ASAN does not produce false positive. Below line is a real bug (not a false positive)

const Color4B& sc = _sprite->getQuad().tl.colors;

Clarification: sprite returns by value V3F_C4B_T2F_Quad

V3F_C4B_T2F_Quad getQuad() const { return _quad; }

V3F_C4B_T2F_Quad temporary value is destroyed immediately and we keep reference to const Color4B& sc that points to non existing part of V3F_C4B_T2F_Quad

Possible fixes:

  1. Store copy of Color4B

const Color4B sc = _sprite->getQuad().tl.colors;

  1. Extend lifetime of rvalue

const auto& quad = _sprite->getQuad();

and later use quad

_vertexData[i].colors = quad.tl.colors;

Please check "Reference Lifetime Extension" e.g.: https://abseil.io/tips/107

Generally: This is OK. Reference Lifetime Extension.

const auto& temp = _sprite->getQuad();

This is WRONG. Reference Lifetime Extension does not work for any sub expression.

const auto& temp1 = _sprite->getQuad().tl; const auto& temp2 = _sprite->getQuad().tl.color;

― Reply to this email directly, view it on GitHubhttps://github.com/axmolengine/axmol/issues/1257#issuecomment-1627633575, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVHOJ7WYWJEYFUNPUYXKCDXPJM5RANCNFSM6AAAAAA2DBMBOQ. You are receiving this because you were mentioned.Message ID: @.***>

crazyhappygame commented 1 year ago
  1. TextureCacheUnbindTest Address Sanitizer Error: Use of deallocated memory

This looks like some threading problem. Most probably there is some race. We should run tread sanitizer to increase chance to reproduce this problem. I reproduce this problem on win 64 bit build

rh101 commented 1 year ago

Clarification: sprite returns by value V3F_C4B_T2F_Quad

V3F_C4B_T2F_Quad getQuad() const { return _quad; }

V3F_C4B_T2F_Quad temporary value is destroyed immediately and we keep reference to const Color4B& sc that points to non existing part of V3F_C4B_T2F_Quad

You're absolutely right; I completely missed that the return value of getQuad isn't a reference.

crazyhappygame commented 1 year ago

Universal Reference is better fix it: auto&& xxx =

  1. @halx99 Universal reference will not work. (update: double checked Universal reference && ASAN report error)
  2. @rh101 getQuad returning by reference will work
halx99 commented 1 year ago
auto&& quad = _sprite->getQuad();
temp._vertexData[i].colors = quad.tl.colors;

not work?

crazyhappygame commented 1 year ago

I checked below:

auto&& t = helpArrow->getQuad().tl.colors;

That should work

auto&& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;

There should be no difference how Reference Lifetime Extension works for below cases

auto&& t1= _sprite->getQuad();
const auto& t2 = _sprite->getQuad();
const V3F_C4B_T2F_Quad& t2 = _sprite->getQuad();
rh101 commented 1 year ago
auto&& quad = _sprite->getQuad();
temp._vertexData[i].colors = quad.tl.colors;

Sorry, yes that does work, which is equivalent to:

const auto& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;

Which would be preferred?

1 - Change getQuad to return a const reference? So it would become: const V3F_C4B_T2F_Quad& getQuad() const { return _quad; }

or

2 - Update it's usage in void ProgressTimer::updateColor() to:

auto&& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;
crazyhappygame commented 1 year ago

Both works for me.

Not sure how big can _vertexData vector and what compiler code is generated here but there is a chance that below could be the faster. There is no indirection in body loop quad.tl.colors

        const Color4B sc = _sprite->getQuad().tl.colors;
        for (int i = 0; i < _vertexData.size(); ++i)
        {
            _vertexData[i].colors = sc;
        }

And most probably changing to below will be some small performance and readability improvement

const Color4B sc = _sprite->getQuad().tl.colors;
for(auto& d: _vertexData){
   d.color = sc;
}
rh101 commented 1 year ago

Not sure how big can _vertexData vector and what compiler code is generated here but there is a chance that below could be the faster. There is no indirection in body loop quad.tl.colors

        const Color4B sc = _sprite->getQuad().tl.colors;
        for (int i = 0; i < _vertexData.size(); ++i)
        {
            _vertexData[i].colors = sc;
        }

That would be an extra copy for the const Color4B sc = _sprite->getQuad().tl.colors;

If getQuads returned a const &: const V3F_C4B_T2F_Quad& getQuad() const { return _quad; }

then the old code in ProgressTimer::updateColor() can remain as it is, and it will work, plus it won't be doing an extra copy since it's just getting the reference to colors.

const auto& sc = _sprite->getQuad().tl.colors;
for (int i = 0; i < _vertexData.size(); ++i)
{
    _vertexData[i].colors = sc;
}

Also, the other places that use getQuads get a copy of it, like this:

V3F_C4B_T2F_Quad quad = _sprite->getQuad();

All they do is either calculations on the values or copy the contents somewhere else, so if we change getQuads to return a const & we can remove a single copy operation, and just have all usages do:

const V3F_C4B_T2F_Quad& quad = _sprite->getQuad(); or const auto& quad = _sprite->getQuad(); etc. etc.

crazyhappygame commented 1 year ago

I think this single copy does not matter here:

  1. we create V3F_C4B_T2F_Quad (that is much bigger then Color4B)
    V3F_C4B_T2F_Quad getQuad() const { return _quad; }
  2. We copy Color4B every single loop iteration
    _vertexData[i].colors = sc;

Most probably below would be the most performant and readable code

const V3F_C4B_T2F_Quad& getQuad() const { return _quad; }

and then

const Color4B& sc = _sprite->getQuad().tl.colors;
for(auto& d: _vertexData){
   d.color = sc;
}
halx99 commented 1 year ago

const V3F_C4B_T2F_Quad& getQuad() const { is best

rh101 commented 1 year ago

const V3F_C4B_T2F_Quad& getQuad() const { is best

I can take care of this in a few minutes. It'll affect other functions too, such as

void updateQuad(V3F_C4B_T2F_Quad* quad, ssize_t index);
void insertQuad(V3F_C4B_T2F_Quad* quad, ssize_t index);

They will become:

void updateQuad(const V3F_C4B_T2F_Quad& quad, ssize_t index);
void insertQuad(const V3F_C4B_T2F_Quad& quad, ssize_t index);

Which is probably a good thing too, since inside those methods the quad pointer is being de-referenced, and there is never a null check being done prior to that.

I'll get a PR done soon unless someone else wants to handle this.

crazyhappygame commented 1 year ago

@rh101 thank you for merging fix. Anyone could check other sanitizer on Linux/MacOS platforms [address, memory, thread, undefined]. There is definitely problem in async Texture loading....

rh101 commented 1 year ago

Anyone could check other sanitizer on Linux/MacOS platforms [address, memory, thread, undefined]. There is definitely problem in async Texture loading....

All of the texture loading tests seem to run fine with both thread and undefined behavior sanitizers enabled on Mac OSX (M1 arm64). The problem I did run into though is that I can't do the automated test, since it throws a thread error that seems to be related to the automated test threading code: image

This error occurs as soon as I click "Start AutoTest".

rh101 commented 1 year ago

If you go into the TextureCache (TextureCacheUnbindTest) tests, and you click the right arrow (next test) button very quickly, then you will definitely get crashes. This may have something to do with the scheduler, which still runs a scheduled call to TextureCache::addImageAsyncCallBack even though that test has been removed, so data may be invalid.

I could reproduce the issue constantly on MacOS, but on Windows 10 I couldn't get it to happen at all, possibly because the images load very quickly, so the callback occurs before the next test is loaded when the right arrow is clicked.

EDIT: These errors are related to the test implementation, and not TextureCache. When a test scene is deleted, it doesn't unbind the callbacks registered with the texture cache for the async image loading. The way I managed to get it working is to simply use this:

auto* cache = Director::getInstance()->getTextureCache();
cache->unbindAllImageAsync(); 

in the destructor for each test.

The only issue with the above is that the test code also needs to be refactored to move the test setup from the constructor to the onEnter() methods. The reason is that a new test scene is constructed before the previous test scene is released, so here is what would happen if the code is left in the constructors:

1 - NewTestScene constructor called, TextureCache::addImageAsync is registered for all images. 2 - CurrentTestScene destructor is called, which calls TextureCache::unbindAllImageAsync, and this unbinds all image async, including the ones created in NewTestScene. This is not the behavior we want. 3 - CurrentTestScene replaced by NewTestScene. Nothing displays since all async calls were unbound.

If we move the code to onEnter, then this is what happens:

1 - NewTestScene constructor called (no code in here) 2 - CurrentTestScene destructor called, which calls TextureCache::unbindAllImageAsync, and this unbinds all image async, belonging only to the CurrentTestScene 3 - CurrentTestScene becomes NewTestScene 4 - CurrentTestScene::onEnter() is called, which sets up the new async image loading 5 - CurrentTestScene correctly displays the test

Come to think of it, the entire test implementation is a little strange, since everything is done in the constructors, which is not ideal. The constructors shouldn't be used to set up the tests, along with the fact that virtual methods are being called from constructors, which end up being resolved statically, causing issues with overriden methods.

Anyhow, with the changes above, the TextureCache test no longer crashes.

rh101 commented 1 year ago

@crazyhappygame If you get a chance to, please try out the changes in #1267 to verify that it fixes the crash you were getting in TextureCacheUnbindTest.

rh101 commented 1 year ago

@crazyhappygame If you have verified that the issues raised have been addressed, then please close this issue.