Closed Yehsam23 closed 1 year ago
Have you used a debugger to track down the source of the crash? It would have led you to the fact that there are bugs in your own code. Also, when you do debug, don't stop at your own code, but trace into the engine code to figure out what is going on in there.
Just a quick look at your code shows a problem right away; you are calling retain()
on a local variable newRT
, one which you're passing a callback to, and you're also not calling release()
on it anywhere.
The reason Director::getInstance()->getRenderer()->render();
prevents the crash is because you're forcing it to process the render queue within the current block, so newRT
is still valid. The reason it crashes without that line is probably because the callback is no longer pointing to valid memory. The callback command is added to the queue, as you can see in RenderTexture::saveToFile
, and since the callback was passed using a local variable, by the time the callback is called in the next update cycle, newRT
is no longer valid.
void RenderTextureTest1::saveFile(Ref* pSender)
{
auto director = Director::getInstance();
auto s = director->getVisibleSize();
auto newRT = RenderTexture::create(s.width, s.height, backend::PixelFormat::RGBA8, backend::PixelFormat::D24S8, false);
newRT->beginWithClear(0, 0, 0, 0);
// It's will crash on axmol, switch to _clippingNode will not show
this->visit();
// _clippingNode->Node::visit();
newRT->end();
auto callback = [&] (RenderTexture* rt, std::string_view path) {
AXLOG("Save To Path=%s", path.data());
};
newRT->retain();
newRT->saveToFile("TestSave.png", Image::Format::PNG, true, callback);
// Add this function to avoid crash if we switch to a new scene.
Director::getInstance()->getRenderer()->render();
}
The rendering command code used in Axmol is different to Cocos2d-x, so just because it works in Cocos2d-x doesn't mean that there is something wrong with Axmol (edit: in this case, there may actually be a bug in the engine).
@halx99 I just ran the standard test "43:Node: RenderTexture" (Windows, x86 build), and it's crashing on the first test if you touch the view area and move around. It should draw items on the screen, but what seems to happen is that a few appear, then it starts flickering, like @Yehsam23 mentioned. If you try to close the app, it crashes in Renderer::~Renderer()
in this block:
for (auto&& clearCommand : _callbackCommandsPool)
delete clearCommand; <= here it crashes, and there are a lot of commands in the _callbackCommandsPool
It seems like once the number of commands in the _callbackCommandsPool
go over a certain amount (or added too quickly), it causes problems, and if that isn't it, then I have absolutely no idea why it is crashing.
If the _callbackCommandsPool
is not used, then the crash no longer occurs, and it draws on screen correctly in RenderTexture test. It just means it'll keep eating up memory with every cmd = new CallbackCommand();
in Renderer::nextCallbackCommand()
, but at least that narrows down the problem somewhat.
@halx99 Looking into this some more, I noticed that the _callbackCommandsPool
contains copies of the same command pointers. The crash on exit happens because once one of those pointers is deleted in the loop, the rest become invalid, and since there are multiple entries of the same pointer, the delete fails:
Aren't the command pointers unique, and should only exist once in that vector?
So, a quick test using a lookup to check if the command already exists in the vector actually fixed the problem, but I still can't quite pinpoint why there is a problem to begin with. It fixed both the drawing and the crash, and there were way less entries in the vector (like 9 commands vs 100's).
void Renderer::processRenderCommand(RenderCommand* command)
{
...
case RenderCommand::Type::CUSTOM_COMMAND:
flush();
drawCustomCommand(command);
break;
case RenderCommand::Type::CALLBACK_COMMAND:
{
flush();
static_cast<CallbackCommand*>(command)->execute();
// Check if the pointer already exists before adding it to the vector
auto&& itr = std::find(_callbackCommandsPool.begin(), _callbackCommandsPool.end(), command);
if (itr == _callbackCommandsPool.end())
{
_callbackCommandsPool.emplace_back(static_cast<CallbackCommand*>(command));
}
break;
}
default:
assert(false);
break;
}
}
I found that it seems that this crash occurs when using two different RenderTextures
My above Second issue crash is here:
@Yehsam23 It's hard to tell why yours is crashing with the save without the exact code you're using. I mentioned in a previous post that the initial code you posted is incorrect, specifically in the RenderTextureTest1::saveFile
method. Did you fix the issue in that function? Is the crash you're seeing now based on the original or fixed code?
@rh101
I know i need to release, It just quick sample test for crash, and I don't need newRT
pointer to do anything
I just want to say saveToFile will crash if use that code.
case RenderCommand::Type::CALLBACK_COMMAND:
{
flush();
static_cast<CallbackCommand*>(command)->execute();
// Check if the pointer already exists before adding it to the vector
auto&& itr = std::find(_callbackCommandsPool.begin(), _callbackCommandsPool.end(), command);
if (itr == _callbackCommandsPool.end())
{
_callbackCommandsPool.emplace_back(static_cast<CallbackCommand*>(command));
}
break;
}
It will stop crash
But if you press SaveToFile for the second time, the screenshot file will incorrect (Different from the screen)
If remove Director::getInstance()->getRenderer()->render();
, it will be correct
So what I want to determine is whether it is not suitable to use Director::getInstance()->getRenderer()->render() under axmol? Just like the third question I mentioned above, the two screens will be different Currently only happens on iOS
In case this helps:
_renderGroups
seems to have multiple entries for renderQueueID = 1, which is the one that contains the callback commands:
Now, when it loops through, it is processing the same group multiple times, and since the group contains the exact same callback command pointers, it attempts to add them multiple times to the _callbackCommandsPool
once they're processed:
Some debugging AXLOG commands added on CallbackCommand usage:
EMPLACE: 0BADDA78 POP: 0BADDA78 EMPLACE: 0BADDA78 POP: 0BADDA78 EMPLACE: 0BADDA78 POP: 0BADDA78 POP: 0BADF2F0 POP: 0BADE720 EMPLACE: 0BADE720 EMPLACE: 0BADDA78 EMPLACE: 0BADF2F0 POP: 0BADF2F0 POP: 0BADDA78 POP: 0BADE720 POP: 0BADE138 NEW: 0BADE210 NEW: 0BADD8C8 NEW: 0BADDF88 EMPLACE: 0BADDF88 <= group 1 processed EMPLACE: 0BADF2F0 EMPLACE: 0BADDA78 EMPLACE: 0BADE720 EMPLACE: 0BADE138 EMPLACE: 0BADE210 EMPLACE: 0BADD8C8 DUPLICATE EMPLACE: 0BADDF88 <= group 1 processed again DUPLICATE EMPLACE: 0BADF2F0 DUPLICATE EMPLACE: 0BADDA78 DUPLICATE EMPLACE: 0BADE720 DUPLICATE EMPLACE: 0BADE138 DUPLICATE EMPLACE: 0BADE210 DUPLICATE EMPLACE: 0BADD8C8 DUPLICATE EMPLACE: 0BADDF88 <= group 1 processed again DUPLICATE EMPLACE: 0BADF2F0 DUPLICATE EMPLACE: 0BADDA78 DUPLICATE EMPLACE: 0BADE720 DUPLICATE EMPLACE: 0BADE138 DUPLICATE EMPLACE: 0BADE210 DUPLICATE EMPLACE: 0BADD8C8
Here are the AXLOG entries:
CallbackCommand* Renderer::nextCallbackCommand()
{
CallbackCommand* cmd = nullptr;
if (!_callbackCommandsPool.empty())
{
cmd = _callbackCommandsPool.back();
cmd->reset();
_callbackCommandsPool.pop_back();
AXLOG("POP: %p", cmd);
}
else
{
cmd = new CallbackCommand();
AXLOG("NEW: %p", cmd);
}
return cmd;
}
and
void Renderer::processRenderCommand(RenderCommand* command)
{
...
case RenderCommand::Type::CUSTOM_COMMAND:
flush();
drawCustomCommand(command);
break;
case RenderCommand::Type::CALLBACK_COMMAND:
{
flush();
static_cast<CallbackCommand*>(command)->execute();
auto&& itr = std::find(_callbackCommandsPool.begin(), _callbackCommandsPool.end(), command);
if (itr == _callbackCommandsPool.end())
{
_callbackCommandsPool.emplace_back(static_cast<CallbackCommand*>(command));
AXLOG("EMPLACE: %p", command);
}
else
{
AXLOG("DUPLICATE EMPLACE: %p", command);
}
break;
}
default:
assert(false);
break;
}
}
All of the problems seem to start when RenderTexture::begin()
is called, which is when it uses the exact same group, resulting in multiple entries of the same group ID in the rendering queue:
void RenderTexture::begin()
{
...
Renderer* renderer = _director->getRenderer();
renderer->addCommand(&_groupCommand);
renderer->pushGroup(_groupCommand.getRenderQueueID());
...
}
As far as I can tell, if RenderTexture::begin()
is called more than once per render cycle for the same instance of RenderTexture
, then it ends up adding multiple entries in the render queue for the same renderQueueID
. That causes the group to be processed more than once, and results in the Renderer::processRenderCommand
processing the same commands again and also adds the commands back into the _callbackCommandsPool
, even though they already exist in there.
That is why the problem occurs only when using something like the touch listener onTouchesMoved
method for the RenderTexture begin()/end(), possibly because it is called more than once per render cycle.
This is an example of a fix for the issue, but I don't know if it's the right one to use:
void RenderQueue::emplace_back(RenderCommand* command)
{
float z = command->getGlobalOrder();
if (z < 0)
{
auto& queue = _commands[QUEUE_GROUP::GLOBALZ_NEG];
auto&& itr = std::find(queue.begin(), queue.end(), command);
if (itr == queue.end())
queue.emplace_back(command);
}
else if (z > 0)
{
auto& queue = _commands[QUEUE_GROUP::GLOBALZ_POS];
auto&& itr = std::find(queue.begin(), queue.end(), command);
if (itr == queue.end())
queue.emplace_back(command);
}
else
{
if (command->is3D())
{
if (command->isTransparent())
{
auto& queue = _commands[QUEUE_GROUP::TRANSPARENT_3D];
auto&& itr = std::find(queue.begin(), queue.end(), command);
if (itr == queue.end())
queue.emplace_back(command);
}
else
{
auto& queue = _commands[QUEUE_GROUP::OPAQUE_3D];
auto&& itr = std::find(queue.begin(), queue.end(), command);
if (itr == queue.end())
queue.emplace_back(command);
}
}
else
{
auto& queue = _commands[QUEUE_GROUP::GLOBALZ_ZERO];
auto&& itr = std::find(queue.begin(), queue.end(), command);
if (itr == queue.end())
queue.emplace_back(command);
}
}
}
or alternatively:
void RenderQueue::emplace_back(RenderCommand* command)
{
const auto z = command->getGlobalOrder();
QUEUE_GROUP queueGroupId;
if (z < 0)
{
queueGroupId = QUEUE_GROUP::GLOBALZ_NEG;
}
else if (z > 0)
{
queueGroupId = QUEUE_GROUP::GLOBALZ_POS;
}
else
{
if (command->is3D())
{
if (command->isTransparent())
{
queueGroupId = QUEUE_GROUP::TRANSPARENT_3D;
}
else
{
queueGroupId = QUEUE_GROUP::OPAQUE_3D;
}
}
else
{
queueGroupId = QUEUE_GROUP::GLOBALZ_ZERO;
}
}
auto& queue = _commands[queueGroupId];
auto&& itr = std::find(queue.begin(), queue.end(), command);
if (itr == queue.end())
queue.emplace_back(command);
}
It just checks if the group already exists in the queue, and if so, doesn't add it again. This means that RenderTexture will work, and there will no longer be any duplicate CallbackCommand entries or crashes as result of that.
Maybe a look into the history of CCRender.cpp (and some more files) is useful too? In an earlier version (axis) it was working. Find the last "working" version and check what is changed on the version after.
Find the last "working" version and check what is changed on the version after.
Yeah the render code was changed significantly, with additions such as the _callbackCommandsPool
etc. My previous post with the changes to RenderQueue::emplace_back(RenderCommand* command)
does fix all the issues, but again, I'm not certain that it is the correct way to fix it.
@rh101 Maybe your fix can used as workaround?
@halx99 This issue should be pinned. Workaround should be merged. And a deepher look until and after release of axmol 1.0.
Another possible solution is to use a pool for the GroupCommand objects, the same way the callback commands are pooled. This would mean we don't need to check if a group already exists in the queue within RenderQueue::emplace_back()
.
So, instead of storing a local GroupCommand where it is needed, you just call something like render->getNextGroupCommand()
, and it either creates a new one or gives you one from the pool.
GroupCommand* Renderer::getNextGroupCommand()
{
if (_groupCommandPool.empty())
{
return new GroupCommand();
}
auto* command = _groupCommandPool.back();
_groupCommandPool.pop_back();
return command;
}
The GroupCommand can then be added back to the pool in the Renderer::processRenderCommand()
method:
case RenderCommand::Type::GROUP_COMMAND:
processGroupCommand(static_cast<GroupCommand*>(command));
_groupCommandPool.emplace_back(static_cast<GroupCommand*>(command));
break;
So, as an example in RenderTexture, this would happen:
RenderTexture::begin()
{
...
Renderer* renderer = _director->getRenderer();
auto groupCommand = renderer->getNextGroupCommand();
renderer->addCommand(groupCommand);
renderer->pushGroup(groupCommand->getRenderQueueID());
...
}
This fixes the issues without needing the change to RenderQueue::emplace_back()
.
Note that this version of the fix also seems to have resolved any issues that @Yehsam23 mentioned when calling the RenderTexture::saveToFile()
multiple times in succession. There is no more flickering, weird artifacts on the screen, or crashes after clicking the "Save" button multiple times (using code from the first post of this issue). The output images from the saveToFile
are also exactly the same for each call, as one of the issues was that every successive call to saveToFile
was causing different output or corrupt files if Director::getInstance()->getRenderer()->render()
was not called after the RenderTexture::end()
call.
@halx99 I'll submit a PR for this with the GroupCommand pooling similar to how the CallbackCommand pooling is handled, but it's completely up to you to decide if it is a reasonable way to fix these issues. This change means there is no need to do iterate through the pooled data to check if a reference to the command already exists in there, so there should be no impact whatsoever on performance.
This change is based on my understanding of how things seem to work, but it's not an area that I am 100% familiar with, so it may not be the best way to fix the underlying issues.
@rh101 Small question: Make this PR the render process also faster (at least for class RenderTexture)? If yes it makes sense to add this PR not only for the fix. My point of view.
I'm not sure if the performance of RenderTexture has increased as a result of this change, but it definitely did not decrease. So, at the very least, it's just as fast as before.
I merge pr #969 , it's has fixed most of the issues
But iOS have same issue in Director::getInstance()->getRenderer()->render();
It can be reproduce by "40:Node: RenderTexture"
at RenderTextureSave::addImage
add Director::getInstance()->getRenderer()->render();
to the last line
iOS can't show any image if clicked "Add Image"
But iOS have same issue in
Director::getInstance()->getRenderer()->render();
It can be reproduce by "40:Node: RenderTexture" atRenderTextureSave::addImage
addDirector::getInstance()->getRenderer()->render();
to the last line iOS can't show any image if clicked "Add Image"
I don't understand what you mean.
Is the unmodified test failing to show an image when you click "Add Image"?
What I want to express is that using Director::getInstance()->getRenderer()->render();
under iOS will make RenderTexture useless
That's why I said that the way to reproduce is as above
What I want to express is that using
Director::getInstance()->getRenderer()->render();
under iOS will make RenderTexture uselessThat's why I said that the way to reproduce is as above
That does not answer the question that I asked. So, once again, does the unmodified "40:Node: RenderTexture" test work correctly or not?
Please don't make changes to code and report issues in this manner. If you want to show an issue, then create a new project with the minimal amount of code required to reproduce the problem, then post that code either here or in a github repo for others to download and test.
Random changes to existing code that already works can have strange behavior, especially when something else entirely could be causing the problem.
Also, just stating it doesn't work is not very helpful at all. Did you investigate the issue yourself? Nothing is preventing you from doing so.
OK, this is a sample project TestRenderTexture.zip
It is work on Android, but there will be problems on iOS
I have tried to find the problem, that part seems to be related to Metal, I can't figure out how to start
About groupCommand and RenderTerxture. If I properly understand? The issue due to the fact that _groupCommand can be added to the render queue multiple times in a frame? It is more like on bad implementation of RenderTexture class, do not you think so?
I did test and this fix should help with cpp_test (bode 43 ex.)
If I understand @rh101 propose the same thing, to use separate instance of _groupCommand for each RenderTexture::begin() calling, or I wrong?
@solan-solan I'll test it with your change to see if it fixes the issues I was seeing, and if it does work, then yours will be a simpler fix.
@rh101 I checked your decision one more time, and looks like it is good idea anyway to have separate pool for group commands. It is more general aproach and it does not touch render cycle. Since that I would say it is more good solution.
@solan-solan With the changes you suggested to RenderTexture, how do you reset the _is_set_group
before the next render cycle, because it does need to be reset once that render cycle is complete?
@rh101 To my understand RenderTexture::begin() is called from the event handler and it could be called multiple times in a frame by the reason. But RenderTexture::onBegin() is called from the render cycle when events are not processed already. And it is good place to reset _is_set_group before next frame in my changes
But RenderTexture::onBegin() is called from the render cycle when events are not processed already.
I see what you mean, and there is also RenderTexture::onEnd()
, which may be more suitable too, since that will definitely the end of processing the render group for that render texture. Still haven't tested your changes yet, but I will report back here when I do.
@solan-solan
I tested your fix. (_is_set_group = false;
....)
Works well on windows 2022 64bit
@solan-solan I tested your changes, and while it works with the cpp_tests, it still crashes when calling RenderTexture::saveToFile(...)
more than once in the code that Yehsam23 posted (first post).
If you run that code, click the "Save" button, wait for it to save the image, then click it again, and the graphics become distorted, then click it again two more times, and it starts flickering, and then if you close the application, the application crashes on exit with the same issue as before, duplicate CallbackCommand entries in the list render list. The only way to avoid the graphic glitches and longer crash is to add a "Director::getInstance()->getRenderer()->render();" to the end of the call.
void HelloWorld::saveFile(Ref* pSender)
{
auto director = Director::getInstance();
auto s = director->getVisibleSize();
if (!newRT)
{
newRT = RenderTexture::create(s.width, s.height, backend::PixelFormat::RGBA8, backend::PixelFormat::D24S8, false);
newRT->retain();
}
newRT->beginWithClear(0, 0, 0, 0);
this->visit();
newRT->end();
auto callback = [&](RenderTexture* rt, std::string_view path) {
AXLOG("Save To Path=%s", path.data());
};
newRT->saveToFile("TestSave.png", Image::Format::PNG, true, callback);
// Add this function to avoid crash on exit or scene switch
Director::getInstance()->getRenderer()->render();
}
So, if that last line is removed, the graphic glitches and a crash on exit occurs.
Added this test to the original RenderTest, you can replace it with original files (it is first now). There is no error if Of couse there is no some data in the saved file. Now it is clear that if uncommented visit some render commads are broken when 'save' button is pressend second time.
Now it is clear that if uncommented visit some render commads are broken when 'save' button is pressend second time.
Yes, it seems so. The only way I've managed to get it all completely working is with the group command pool implementation, which avoids having to do any checks in RenderTexture and anywhere else that may require more than one render command in the queue per render cycle.
@rh101 I have nothing against your decision, but - does your soltion require this line for properly usage?
// Add this function to avoid crash on exit or scene switch
Director::getInstance()->getRenderer()->render();
I believe it is needed more investigations if so.
The issue occured if the lines to save to file are commented in the code I attached. It is not related with saving to file
There is no issue in this code
_rt->begin();
_iconSprite->setVisible(true);
_iconSprite->setPosition(touchPos);
_iconSprite->visit();
_iconSprite->setVisible(false);
_rt->end();
But error happens there
newRT->beginWithClear(0, 0, 0, 0);
// It's will crash on axmol, switch to _clippingNode will not show
this->visit();
// _clippingNode->Node::visit();
newRT->end();
There should be some difference in the rendered nodes.
@rh101 I have nothing against your decision, but - does your soltion require this line for properly usage?
// Add this function to avoid crash on exit or scene switch Director::getInstance()->getRenderer()->render();
The GroupCommand pool implementation works both with or without the line Director::getInstance()->getRenderer()->render();
in that code.
The issue occured if the lines to save to file are commented in the code I attached. It is not related with saving to file
That's very interesting, and it's something I didn't think of checking. Good catch!
But error happens there
newRT->beginWithClear(0, 0, 0, 0); // It's will crash on axmol, switch to _clippingNode will not show this->visit(); // _clippingNode->Node::visit(); newRT->end();
There should be some difference in the rendered nodes.
Yes, that would explain the duplicate Stencil-related callback commands in the render queue that were causing the crashes. Perhaps it's something in the stencil clipping code that is causing the issue. It makes more sense now why the screen started to flicker/graphics started disappearing too.
@rh101
The GroupCommand pool implementation works both with or without the line Director::getInstance()->getRenderer()->render(); in that code.
This line is overhead, and it is good that it is not necessary.
I noticed that you fixed ClippingNode also, about group commands? Looks like these group commands are added to the render queue twice per render cycle. First one, by ther reason of the _clippingNode is the child of scene, and the second one by the reason of this->visit() inside RenderTextureTest1::saveFile(Ref* pSender). If that, your solution should be integrated since it is needed to have two different instances of those group commands anyway.
I noticed that you fixed ClippingNode also, about group commands?
Yes, the group command pooling fixed all of those issues. The only one remaining that @Yehsam23 mentioned has to do with iOS/Metal renderer, so not sure what is going on with that platform.
Looks like these group commands are added to the render queue twice per render cycle. First one, by ther reason of the _clippingNode is the child of scene, and the second one by the reason of this->visit() inside RenderTextureTest1::saveFile(Ref* pSender).
Oh, well that makes sense then as to why there were multiple instances of the commands in the CallbackCommand pool, and it was crashing because it would try to call "delete" on the same CallbackCommand pointers more than once when iterating through them.
If that, your solution should be integrated since it is needed to have two different instances of those group commands anyway.
That is how it works, so regardless of how many group commands are required from any code within the same render cycle, it will always be a unique one in the queue, which is how the issues were fixed.
The only one remaining that @Yehsam23 mentioned has to do with iOS/Metal renderer, so not sure what is going on with that platform.
@rh101 I do not use this platform now, but it would be better to check if render to texture works at all somehow there, without ClippingNode. Only try to render to texture some simple sprite like _iconSprite, without saving to file, then if it does not cause to crash, try to save it. May be it is issue of metal created frame buffer or render texture for this buffer.
@Yehsam23 It is not clear if "group command pool fix" does not effect ios at all? Or may be something become different?
"Group command pool fix" fixed the issue that iOS and Android would flick or disappear.
Another iOS/Metal issue I guess need to call _renderer->beginFrame();
before Director::getInstance()->getRenderer()->render()
and call _renderer->endFrame();
at end
Maybe i close this issue first, and handle the iOS issue separately?
Steps to Reproduce: Sample Code: RenderTextureTest.zip
Director::getInstance()->runWithScene(RenderTextureTest1::create());
Second issue:
Third issue:
If i use cocos2dx is correct.
Sorry for my poor english.