godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.3k stars 21.05k forks source link

Image::get_pixel() requires to use a Write lock() #13934

Closed Zylann closed 3 years ago

Zylann commented 6 years ago

Godot 3.0

I noticed that getting pixels of an Image requires to internally use a Write lock(). The problem with this is, it may trigger copy-on-write for no reason because we only read those pixels. Would there be a smarter way of accessing pixels in GDScript that doesn't involve this? (or even in C++)

poke1024 commented 6 years ago

I don't see a way. We'd need to change the API, i.e. Image::lock to allow for locking for writing and locking for reading.

pgruenbacher commented 6 years ago

I'm experiencing this exact issue as well.

mjtorn commented 5 years ago

As this has been a known issue for so long, and apparently not in anyone's interests to fix, is there an acceptable workaround?

I dropped an idea on IRC about using shaders to load the image (eg. into mat2) and instead of rendering anything, read a pixel from there. I don't think it's possible to twist shaders into doing that, so other ideas are welcome.

Note that this is probably the reason we're seeing a radical decrease in fps, from 60fps sharp to 25 or as low as 8.

Zylann commented 5 years ago

Using shaders for that would be even more inefficient because the use case is for access on the CPU, not the GPU.

BastiaanOlij commented 5 years ago

Can you keep it locked? If i recall correctly it only grabs, and releases a lock if you don’t have one already.

It has to set the lock in case other parts of the program are using or modifying the same data, only you know whether you will

On Mon, 18 Mar 2019 at 6:39 pm, Markus Törnqvist notifications@github.com wrote:

As this has been a known issue for so long, and apparently not in anyone's interests to fix, is there an acceptable workaround?

I dropped an idea on IRC about using shaders to load the image (eg. into mat2) and instead of rendering anything, read a pixel from there. I don't think it's possible to twist shaders into doing that, so other ideas are welcome.

Note that this is probably the reason we're seeing a radical decrease in fps, from 60fps sharp to 25 or as low as 8.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/godotengine/godot/issues/13934#issuecomment-473801069, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2vaWZ2jmvChye9Ta07bhUQs01XvZccks5vX0KYgaJpZM4Qz5sb .

-- Kindest regards,

Bastiaan Olij

https://www.facebook.com/bastiaan.olij https://twitter.com/mux213 https://www.youtube.com/BastiaanOlij https://www.youtube.com/channel/UCrbLJYzJjDf2p-vJC011lYw https://github.com/BastiaanOlij

mjtorn commented 5 years ago

@BastiaanOlij now that you point that out, I moved the lock way up the stack to where that Image is created. The FPS tends to go no lower than 30, which is still pretty bad, but not as bad as it could be.

I guess there's some unnecessary COW action in get_pixel() despite the perpetual lock. Hell, reads should not require a lock, least of all a write lock that copies stuff in memory. Someone should address this issue.

Thank you for pointing me to a less crappy place to lock at either way :)


I'm working on a custom workaround involving Area2D that I'll use instead, if it pans out. I'm not sure I want to know what happens on Android or iOS if this is CPU-bound, and a modern desktop (at best) halves the FPS.

Zylann commented 5 years ago

I think I initially made this issue because of the impossibility to lock an image from two places at a time for the sole purpose of reading it only, but I don't remember what the situation was. Perhaps I was just concerned that it could cause problems because there must be a reason why there are different Write and Read locks in Godot. Otherwise it is recommended to lock only when needed, usually before a lengthy operation that you know are the only one doing, so keeping locked longer is actually a bad idea.

@mjtorn what part of your code did you identify as halving your FPS? What is it doing?

mjtorn commented 5 years ago

@Zylann I could be proven wrong by someone, but adding a lightmap, which basically just calls get_pixel() for a given texture to see if we should modulate the player's color, drops the FPS. The description on this issue and what I was told in December by a friend of mine seems to suggest it really is the locking.

Note that the code caches the texture data and acquires only one lock. Both of these were broken before and the performance was barely tolerable on a desktop PC.

If reading a pixel from a texture is generally an intense operation, then I'm barking up the wrong tree and apologize in advance, but it doesn't feel like something that should really be.

If it is an intense operation, why? Is it more involved than a simple memory access or would it be due to the sheer amount of accesses, that it's more memory-i/o bound than cpu bound?

After having said all that, why does reading require a lock at all?

Zylann commented 5 years ago

After having said all that, why does reading require a lock at all?

A lock is needed because Image stores its data in a PoolByteArray. This kind of array is best accessed by acquiring either a Read or Write lock so you can use a direct pointer to its contents, which is good for long operations that do repeated accesses to it or copy chunks of it around. Yet Image always use a Write lock, while reading with get_pixel usually only require a Read lock, hence why I opened this issue. tbh, at this point I think it was made this way for simplicity rather than correctness.

get_pixel should be quite fast actually. And if needed, in C++ you can access the data block directly and work on the raw data if needed (but harder because get_pixel handles formats for you).

If lightmaps are slow it's another issue to investigate.

mjtorn commented 5 years ago

Thanks for your answer!

The lightmap just checks get_pixel(). To be more accurate, for every _process() tick while the player is walking. And it certainly is not fast. The original Escoria code is convoluted and cares about fractions of pixels, but even taking the floor() value of the player's position and getting that pixel is slooooow.

I can try to preload and/or bucket the pixels somehow if you're absolutely sure the internals here aren't the cause for this slowness. Or forget about gradient lightmaps and deal with flatly-shaded polygons instead. :/

Zylann commented 5 years ago

@mjtorn I would suggest you put a breakpoint/debug code in lock() to verify if some code tries to lock the Image while it's already locked, so at least you will know if it's caused by an unfortunate double-lock CoW. Otherwise, it depends how many get_pixel you perform.

mjtorn commented 5 years ago

@Zylann went with print_line("Image::lock acquired"); and I see only six such locks, probably one per most processes. This is expected and safe?

In that case, I suppose sorry, the write lock and what I heard my friend augur from the code in December threw me off, and in fact this is just slow as balls. Either how/how often Escoria does it (even at its simplest) or how the data is read from memory.

(Because of priorities, I might just document this as something to look out for and use Area2D instead unless our art director absolutely positively wants gradient lightmaps.)

Thanks!


Edit: in case someone finds this, it seems having a threshold like pos.distance_to(old_pos) > 20 still looks good enough while delaying the get_pixel() calls, and best of all, the FPS stays fairly close to 60 and so far hasn't fallen very low for a long time.

BastiaanOlij commented 5 years ago

Didn’t read the whole thread, crappy on a phone:)

Markus, be aware that especially with large images any serious reading and writing has an impact. If you look on my github page you’ll find an plugin that creates noise textures in C++, just handling that volume if data... I eventually rewrote it using a viewport and a shader:) the speed difference was crazy. That may not help your use case though.

Also assuming you are reading out image data from the gpu (else i’d just keep a local copy of the source data that doesn’t need a lock) there is an aweful amount of data going back and forth.

One trick i did when i needed a colour rendered somewhere on screen, say where my mouse pointer is, is create a viewport if a small size (i think 10x10 was the smallest it allowed) that rendered just the area around my mouse. For 2d this is easy, for 3d you need to give it an orthographic camera and align its direction with the ray cast out of your normal camera. The read the contents of the viewport, only a few hundred bytes, and voila. I think i use that in my terrain editor

On Tue, 19 Mar 2019 at 5:55 pm, Markus Törnqvist notifications@github.com wrote:

@Zylann https://github.com/Zylann went with print_line("Image::lock acquired"); and I see only six such locks, probably one per most processes. This is expected and safe?

In that case, I suppose sorry, the write lock and what I heard my friend augur from the code in December threw me off, and in fact this is just slow as balls. Either how/how often Escoria does it (even at its simplest) or how the data is read from memory.

(Because of priorities, I might just document this as something to look out for and use Area2D instead unless our art director absolutely positively wants gradient lightmaps.)

Thanks!

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/godotengine/godot/issues/13934#issuecomment-474220240, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2vadS96vQWVLqJqUuK7KuhrLiY3iiNks5vYInFgaJpZM4Qz5sb .

-- Kindest regards,

Bastiaan Olij

https://www.facebook.com/bastiaan.olij https://twitter.com/mux213 https://www.youtube.com/BastiaanOlij https://www.youtube.com/channel/UCrbLJYzJjDf2p-vJC011lYw https://github.com/BastiaanOlij

mjtorn commented 5 years ago

@BastiaanOlij thanks for your comment!

I thought this operates on the GPU but from @Zylann I understood that this operates on the CPU instead. I would have thought the GPU is blazing fast for reading pixels... Anyway, it seems "something" is slow and if Godot is perfect in its implementation and Escoria is reading pixels too often, choking the bandwidth, there's little to do about it, except change the reading.

I don't know how to accomplish a more local copy of the data than lightmap_data = lightmap.get_data() (which requires a lock but apparently doesn't go crazy with them, thanks to your tip). If you have pointers I could maybe take a look at some point :)

Now I'm trying to figure out a better distance function than > 20 to keep the FPS drops at bay.

I don't think viewports help here, because the lightmap is there to say "if this pixel is dark, modulate the player darker". It's only rendered in the editor, as a static image, if the debug mode is set in the attached script so that's not a problem.

Zylann commented 5 years ago

@mjtorn the GPU is blazing fast as long as you keep it on the GPU. Downloading pixels from the graphics card is actually very slow in comparison.

went with print_line("Image::lock acquired"); and I see only six such lock

Do they overlap? That's the point of my debugging suggestion, because CoW only happens in that specific case :) If they don't overlap, then locking is not the problem. For example, if for the same Image, your print goes like:

lock
lock <-- CoW may happen here!
unlock
unlock
...

But the following is fine:

lock
unlock
lock
unlock
...

You can check which image you print from by printing (int)this as an indicator.

mjtorn commented 5 years ago

I probably should have dug out the name of the image, to make sure some overlaps are actually for the same image when starting up a room of the game.

Once that noise passes, however, there is only one acquired print and a unlocked print when I switch rooms, so actually I think this is as good as it currently gets.

Thanks for the help, sorry for the wait, and I'll get back to this if I experience or find something more :)

Calinou commented 4 years ago

Related to #25358, as the color picker functionality uses get_pixel().

KoBeWi commented 3 years ago

Resolved in 4.0, Image no longer requires locking at all.