bheisler / RustaCUDA

Rusty wrapper for the CUDA Driver API
Apache License 2.0
765 stars 58 forks source link

Fix NULL characters contained in Device::get_name() #32

Closed LutzCle closed 5 years ago

LutzCle commented 5 years ago

When saving the output of Device::get_name() to a file, it becomes visible that the device's name string is padded with NULL characters until the end of the array:

GeForce 940M^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

The issue can be reproduced by saving the output of device::test::test_get_name to a file and then viewing it with a text editor.

This fix filters out the NULL characters before converting the C string to a Rust String, producing the expected output:

GeForce 940M
bheisler commented 5 years ago

Hey, thanks for the pull request! I've decided to fix this myself in a slightly different way. I hope that works for you.

LutzCle commented 5 years ago

Sure, works for me. I tested your fix and has the same output as my pull request.

Although I am curious as to what I can learn from your approach. Could you briefly explain what the key difference is? In my view, the PR improved on existing code, because it avoided an unnecessary unsafe function. To be clear, I don't mean to condescend, just learn. Thanks!

bheisler commented 5 years ago

Well, mostly your implementation made me realize that I was using that unsafe function in a way that was likely to cause undefined behavior. In retrospect it's kind of obvious, but initially I assumed that from_bytes_with_nul_unchecked would scan the byte array to find the nul terminator and create a CStr referring to just the part that contained actual text. It doesn't, of course, it just assumes that you've done that already and given it a byte slice that contains exactly the text plus one nul.

As for why I went with my solution (which is to do the scan and pass an appropriately-sized subslice) over yours, efficiency is part of it. Everything RustaCUDA does is unsafe, so avoiding an unsafe function call isn't really the priority here. Your solution iterates over every byte in the name array (all 128 of them), where mine stops early at the first nul. More importantly, using collect() means that your code could allocate more than once, where mine will allocate exactly once.

In practice it probably doesn't matter (it would be very strange to call this function in a hot loop), so the main reason is "because it seemed like a good idea at the time". I suppose that's kinda unsatisfying, but I am grateful to you for pointing out how broken my first implementation was.

LutzCle commented 5 years ago

Thank you for elaborating. I didn't know that collect() potentially allocates more than once. I agree that performance isn't relevant in this concrete case, but it's a very useful detail to know to check for. Need to read up on it!