Rust-SDL2 / rust-sdl2

SDL2 bindings for Rust
MIT License
2.74k stars 468 forks source link

Fix bug in Keycode type #1378

Closed hvox closed 3 months ago

hvox commented 6 months ago

In libsdl2 type SDL_Keycode represents unicode characters plus some special codes for keys that don't have representation in unicode. But in rust-sdl2 Keycode is defined as enum of few keys and could not contain all values that are represented by SDL_Keycode in libsdl2.

Because of this mismatch, the Keycode::from_scancode function could not work correctly. I fixed it by marking Keycode as non-exhaustive enum and changing Keycode::from_i32 to accept all non-Cc unicode codepoints, not just few keys listed in type definition.

hvox commented 6 months ago

It fixes #1377

Cobrand commented 6 months ago

Using transmute to me is a big no no, especially since the output is an enum, and an illegal value of an enum in rust is the best way to have a rust panic or unexpected behavior.

Likewise, I do'nt really know where does this char::from_u32() come from.

I haven't seen this issue in detail, but I think there is probably an alternative, probably somewhere in from_scancode?

hvox commented 6 months ago

Defining Keycode as enum was a big mistake. The type represents Unicode codepoints and a few codes for keys, that can't be represented as Unicode codepoints. I don't know what type definition would be correct for SDL_Keycode and I don't want to be the person, who will fix it, since changing it will be a breaking change for users of Rust-SDL2. But in the future, Keycode type definition needs to be changed.

But right now I just want to fix Keycode::from_scancode. It does not work because Keycode::from_u32 returns None for integers that are correct values for SDL_Keycode. And I fix it by doing two things:

  1. Specifying Keycode type as #[repr(i32)] #[non_exhaustive] enum. I assume it prevents rust from panics when Keycode takes values unspecified in enum. Please tell me if I'm wrong here.
  2. Using transmute inside Keycode::from_u32 to transmute number into Keycode when this number is a valid Unicode codepoint. I use result of char::from_u32() in order to check if the number is a valid Unicode codepoint. I also check if it is in Cc category since giant match statement in Keycode::from_u32 is already handling elements of that category and if it doesn't return any enum element of Keycode then I assume it should not, so I just return None. This helps to correctly handle SDL_KeyCode::SDLK_UNKNOWN(represented by character '\0' in C code), since it is represented by None in rust code and there isn't Keycode::None.
Cobrand commented 6 months ago

I think there are ways to fix this, yes we would break some specific code, but we have ways to make the fix easier.

The main problem that I have is that it's extremely illegal code. Take this playground link

#[repr(i32)]
enum Keycode {
    A = 1,
    B = 2,
}

fn main() {
    let k: Keycode = unsafe { std::mem::transmute(5) };

    match k {
        Keycode::A => { println!("A") }
        Keycode::B => { println!("B") }
    }
}

Notice that debug and release do not give the same results (debug is just wrong, and release crashes, which is a BIG no no).

I'm sure there are many ways we could fix this, but to me the easier would be something like this:

#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
#[non_exhaustive]
pub struct Keycode(pub i32)

impl Keycode {
    pub const Backspace: Keycode = Keycode(sys::SDL_KeyCode::SDLK_BACKSPACE as i32);
    pub const Tab: Keycode = Keycode(sys::SDL_KeyCode::SDLK_TAB as i32);
    ...
} 

There are probably things that I have missed with that, but at least it would fix this specific issue. The few who use pattern matching with this struct will have a bit of refactoring to do but it would not be major.

The only big issue that I see is that the struct would not be repr(i32), so all the code that uses code like keycode as i32 would need to be refactored on the client side.

hvox commented 6 months ago

Oh, wow. You're actually totally right. I didn't want to drastically change Keycode type and I naively believed repr(i32) allows me to put into enum values that are not listed in its definition. Now I see I was very wrong. Thank you!

Update: actually repr(i32) does allow us to put into enum values that are not listed in its definition, we just need to be extremely careful with match statements...

hvox commented 6 months ago

Ok, so let's change Keycode from enum to something else. I think the semantically most correct solution would be to use enum of char and a few constants for codes that don't correspond to any unicode characters. P.S. I failed to implement it because it breaks things I hardly can fix.

Solution with struct Keycode(pub i32) does work. But it violates naming convention since now we have constants in camel case rather than upper case. We can't make them upper case, since it will break a lot of user code. Also, as you said, this solution requires changing keycode as i32 into something like keycode.0 in user code. Apart from these two problems, everything is very good: tests pass, from_scancode works. P.S. Should I implement something like into_i32 method or keycode.0 does not look ugly?

There's actually a third problem: if we make the i32 field public, the user can now put any integers into the Keycode at all, bypassing the from_i32 method. So the complex check inside from_i32 is now obsolete. I propose to make the i32 field private and just implement something like into_i32 method for it:

#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
pub struct Keycode(i32);

impl Keycode {
    pub fn into_i32(&self) -> i32 {
        self.0
    }
    pub fn from_i32(n: i32) -> Option<Keycode> { ... }
    ...
} 
Cobrand commented 6 months ago

But it violates naming convention since now we have constants in camel case rather than upper case.

It's only a warning and we can suppress it. People will prefer that to breaking a good portion of their code.

So the complex check inside from_i32 is now obsolete.

Isn't that the point? Keycode can have a lot of possible values depending on the locale, and there are a lot of them that are NOT in the enum. I think this complex check should only exist for scancode, but not for keycode.

Although you are right you could put anycode integer inside, we need to do some tests to see how well does SDL2 handle unknown keycodes.

hvox commented 6 months ago

Ok, I made Keycode into pub struct Keycode(pub i32);, suppressed warnings by #[allow(non_upper_case_globals)] and checked how libsdl2 handle keycode as argument. According to wiki there are only two functions, that accept Keycode as argument: SDL_GetScancodeFromKey and SDL_GetKeyName.

  1. SDL_GetScancodeFromKey is ok: it returns SDL_SCANCODE_UNKNOWN. So Rust code will get None out of Scancode::from_keycode.
  2. SDL_GetKeyName is less ok: it either decides that argument is Unicode codepoint and encodes it as utf-8 string ignoring last 8 bits, or decides that argument is some special character and calls SDL_GetScancodeName on incorrect scancode. SDL_GetScancodeName returns empty string on wrong input and sets error message (the one accessed by SDL_GetError()), which is safe to ignore.

So now if a user puts a random number into a variable of type Keycode the worst thing that will happen is that this user will get unprintable string from the keycode.name().

P.S. The code still checks that argument is non-zero in Keycode::from_i32, since 0 = '\0' = SDLK_UNKNOWN and it is represented by None in Rust code. Other code in Rust-SDL2 relies on Keycode::from_i32 returning None on zero argument.

Cobrand commented 6 months ago

Thank you for doing the tests! In the end, I think Keycode(pub i32) is a bad idea and we should just do Keycode(i32) instead. I was worried setting it to private would break pattern matching code, but it works: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=70cc74208e4a7ea0284cfa514c7eab49

So a few changes I would do:

Otherwise it seems good!

hvox commented 6 months ago

Hey. The uppercase constants should be the same names as in C code, or just uppercase version of their old names in Rust? I mean if we had variants KpDblVerticalBar and CapsLock in old code(when Keycode was an enum), then in new code how should corresponding constants be named?

  1. KP_DBL_VERTICAL_BAR and CAPS_LOCK.
  2. KP_DBLVERTICALBAR and CAPSLOCK as in corresponding C code.

I feel like the first option might be frustrating for new users coming from C, because some constants they remember by heart have now acquired underscores in random places. So currently I implemented the second option. Did I do the right thing?

Cobrand commented 6 months ago

I think copying C here for the naming is better, but honestly it should not matter that much. It will just fail to compile if someone makes a typo, and most people have autocomplete on anyway.

It looks good to me, before merging I'll try to make a few tests on my own locally with a few projects I have had, we might see new issues arise. Thanks for the PR anyway!