astonbitecode / j4rs

Java for Rust
Apache License 2.0
645 stars 36 forks source link

Unsoundness in to_rust_string may cause UB due to unsafe assumptions about the input pointer #138

Open lwz23 opened 1 week ago

lwz23 commented 1 week ago

Description: The to_rust_string function uses CStr::from_ptr to convert a raw pointer (*const c_char) into a CStr. However, it does not validate that the pointer meets the necessary safety requirements. This can lead to Undefined Behavior (UB) if the pointer is invalid or if it does not point to a valid null-terminated C string. https://github.com/astonbitecode/j4rs/blob/68cdc1e362050bf04972f710a20dbf91aea12ce6/rust/src/utils.rs#L29

pub fn to_rust_string(pointer: *const c_char) -> String {
    let slice = unsafe { CStr::from_ptr(pointer).to_bytes() };
    from_java_cesu8(slice).unwrap().to_string()
}

Problem Description:

  1. Invalid Pointer: The function assumes that the input pointer is valid and non-null. If pointer is null or points to invalid memory, CStr::from_ptr(pointer) will invoke Undefined Behavior.
  2. Missing Null-Termination: The function assumes that the memory referenced by pointer is null-terminated. If the string is not properly null-terminated, CStr::from_ptr may read beyond the allocated memory, causing Undefined Behavior.
  3. Invalid UTF-8 Input: The function uses from_java_cesu8(slice).unwrap(). If the input slice contains invalid CESU-8 data or if from_java_cesu8 panics, the program may crash.
  4. Error Handling: The function calls .unwrap() on the result of from_java_cesu8(slice). If from_java_cesu8 returns an Err, the program will panic instead of gracefully handling the error. Steps to Reproduce: Provide a null pointer as input:
    let null_pointer: *const c_char = std::ptr::null();
    let result = to_rust_string(null_pointer); // UB: Null pointer

    Expected Behavior: The function should validate the input pointer and ensure it points to a valid, null-terminated C string. The function should handle invalid CESU-8 data gracefully instead of panicking. Additional Notes: The current implementation assumes valid input, which makes the function unsafe. Adding proper validation will make it more robust and prevent potential crashes or undefined behavior. If performance is a concern, consider using assertions in debug builds but validating input in release builds.

lwz23 commented 1 week ago

ping?

astonbitecode commented 1 week ago

Thanks for the report. I will have a look.

lwz23 commented 1 week ago

Thanks, please also take a look at https://github.com/astonbitecode/j4rs/issues/140, I think they are samilar : )

astonbitecode commented 15 hours ago

Fixed the unwrap of from_java_cesu8, using Result.

Regarding the Cstr the to_rust_string may indeed cause UB when it is used widely without any validation. However, this is not the case here, since validation is done implicitly. Actually there is no way to have an invalid pointer at this point.

Eg., one of the function usages is in jni_utils::string_from_jobject. You may see there a null pointer check for the passed jobject argument and further, this function is called by to_rust_boxed where additional checks are done. The input is always a Java String and this is guaranteed to be valid.

Did you maybe notice some code flow that may result to invalid Cstr pointer input?

lwz23 commented 14 hours ago

You're right, I didn't notice utils was not a pub mod, I thought all users could call this function directly.

lwz23 commented 14 hours ago

But since this is the case, wouldn't it be more appropriate to declare to_rust_string as pub(crate)? I'm not sure because I don't know much about the project.

astonbitecode commented 14 hours ago

The utils module is not exposed publicly, so, the functions are not accessible.

lwz23 commented 14 hours ago

I am also a bit confused as to why to_rust_string is not marked as an unsafe function, considering that it seems to be called only by string_from_jobject and jstring_to_rust_string(sorry if I miss some usage), both of which are already marked as unsafe. If to_rust_string were also marked as unsafe, no changes would be required in other parts of the code, and it would align better with the overall logic. This is just a small suggestion from my side.

astonbitecode commented 13 hours ago

You are right about this. Thanks for the suggestion. Pushing a commit for it.