KhronosGroup / KTX-Software

KTX (Khronos Texture) Library and Tools
Other
834 stars 222 forks source link

JNI improvements #886

Open javagl opened 3 months ago

javagl commented 3 months ago

Addresses https://github.com/KhronosGroup/KTX-Software/issues/880

I know, the title is too generic. It would be better to clearly lay out a work plan and create a clean sequence of self-contained, specific PRs. But there are just too many (unrelated) things to do. "You can't cross a chasm in two small jumps". The first pass has to be tackled somewhat holistically. The result may not be as 'good' as I'd like it to be, but I hope that it will be 'better' along many dimensions. If this is not considered to be a viable approach, then that's OK. Then I'll just create my own repo with JNI bindings for KTX, with blackjack and error handling.


The preliminary task list that I tried to create in the linked issue is now replicated here, and I'll try to update it and extend it as we move on:

This PR currently builds on the state from https://github.com/KhronosGroup/KTX-Software/pull/876 and https://github.com/KhronosGroup/KTX-Software/pull/879 (so when these PRs are merged, then the first checkboxes above can be marked as 'done')


Not (yet) supposed to be addressed here:

javagl commented 3 months ago

I locally started a few updates, including some basic error handling, so that things like KtxTexture2.create(null, ...) will no longer crash the JVM, but throw a NullPointerException instead. (Many more error checks to add...).

One thing that could/should be done next: The KtxTexture class has a destroy() method, documented like this:

    /**
     * Destroy the KTX texture and free memory image resources
     *
     * NOTE: If you try to use a {@link KTXTexture} after it's destroyed, that will cause a segmentation
     * fault (the texture pointer is set to NULL).
     */
    public native void destroy();

Now, the goal is: No segmentation faults. And we'll probably not be able to completely avoid this destroy() method. So the question is how to handle that.

I think that trying to use a KtxTexture after destroy() was called should also throw an exception - maybe an IllegalStateException. I'll probably draft this (with the type of the exception being easy to change). But it will affect basically each and every native function, which has to check whether the thiz parameter has been destroyed...

javagl commented 3 months ago

There currently are two functions in the KtxTexture... class that receive "data" in the form of byte arrays:

(The latter has been added as part of this PR)

Many libraries that are using JNI expect direct ByteBuffer instances as the input for all functions, for a variety of reasons. But I'm strongly in favor of additionally offering the conveniece of a byte[] array. This does come at a cost for the implementor. The different cases of arrays and direct (or non-direct...) buffer have to be handled, but ... this is doable (some utility functions to fetch and release the data are shown in the draft for createFromMemory).

On the Java side, the ByteBuffer-based signature is more versatile. Even if someone has a byte[] array, this can trivially be wrapped into a ByteBuffer at the call site:

byte array[] = ...;
t = KtxTexture2.createFromMemory(ByteBuffer.wrap(array), ...);

But as a middle-ground, I'd suggest to...


An aside: The documentation of the setImageFromMemory currently says

* Set the image data - the passed data should be not modified after passing it! The bytes
* will be kept in memory until {@link KTXTexture#destroy} is called.

This statement is not true, and will be updated accordingly.


Remotely related: There currently is a function public native byte[] writeToMemory(); which returns a byte[] array. This probably makes sense, but the function itself has to be reviewed and tested.

MarkCallow commented 3 months ago
* Set the image data - the passed data should be not modified after passing it! The bytes
* will be kept in memory until {@link KTXTexture#destroy} is called.

What does "passed data" refer to here? Is it the data at the pointer location that was passed to setImageFromMemory? Not that it matters since you say it isn't true. Indeed the data is copied into the ktxTexture object.

javagl commented 3 months ago

What does "passed data" refer to here? Is it the data at the pointer location that was passed to setImageFromMemory? Not that it matters since you say it isn't true. Indeed the data is copied into the ktxTexture object.

There are multiple "layers" here, and this revolves exactly about the question where data is copied. Imagine the following Java pseudo code, similar to what could appear in the KTX JNI bindings:

byte input[] = new byte[10];

// Create a texture from the input
KtxTexture2 texture = KtxTexture2.createFrom(input);

// XXX Will this affect the result?
input[5] = 123;

// Encode the input data that was given earlier
texture.encode();

The comment suggested that this input[5] = 123 would be "visible" for the 'encode' function that is called later. This would mean that the native library would store "a pointer" to the Java byte[] array. Something like this is actually possible, but with many, many, caveats - and here, it certainly is not the case.

javagl commented 3 months ago

A detail @ShukantPal : Currently, the bindings are set up for Java 11. I don't see a strong reason for that. Unless there is a reason, I'd suggest lowering the version requirement to "the lowest version that works" (and I think that 8 should do it).

javagl commented 3 months ago

EDIT: I decided to rename the constants - see https://github.com/KhronosGroup/KTX-Software/pull/886#issuecomment-2027307405 .


So the following question is obsolete:


For the sake of the goal of having a 1:1 mapping between C and Java, I have to ask:

Should they have the same prefixes?

I would slightly lean towards having the prefixes, which may make some aspects of https://github.com/KhronosGroup/KTX-Software/issues/887 easier. Changing this now would be a "breaking change", but ... there are probably not sooo many users of the JNI bindings, and the break is trivial ("just add the prefixes in your code"). But that's not a strong preference.

javagl commented 3 months ago

I have added a bit of error handling (updating the list in the first post accordingly).

One point about error handling is that there are still some cases where the JNI layer reports errors to std::cout, and then returns NULL or so. I'm not yet sure what to do in these cases. (But I'm sure that the errors should not (only) be reported via std::cout...). I think that these cases will usually be converted into a thrown exception, and I'll decide on a case-by-case basis what kind of exception that could be. If someone has strong preferences: Chime in here.

javagl commented 3 months ago

Two updates:

Caching field- and method IDs

All the jfieldID and jmethodID values are now obtained once, when the library is loaded, instead of looking them up again and again in each and every method call.

Error handling

Additional checks for out-of-memory errors and such have been added.

The most significant change refers to the create... Methods - or more generally, to the methods where the Java version did not return an error code as an int: These methods generally had the pattern of (pseudocode):

KtxTexture2 create(...) {
    ktxTexture2 *instance;
    ktx_error_code_e = create(&instance);
    if (result != KTX_SUCCESS) {
        std::cout << "Something wrong " << result << std::endl;
        return NULL;
    }
    return createJavaObjectFrom(instance);
}

Now, there is no sensible method for returning the information about what went wrong here - with the exception of an exception. So these methods will now usually throw the newly added KtxException when the internal error code was not KTX_SUCCESS.

One could think about other options here. For example, one could pass an uninitialized KtxTexture2 object to the create methods, as in

KtxTexture2 texture = new KtxTexture2(); // NOT INITIALIZED!:
int result = KtxTexture2.create(..., texture); // Try to initialize it
if (result == KtxErrorCode.SUCCESS) { 
    // That worked
} else {
    // Handle this...
}

Yeah, that would be a "breaking change" - but I guess nobody really cares about that right now. I don't have a really strong preference here.

javagl commented 3 months ago

Responding to my own comment from above: I decided to align the constant names with the names that they have in the native layer. The prefixes already are present, for example, in the (auto-generated) classes like VkFormat. Comparing and/or maintaining the other constants when the prefixes are removed is difficult.

I know, something like KtxTextureCreateStorage.KTX_TEXTURE_CREATE_NO_STORAGE is a mouthful, and looks redundant. But in doubt, the client/user could use static imports, turning this into KTX_TEXTURE_CREATE_NO_STORAGE (i.e. the exact same thing as in native code). Compared to the previous one, KtxTextureCreateStorage.NO, this seems reasonable. (This one would turn into NO with a static import - good luck doing a full-text search for that...).

I also started adding the documentation from the native side to these constant definitions and some of the related methods (i.e. the ones for using and setting them). These might be useful on a certain level - before and after:

Khronos KTX Docs

javagl commented 3 months ago

The last passes contained a few steps of that dull, repetitive, boring, repetitive, tedius, and repetitive handling of JavaDoc, based on the documentation from the native side.

And apparently, the JavaDocs have never been used or created before. Most of the (few) existing ones had been invalid. Now they are fixed.

I have added the creation of the JavaDoc to the POM. The current state is attached here:

libktx-4.0.0-SNAPSHOT-javadoc.zip

There are a few degrees of freedom for reducing (or handling) the redundancy of JavaDoc comments for cases like the ...Param classes, which correspond to the native ...Param structs, but introduce setters/getters for each field. The current approach is what I consider to be a reasonable middle-ground:

The rationale here is that

Here's an example for the uastcRDOMaxSmoothBlockErrorScale:

Khronos KTX JavaDoc

(Yeah, .. who's ever gonna change that anyhow? But I see complete JavaDoc as some sort of minimal baseline, to be at least on par with the native docs, even if the docs might have to be extended to be really useful for end-users...)

javagl commented 3 months ago

About the current build failures:

What's surprising is that I also don't see these errors locally, and they don't happen in the Windows build. Iff I understood that correctly, very roughly: Windows builds are done by GitHub actions. Linux/MacOs are done by Travis - and the error only appears in https://app.travis-ci.com/github/KhronosGroup/KTX-Software/jobs/620089455#L866 but not in https://github.com/KhronosGroup/KTX-Software/actions/runs/8543283946/job/23406762987#step:14:48

I'll probably have to add some quirky printf-logging in the JNI layer to even be able to guess why CreateFromMemory might fail on Linux/MacOS...

MarkCallow commented 3 months ago

About the current build failures:

What's surprising is that I also don't see these errors locally, and they don't happen in the Windows build. Iff I understood that correctly, very roughly: Windows builds are done by GitHub actions. Linux/MacOs are done by Travis - and the error only appears in https://app.travis-ci.com/github/KhronosGroup/KTX-Software/jobs/620089455#L866 but not in https://github.com/KhronosGroup/KTX-Software/actions/runs/8543283946/job/23406762987#step:14:48

I'll probably have to add some quirky printf-logging in the JNI layer to even be able to guess why CreateFromMemory might fail on Linux/MacOS...

Don't obsess about Travis vs. Actions. Both macOS and Linux builds are failing. These builds are run on different machines, even though both under Travis. Look instead at the environments on the runners, particularly the Java/JVM versions in use.

MarkCallow commented 3 months ago

About the current build failures:

There are 2 circumstances in which KTX_INVALID_OPERATION can be returned during ktxTexture2_CreateFromMemory:

  1. If the offset of the supercompression global data or of the first mip level is greater than the byte length specified to ktxTexture2_CreateFromMemory. This comes from ktxMemStream_setpos.
  2. If you are trying to load the data and it has already been loaded. For example calling ktxTexture2_LoadImageData after calling ktxTexture2_CreateFromMemory with KTX_TEXTURE_CREATE_LOAD_IMAGE_DATA_BIT set. This comes from ktxTexture2_LoadImageData.

Not clear why either of these would be triggered on Linux and macOS but not Windows.

javagl commented 3 months ago

Thanks for the details @MarkCallow .

I added some printf-debug-log output (in the hope that it would show up in the build logs), but during that, noticed an embarrassing error in the JNI function for acquiring the input data for CreateFromMemory: A sign error for the computation of the size of the data caused a value like -1000 to be interpreted as a size_t, so the function was called with something like size=1844674407370955061 (~ uint64 max value).

This is fixed now. Let's see whether the build passes.

Two points remain open:

The latter is a more general question. Unsigned types are error-prone, particularly in interfaces - see e.g. https://www.aristeia.com/Papers/C++ReportColumns/sep95.pdf , which points out exactly what happened here. And when crossing the language barrier between Java and C++, one has to be particularly careful about that...

javagl commented 2 months ago

@MarkCallow We'll have to think about a sensible strategy for reviewing and merging this.

Looking at the changes will hardly make sense. Nearly every file of the JNI bindings was modified. I tried to update the bullet point list in the first post with the progress. Beyond that, I can probably try to give a short summary here, with an overview of the most important changes/commits:

Some of this is just "baseline stuff" to follow the recommendations from https://developer.ibm.com/articles/j-jni/ . But some points still have subjective elements to them.


I have actively been using the resulting library in a small dummy application where you can drag-and-drop a PNG file, see a preview of the PNG and its KTX version, drag around some sliders for the compression/quality (and see a live preview). So I'm confident that this "mostly works". But there are still a few things to wrap up.

But maybe this can already go through a first review pass, based on the current state.

MarkCallow commented 2 months ago

How do we make further progress on this?

I find having the full KTX prefixes rather clunky. Can't we follow the WebGL model which removed all the GL prefixes?

javagl commented 2 months ago

That's the question: What could be a sensible way to review this, given that it is ... nearly a "rewrite" of the JNI bindings?

Is the bullet point list above sufficient? Should I invite further potential reviewers? I'm pretty flexible here. If you wish, we could also arrange a short call to go through the changes, what's left to do, and what could be the next steps.

I find having the full KTX prefixes rather clunky. Can't we follow the WebGL model which removed all the GL prefixes?

From quickly skimming over some Python binding files, it looks like the Python bindings are also just omitting the KTX_, so the question is then: Should the naming be consistent with the C-layer or should it be consistent with the bindings? I think that changing KtxTextureCreateStorage.KTX_TEXTURE_CREATE_NO_STORAGE to KtxTextureCreateStorage.TEXTURE_CREATE_NO_STORAGE does not make a crucial difference here, but when you say that the KTX_ prefixes should be omitted, I'll remove them.

MarkCallow commented 1 month ago

@javagl, PR #874 extends the Javascript binding to provide full libktx functionality. As part of it I went over the naming. I made the names consistent with the libktx names and I ended up removing all the ktx and KTX_ prefixes. This is because the module that contains the binding is typically called ktx. Please take a look. Perhaps it can inform the naming here too. If we change any names that exist in the current binding we should provide the old names as aliases.

I suggest asking @ShukantPal to review this as he is the author of the original binding.

The bullet point list at the top is a good guide. Obviously there are items to be completed. I think a short call would be good but I want to first familiarize myself with the code. I'll let you know when I've done that and we can then plan the call.

javagl commented 1 month ago

Thanks @MarkCallow , I'll try to allocate some time looking over the changes from the linked PR. This PR should be a step in the direction of aligning the JNI bindings with KTX itself. We can decide later whether some missing parts (like bindings for the ktxHash... family of functions) will become part of this PR or a dedicated, new one.

Perhaps it can inform the naming here too. If we change any names that exist in the current binding we should provide the old names as aliases.

I'm in favor of aligning names insofar that they should be recognizable as their native counterpart. There are some conventions in the respective target languages, though. For example, I wouldn't use something else than a UPPER_SNAKE_CASE for constant names or something else than CamelCase for class names in Java, but the details can still be sorted out. (E.g. the removal of the KTX_ prefix mentioned above)

MarkCallow commented 1 month ago

@javagl, I've been reviewing the discussion here and some of the code including the commits you specifically called out in comment of April 18th. The code I've seen looks very good.

I added a note to the description suggesting following the JS binding handling of metadata queries.

Added stringFor-functions that take the respective enum constants, and return their string representation (e.g. https://github.com/KhronosGroup/KTX-Software/commit/332510f2b98cc031c7f40b926dc1ba6b56d8c914 ). Note that this only returns the string representation, and not the elaborate description that some of the functions in the native layer return

Can't you call the native function(s) from the stringFor function to get the elaborate description. I'd particularly like to have the full descriptions of the error codes. Are there any beyond ktx_error_code_e where the native side has a convert-to-string function?

I am happy you have replaced the crashes with exceptions.

I have actively been using the resulting library in a small dummy application where you can drag-and-drop a PNG file, see a preview of the PNG and its KTX version, drag around some sliders for the compression/quality (and see a live preview). So I'm confident that this "mostly works". But there are still a few things to wrap up.

Any plan to release this? Does the preview show you both pre- and post-compression images so you can easily compare them?

Some of the JNI implementations still have to be reviewed to ensure that they don't perform operations with pending exceptions, and maybe run with -verbose:jni to spot further possible pitfalls that already existed in the code

Which implementations?

There should be a version of setImageFromMemory that does not accept a byte[] array, but a ByteBuffer as the last parameter (mentioned in https://github.com/KhronosGroup/KTX-Software/pull/886#issuecomment-2020869640)

I agree a version that accepts ByteBuffer is needed.

The open points of the bullet point list from the first post

Yes.

But maybe this can already go through a first review pass, based on the current state.

There is certainly enough here to review. I want to spent a little more time familiarizing myself with the code before we arrange that call.

Note that all GitHub Actions Windows builds are currently broken due a bad GHA runner image. Many tests, including the Java binding tests fail with segmentation violations.

javagl commented 1 month ago

Thanks for the detailed review.


I added a note to the https://github.com/KhronosGroup/KTX-Software/pull/886#issue-2208337582 suggesting following the JS binding handling of metadata queries.

I haven't looked at the detals of the hash-related functions. If it's "only" about exposing 3 functions, then it might be added here, to be on par with other bindings, but if it's too much, it might make more sense to carve it out into a follow-up PR.


Can't you call the native function(s) from the stringFor function to get the elaborate description. I'd particularly like to have the full descriptions of the error codes. Are there any beyond ktx_error_code_e where the native side has a convert-to-string function?

There are two levels here:

  1. A function of obtaining a string version of the constant itself
  2. A function for the actual description of the meaning of the constant

Until now, I only addressed 1.: Each "enum" has a stringFor function. This is particularly useful for the KtxErrorCode, where you could have System.out.println("Error " + errorCode); // Prints '16' - what is that? vs System.out.println("Error " + KtxErrorCode.stringFor(errorCode)); // Prints 'UNSUPPORTED_TEXTURE_TYPE' But similarly applies to all other "enum" types.

Now, I saw that there is the ktxErrorString function in the native layer. (And I think that this the only function of this kind in the native layer, but will have to check again to be sure). And one could make a case for exposing that via JNI, to be able to do something like System.out.println("Error " + KtxErrorCode.errorString(errorCode)); // Prints "Texture type not supported." I think that this does not add soooo much value beyond UNSUPPORTED_TEXTURE_TYPE, but can add this, if desired.


I have actively been using the resulting library in a small dummy application

Any plan to release this? Does the preview show you both pre- and post-compression images so you can easily compare them?

Until now, this was mainly for internal experiments and testing. I think that it might eventually have some aspects that are "useful", and that it could be worth being published. We'll probably have to sort out some of the "Maven Release"-related questions before this can be done, but maybe there can be an "early sneak preview" release.

The goal was to have some sort of "interactive preview". (As 'interactive' as it can be, when encoding an image takes 5 seconds...). I did create a short screengrab of that a while ago (yeah... as a GIF... but), maybe it shows the overall idea:

Khronos KTX Sneak Preview 00002


Some of the JNI implementations still have to be reviewed to ensure that they don't perform operations with pending exceptions, and maybe run with -verbose:jni to spot further possible pitfalls that already existed in the code Which implementations?

I did try to keep an eye on that for every function that I "touched", but did not systematically go through all function from top to bottom. It boils down to the following pattern:

When any JNI function that is called with env->... causes an exception, then it is not allowed to call any other JNI function. Instead, the function must return immediately (and leave the handling of the pending exception to the Java layer).

(The only functions that may be called with a pending exception are the ones like env->ExceptionCheck() or certain env->Release... functions. Details in the specification).

To check this manually, I'd have to read every function and see whether it exposes this pattern (calling a function when an exception might be pending). It can be automated to some extent, with the ‑Xcheck:jni verbose, but this only covers the case where such an exception actually appears....


I still have to look closer at the other PR, but will try to at least pick up the work here (syncing with master, addressing some of the smaller points mentioned above...) in the next days.