JnCrMx / discord-game-sdk4j

Java bindings for Discord's Game SDK
MIT License
114 stars 23 forks source link

Improved channel setup #59

Closed letorbi closed 1 year ago

letorbi commented 1 year ago

This PR improves the setup process of the communication channel between the library and the Discord client:

1.) The path of the Unix domain socket file is not hard-coded anymore, but derived from the available environment variables.

2.) Under Windows a named pipe instead of a socket is used now. This finally brings us Windows support :confetti_ball:

3.) A generic DiscordChannel interface has been introduced to handle the different ways of connecting to the Discord client. The actual implementations WindowsDiscordChannel and UnixDiscordChannel are mainly wrappers around FileChannel respectively SocketChannel.

The UDS file paths and the name of the Windows pipe have been taken from the Rust discord-sdk project. Or to be more specific from file /blob/main/sdk/src/io.rs line 128 to 212.

The code has been tested with Windows 10 and Arch Linux. No tests on MacOS yet, but chances are good, that it will work there as well. Testing on MacOS was successful, too.

JnCrMx commented 1 year ago

OMG, thank you so much for this! I will review and merge this tomorrow I hope. I am a little sick right now, so I really appreciate the help!

JnCrMx commented 1 year ago

One question about WindowsDiscordChannel tho, since configureBlocking does nothing: What is the blocking behaviour of the channel? Does it always block or never blocks?

letorbi commented 1 year ago

One question about WindowsDiscordChannel tho, since configureBlocking does nothing: What is the blocking behaviour of the channel? Does it always block or never blocks?

Well, actually I didn't check :sweat_smile: I simply assumed that it just does "the right thing", since my code worked under windows. But you've made a good point, since the blocking behavior of the Unix implementation is quite crucial. I guess it would be a good idea to run the test suite under windows before merging this PR.

letorbi commented 1 year ago

Ok, a fast search gave me this StackOverflow question: Why FileChannel in Java is not non-blocking?

It seems that FileChannel is actually implemented in a blocking way, but since file-access is fast enough (especially with RandomAccessFile), it should not be a problem. Otherwise we could use AsynchronousFileChannel, if a non-blocking behavior is really required.

JnCrMx commented 1 year ago

The underlying issue is that runCallbacks() is designed to be run during an update event (for example in the main loop of a game). For this to work, it may not wait for a message from Discord if none is present. Messages from Discord occur only for events or as responses to commands sent by the application.

On Unix this can be easily accomplished by setting the SocketChannel to non-blocking, so it just throws an exception if there is no message from Discord available to be handles.

On Windows this seems to be far trickier as far as I can tell. Maybe comparing .size() and .position() will help us, but I am not sure and my Windows VM just died, so it will take me some time to test it. AsynchronousFileChannel is sadly just asynchronous and not non-blocking, which means we will need to implement some logic to check if the Futures returned by its methods have completed or not.

letorbi commented 1 year ago

Well, I've just taken a look at the official documentation for the read(ByteBuffer dst) method of SocketChannel and FileChannel and both state:

A socket channel in non-blocking mode, for example, cannot read any more bytes than are immediately available from the socket's input buffer; similarly, a file channel cannot read any more bytes than remain in the file.

So FileChannel seems to behave like a non-blocking SocketChannel by default.

Additionally, the documentation says about blocking channels:

It is guaranteed, however, that if a channel is in blocking mode and there is at least one byte remaining in the buffer then this method will block until at least one byte is read.

So we might have to add something like while (blocking && channel.position() + readLen > channel.size()) {} to the read calls, if we need the blocking behavior at some point.

letorbi commented 1 year ago

A friend just texted me, that my code is now working on both Intel and Silicon Macs, so I guess this problem is fixed as well :partying_face: However, he didn't had the chance to run the full test-suite on a Mac. Just my code, which uses discord-game-sdk4j to set an activity...

letorbi commented 1 year ago

Well... In the end it turned out that FileChannel is blocking - whatever the doc says :roll_eyes:

I've now added a non-blocking mode to WindowsDiscordChannel based on .size() and .position() and ensured that data is immediately written into the named pipe.

When I run the test on Windows and Linux the results are almost the same (see screenshots below), so I think the WindowsDiscordChannelimplementation is ok now.

However, in both cases, Linux and Windows, sending a handshake seems to be blocking. Usually I see something like:

[VERBOSE] Sent string "{"cmd":"ACTIVITY_INVITE_USER","args":{"type":1,"user_id":"691614879399936078","content":"Join me baka!","pid":9591},"nonce":"7"}" at state CONNECTED
read1 start
read1 done 0
...
read1 start
read1 done 0
read1 start
read1 done 8
read2 start
read2 done 103

But for a handshake I just see:

[VERBOSE] Sent string "{"v":1,"client_id":"698611073133051974"}" at state HANDSHAKE
read1 start
read1 done 8
read2 start
read2 done 353

The test seems to be stuck at read1 start for a few seconds before printing read1 done 8. I have no good explanation for this.

Test results on Windows: image

Test results on Linux: Screenshot from 2022-12-19 00-52-06

JnCrMx commented 1 year ago

It is correct for the handshake to be blocking. I wanted to ensure for everything to be initialized, before control is returned to the user. I will check out the errors later. Some of them might just be because of a different test environment.

letorbi commented 1 year ago

It is correct for the handshake to be blocking. I wanted to ensure for everything to be initialized, before control is returned to the user.

Ok, that makes sense.

I will check out the errors later. Some of them might just be because of a different test environment.

Yeah, that could be. I've improved the logging a bit to make it easier for you to read the test output.

letorbi commented 1 year ago

I've added a global config class, which defines the client-id and both required user-ids for all tests. This makes it easier for others to run the tests, because the values have to be only changed at one place in the code (22ad19a1e3adcc1e070a5fe581c4ae0828b8df24). After that I was able to run the tests with my own user-ids, which already reduced the number of failing tests.

However, the socket protocol test kept failing under Windows. This happened, because the test used its own way to create a channel, which was still the old approach without Windows support. I fixed this by using the code inside Core to get the channel for the test (05053ff5328ff52f12a128a82799b57f3f6b4f5f).

There are still two test failing (same on Windows and Linux, see below), but I think they are unrelated to the topic of this PR and should thus be fixed in another one.

The only question remaining is, whether we should keep the debug-logging for the read() and write() calls (introduced in 9a232114ecb46b83df3bd1c474658da96028d991). Even though they are helpful to monitor the blocking behavior, they also need a lot of code. From my point of view they can be removed, but if you like to keep them, this is fine, too.

Once this last question has been resolved, the PR is ready to be merged IMHO.

Test-results on Linux:


[VERBOSE] read(ByteBuffer) returned 0 (0ms)
[VERBOSE] read(ByteBuffer) returned 0 (0ms)
[VERBOSE] read(ByteBuffer) returned 0 (0ms)
[ERROR] Tests run: 7, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 209.924 s <<< FAILURE! - in de.jcm.discordgamesdk.DiscordTest
[ERROR] voiceTest  Time elapsed: 13.894 s  <<< ERROR!
java.lang.RuntimeException: not implemented
    at de.jcm.discordgamesdk.DiscordTest.voiceTest(DiscordTest.java:456)

[ERROR] overlayTest  Time elapsed: 35.485 s  <<< ERROR!
java.lang.NullPointerException: Cannot invoke "java.util.function.Consumer.accept(Object)" because the return value of "java.util.Map.remove(Object)" is null
    at de.jcm.discordgamesdk.DiscordTest.overlayTest(DiscordTest.java:322)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   DiscordTest.overlayTest:322 » NullPointer Cannot invoke "java.util.function.Co...
[ERROR]   DiscordTest.voiceTest:456 » Runtime not implemented
[INFO] 
[ERROR] Tests run: 9, Failures: 0, Errors: 2, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  04:16 min
[INFO] Finished at: 2022-12-21T12:25:22+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.2:test (default-test) on project discord-game-sdk4j: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/letorbi/Desktop/discord-game-sdk4j/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
letorbi commented 1 year ago

Oh, BTW: I forgot to mention, that I've also changed the JitPack config in commit f423d69e0974fa85730e6c20014542a8eef811c3, since the build of the java-impl branch always failed. Basically I just added -Dmaven.javadoc.skip=true to the build command to mitigate the problems mentioned in #58.

The change is a bit out of scope of this PR, but I hope it is ok anyway.

JnCrMx commented 1 year ago

Thank you for all the work!

I would definetely remove the debug logging (9a232114ecb46b83df3bd1c474658da96028d991) as it is not really desired to spam the stdout of everyone using the library. We might think about integrating it into the existing debug logging callback system.

Not sure why we have to skip the JavaDoc (it was working originally), but we can figure that out later in a separated issue/pull request.

I will try to find time today to double check and review all the changes and then (after either removing the debug logging or cleaning it up to use the callback) merge this PR.

letorbi commented 1 year ago

I think the new code already uses the logging callback since cf75fc2f52406e9b9f09797038efe004fcc84c0a. At least I've changed the logging calls from System.out.println(...) to core.log(LogLevel.VERBOSE, ...). I guess that is what you've meant, isn't it?

However, I am still thinking that the logging calls should be removed, since they still spam the output when someone tries to debug with LogLevel.VERBOSE.

JnCrMx commented 1 year ago

Yes, it is what I meant! And we could always introduce another LogLevel for those messages. But you could also remove it, both options sound acceptable to me.

letorbi commented 1 year ago

Yeah, LogLevel.SPAM would be awesome :laughing: However, I've removed the read/write logging for channels again, since it needs quite a lot of extra code and it is unlikely - now that the channels are working - that the logging will be needed again.

From my point of view the PR can be merged now.

letorbi commented 1 year ago

Hej, can you already say when you will have time to take a look at the PR? Just asking, no need to be stressed ;)

letorbi commented 1 year ago

Sorry, I've messed up my branches and closed this PR by accident. See #61 for the new PR.

JnCrMx commented 1 year ago

Alright, thank you for the JitPack PR. I will attempt to review both of them this weekend or during the next week and finally merge them.