Traneptora / jxlatte

Java JPEG XL decoder
MIT License

Slow decoding #10

Closed Ali-RS closed 1 year ago

Ali-RS commented 1 year ago
long startTime = System.nanoTime();
JXLDecoder decoder = new JXLDecoder("samples/body_alb.jxl"); // file size is 30 KB
JXLImage image = decoder.decode();
double timePass = (System.nanoTime() - startTime) / 1000000000d;
System.out.println("It took " + timePass + " seconds to decode image");
It took 26.376047794 seconds to decode image

I am trying JPEG XL in a real-time application (for game dev), but the loading time does not look promising. As you can see in the test above, it took 26 seconds to load a 30 KB jxl file from the samples directory.

Any chance of speeding up the loading time in the future?

Thanks

Traneptora commented 1 year ago

That doesn't sound right at all. Using the image from #6, which has the same filename as your test image, decoding takes 1.5 seconds on my system. Are you using git master?

Ali-RS commented 1 year ago

> Are you using git master?

It is built from this commit https://github.com/thebombzen/jxlatte/commit/bd6ff4772bda31e21c22971bbb45b8247c304e49

I will retest using the most recent changes. Thanks

Just in case, I am using OpenJDK 19 (Temurin).

Traneptora commented 1 year ago

That still sounds wrong. If you execute the jarfile, what happens?

Ali-RS commented 1 year ago

I just built from master and I still have the issue.

> if you execute the jarfile, what happens?

java -jar jxlatte-1.0-SNAPSHOT.jar bench.jxl out.png
It took 29.831777605 seconds to decode image.
Decoded to pixels, writing PNG output.

Here is the jar file if you want to try it: jxlatte.zip

It is built with Gradle and Java 11.

Edit: I am on Linux Mint, if that matters.

Edit2: I also tested on a Windows 10 device with Java 19 and have the same problem (it takes around 30 seconds).

Edit3: By the way, the converted PNG file for bench.jxl looks wrong.

Traneptora commented 1 year ago

I don't support crappy build systems. Build it the correct way and try again.

Ali-RS commented 1 year ago

> build it the correct way and try again.

@thebombzen, OK, I built it from the master branch using Meson and retried it.

java -jar jxlatte.jar bench.jxl bench.png
Decoded to pixels, writing PNG output.

It took around 24 seconds to convert bench.jxl.

java -jar jxlatte.jar ants.jxl ants.png
Decoded to pixels, writing PNG output.

And it took around 189 seconds to convert ants.jxl.

Traneptora commented 1 year ago

What system are you on?

Ali-RS commented 1 year ago

Linux Mint 20.1 Ulyssa, Intel Core i3 CPU, 8 GB RAM

Tried with OpenJDK 11, 17, 19, 20

Edit: Also tested on a Windows 10 device with an Intel Core i5 CPU and had the same problem

Traneptora commented 1 year ago

I'll have to add some profiling code, because that doesn't make sense.

Traneptora commented 1 year ago

> Linux Mint 20.1 Ulyssa, Intel Core i3 CPU, 8 GB RAM
>
> Tried with OpenJDK 11, 17, 19, 20
>
> Edit: Also tested on a Windows 10 device with an Intel Core i5 CPU and had the same problem

What does cat /proc/cpuinfo | grep flags return?

Ali-RS commented 1 year ago

> What does cat /proc/cpuinfo | grep flags return?

flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts md_clear flush_l1d
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts md_clear flush_l1d
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts md_clear flush_l1d
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts md_clear flush_l1d

Traneptora commented 1 year ago

Figured it out: you don't have FMA instructions, and on systems without FMA instructions, java.lang.Math.fma is extremely slow. I removed every reference to it, since some more research showed that autovectorization makes it not really necessary.
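
For anyone who wants to reproduce the diagnosis, here is a minimal sketch (not code from jxlatte). It checks a `/proc/cpuinfo` flags line for `fma` and times `Math.fma` against a plain multiply-add. The class and method names (`FmaCheck`, `hasFlag`, `fmaLoop`, `mulAddLoop`) are hypothetical. `Math.fma` itself is a real method, available since Java 9. The flags check assumes the x86 Linux format shown above. Timings depend heavily on the machine and JVM, so no expected numbers are given.

```java
import java.util.Arrays;

public class FmaCheck {
    /** Returns true if a /proc/cpuinfo "flags" line lists the flag as a whole word. */
    static boolean hasFlag(String flagsLine, String flag) {
        int colon = flagsLine.indexOf(':');
        String list = colon >= 0 ? flagsLine.substring(colon + 1) : flagsLine;
        return Arrays.asList(list.trim().split("\\s+")).contains(flag);
    }

    /** Accumulates with Math.fma, which must round only once per step. */
    static double fmaLoop(int n) {
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc = Math.fma(acc, 0.5, i);
        }
        return acc;
    }

    /** Same recurrence with an ordinary multiply followed by an add. */
    static double mulAddLoop(int n) {
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc = acc * 0.5 + i;
        }
        return acc;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Warm-up pass so the JIT has compiled both loops before timing.
        fmaLoop(n);
        mulAddLoop(n);

        long t0 = System.nanoTime();
        double a = fmaLoop(n);
        long t1 = System.nanoTime();
        double b = mulAddLoop(n);
        long t2 = System.nanoTime();

        // Multiplying by 0.5 is exact here, so both loops give the same result;
        // in general the two can differ in the last bit.
        System.out.printf("Math.fma: %.1f ms (result %.3f)%n", (t1 - t0) / 1e6, a);
        System.out.printf("a*b + c : %.1f ms (result %.3f)%n", (t2 - t1) / 1e6, b);
    }
}
```

On a CPU whose flags include `fma`, the two timings should be close. On one without it, such as the machine reported above, the `Math.fma` loop would be expected to run far slower, because the JVM has to emulate the single-rounding result in software.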

Traneptora commented 1 year ago

Fixed by 02d2ebc