ValleyBell / libvgm

A more modular rewrite of most components from VGMPlay. will include sub-libraries for audio output, sound emulation and VGM playback
irc://irc.digibase.ca/#vgmrips
136 stars 33 forks source link

Fix alignment issues and portable 24bps packing #110

Closed jprjr closed 10 months ago

jprjr commented 1 year ago

Fixes #108 , also obsoletes #109

Replacing the direct assigned-to-pointer-casts with memcpy calls avoids potential undefined behavior (like if the sample buffer wasn't allocated on the correct byte boundary). On systems where the alignment is a non-issue, compilers should replace memcpy calls with the equivalent of *(INT16*)buffer = (INT16)value; etc.

ValleyBell commented 1 year ago

Sorry for the late reply.

Thanks for catching the issues with the 24-bit stuff!

There are a few things I'd like to have changed though:

TBH I'm a bit split about the memcpy stuff.
I see on godbolt that GCC/Clang generally handle these in a very performant way with any optimization level. However, MSVC 2015..2019 requires at least /O2 to generate decent assembly code. It calls the external memcpy function with other optimization levels, causing a heavy performance drop in Debug mode. But one could argue that MSVC is know for that anyway.

I tested this against MS VC 6.0 (which I still use for 32-bit VGMPlay), and it behaves like later MSVC versions, so I guess this should be fine though.


MS Visual C++ 6.0 test:

void myfn() { int a = 2; int b = 3;

arr[3] = a;
memcpy(arr, &b, sizeof(int));

}


- compile using `cl /O2 /Famemcpy-test.asm /c memcpy-test.c`
- `memcpy-test.asm`

```asm
    TITLE   memcpy-test.c
    .386P
; [omitted segment definitions]
PUBLIC  _myfn
EXTRN   _arr:BYTE
;   COMDAT _myfn
_TEXT   SEGMENT
_myfn   PROC NEAR                   ; COMDAT
; File memcpy-test.c
; Line 10
    mov DWORD PTR _arr+12, 2
; Line 11
    mov DWORD PTR _arr, 3
; Line 12
    ret 0
_myfn   ENDP
_TEXT   ENDS
END
jprjr commented 1 year ago

Regarding keeping it a single function and conditionally-compiling code - with the ALSA driver I have the benefit of knowing it will be compiled on Linux - alsa/asoundlib.h includes endian.h -- which is a platform-specific header. It's not defined by POSIX, and even between Unix-ish platforms it differs in what macros it defines.

I can try getting together a list of macros to determine endianness and do the same in PlayerA. Main concern though is that would bring platform-specific macros into code that (as far as I can tell) is platform-independent. That's why I went with the two-function + runtime check.

ValleyBell commented 1 year ago

I wouldn't mind to just have a global #define that controls Endianess.

Apparently you can do this easily with CMake's built-in scripts:

If you want, I can do the CMake integration and leave the code fixes to you.

jprjr commented 1 year ago

That would be great!

ValleyBell commented 1 year ago

I pushed a commit 5708a21390ed955059efff8dd501644c435dd5e2 that makes CMake define the VGM_LITTLE_ENDIAN and VGM_BIG_ENDIAN variables.
I also implemented the ALSA fixes in 02bc8a64bfec599793c5d85bc9b5149e41dabe8f and gave you credit.

ValleyBell commented 11 months ago

Do you intend to touch the pointer-cast/memory stuff again at some point?

jprjr commented 11 months ago

Sorry about that - pushed up a new version that uses the defines

ValleyBell commented 11 months ago

Thanks.

Out of curiosity - what is the reason you avoid the packed INT32 data : 24; so much?

jprjr commented 10 months ago

Most properties of bitfields are implementation-defined so I generally avoid them, like most undefined behavior/properties. The only time I ever use them is when I need to sign extend from a fixed number of bits.

ValleyBell commented 10 months ago

Okay, that makes sense.