asb2m10 / dexed

DX7 FM multi plaform/multi format plugin
GNU General Public License v3.0
2.77k stars 240 forks source link

Dexed does not handle (invalid) high bytes in sysex data gracefully #194

Open eclab opened 4 years ago

eclab commented 4 years ago

There are quite a number of available sysex files online which have high bytes in the middle of their sysex data erroneously. Typically this isn't garbled data, just data with the high bit set for no reason. Dexed can't parse these correctly.

I attach one example. Consider the patch "SYNCLAVIER" (or as Dexed puts it "SY C A VE R"), whose parameters (and name) Dexed interprets hillariously incorrectly.

This is an easy fix: Dexed just needs to strip out all high bits between the F0 and F7, then parse from there.

BANK0054.SYX.zip

elovian commented 3 years ago

Looks like adding the line

c &= 0x7F; // strip don't care most-significant bit from name and modifying the compare after default: to if (c < 32) would do the trick.

    static String normalizePgmName(const char *sysexName) {
        char buffer[11];

        memcpy(buffer, sysexName, 10);

        for (int j = 0; j < 10; j++) {
            char c = (unsigned char) buffer[j];
            c &= 0x7F; // strip don't care most-significant bit from name
            switch (c) {
                case 92:
                    c = 'Y';
                    break; /* yen */
                case 126:
                    c = '>';
                    break; /* >> */
                case 127:
                    c = '<';
                    break; /* << */
                default:
                    if (c < 32)
                        c = 32;
                    break;
            }
            buffer[j] = c;
        }
        buffer[10] = 0;

        return String(buffer);
    }
asb2m10 commented 3 years ago

Thanks for the suggestion. This would be a quick win.

But the real implementation would be to find the correct charset that Yamaha used for this. Then Yen symbol was not at position 92 on the original codepage 437 it was at position 157. Once we map the right character, we might be able to see them correctly in the UI.

eclab commented 3 years ago

At least for my example, the issue was not charsets. It was high bits incorrectly being stuck in the middle of sysex data (non-status) bytes, which I've found in a number of DX7 files.

elovian commented 3 years ago

92 is the correct position for the yen character on early 8bit Japanese systems. They used a character set called JIS-8 http://ss.sguc.ac.jp/~rider/basic/jis8code.html Which is mostly but but completely similar to ASCII. And still leads to occasional odd characters when reading text files still.

I think it's that the high bit is likely Don't Care but we can wait for confirmation from someone with a DX7.

2020年10月13日(火) 22:17 Evolutionary Computation Laboratory < notifications@github.com>:

At least for my example, the issue was not charsets. It was high bits incorrectly being stuck in the middle of sysex data (non-status) bytes, which I've found in a number of DX7 files.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/asb2m10/dexed/issues/194#issuecomment-708129920, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARICNLGD3ZIHELR46Y4QAFDSKUJ27ANCNFSM4JHVYQ3A .

eclab commented 3 years ago

This may all be true regarding charsets but it seems to be to be irrelevant to the bug report I had posted: sysex messages simply cannot have high bits set. Multiple parameters (not just the name) in my example are set wrong due to invalid high bits. And the name "SYNCLAVIER" doesn't contain a Yen anyway: it just has high bits incorrectly set for several of the (ASCII) characters in the name.

asb2m10 commented 3 years ago

I've just merge @elovian fix that is based on the program, e.g. only the programs loaded/saved by Dexed will have the values normalized. See nightly builds. This should fix @eclab issue.

But to do the correct implementation, we should use JIS-8 for program code page also.