davidmoreno / rtpmidid

RTP MIDI (AppleMIDI) daemon for Linux
Other
178 stars 39 forks source link

long sysex again #113

Open sadguitarius opened 4 months ago

sadguitarius commented 4 months ago

Ok so sorry for the barrage of issues, I've been meaning to log these and never got around to it. It appears that something has changed in the handling of longer sysex messages so that they are not being split into smaller messages where appropriate. So far I've run into this with the use case of a Yamaha TX-81z sending bulk dumps to MioXL -> rtpmidid on Raspberry Pi -> Windows PC, but I'd imagine this would should up for anything that sends messages longer than the ALSA maximum message size. The weird thing is that aseqdump does split bulk messages into smaller chunks but I can see that F0/F7 are not added to the chunks. I guess that means rtpmidid may be interpreting the partial messages as garbage so won't pass them along, right?

davidmoreno commented 3 months ago

Hi.

I changed the custom MIDI to alsa seq code to use the ALSA function that does that.. and it looks that for this case it fails? I have to dig deeper.

I'll let you know if I discover something.

Thanks for the report. David.

sadguitarius commented 3 months ago

I think there's a compiler option baked into the kernel that (arbitrarily) sets the maximum allowed sysex size. It's not well documented but I read about it somewhere. Would that have something to do with it?

sadguitarius commented 2 months ago

Has there been any update on this issue? Can you let me know where to look for the code that was changed to I can try rolling it back? Thanks so much.

BTW I think I was wrong and at least some of the partial messages were being forwarded to rtpMIDI on Windows, although they still weren't working properly. I just noticed some errors in the Windows rtpMIDI log about invalid messages being received from the TX-81z, but I'll have to do some more solid testing about what exactly is being forwarded. I'm guessing rtpmidid still needs to be handling the message segmentation internally for this to work, right?

sadguitarius commented 1 week ago

Hi @davidmoreno can you let me know what ALSA function you changed to so I can do some experimenting with this? I'm happy to devote some to it but it would be way easier if I knew where to start. Thanks!

davidmoreno commented 1 week ago

Hi.

The crux should be at the mididata_to_alsaevents_t class.

On previous versions I created my own converter between the ALSA seq events and the MIDI streams. Then I discovered that ALSA had their implementation, so I started using it.. but fails on long SysEx.

On rtpmidi, long SysEx are split into several packages, and my suspicion is that the ALSA converters do not manage this very well. I have some basic description at the end of https://github.com/davidmoreno/rtpmidid/blob/master/docs/RFC6295_notes.md

So ideally I think the best course of action is create some new test that try several options:

  1. long SysEx into some ALSA event structure and using only this class be able to convert to a MIDI stream and several packets. Max length for each packet should be around 1300 bytes per packet. And then it should try to convert it back into only one SysEx.
  2. Convert some split SysEx messages into the proper ALSA events.

As it looks that ALSA doe snot work, somehow we should detect and manage SysEx specially. Maybe internally it should use a buffer where the temporal data is set and used .. maybe it can be on the heap with a 16MB limit or something like that?

All this is just ramblings. If you decide to give it a go tell me. If not I will try next week or so.. I also need it.

sadguitarius commented 1 week ago

Ok great, I'll have a look. I have a couple of pieces of gear that reliably fail with rtpmidid and work with Windows RTPMIDI so I should be able to run some good tests. Thanks!

sadguitarius commented 1 week ago

A few notes so far:

snd_midi_event_encode and snd_midi_event_decode are relatively simple classes. Would it make sense to just write new but similar methods that inject the segmentation logic and point to an external large buffer?

The first point of failure with sending long sysex is just that snd_midi_event_decode is running out of memory due to the rtpmidid::io_bytes_static<1024> data declarations, which you probably knew already, but increasing that buffer size causes the sending of long sysex messages to fail silently, so I'll keep moving up the chain. It looks like ALSA expects you to manage larger buffers externally right? but then I don't understand why making a custom method to wrap all of that wouldn't be a better idea.

In the PipeWire library, the alsaseq.c uses the method snd_seq_client_pool_alloca to manage the maximum sysex message size, but I still don't know if there's a hard limit imposed by ALSA at some point.

Also, I've been using sendmidi and receivemidi, which are based on JUCE, to send sysex files and found that they also choke on very large sysex files with the limit more in the thousands of bytes range, but I think this actually a limitation of JUCE arbitrarily deciding on a maximum size. JUCE does have some MIDI segmentation logic built in, which I've been looking at as well.

sadguitarius commented 1 week ago

Ok actually RtMidi looks like it does a good job with this. Only issue is the malloc in the realtime path, but I guess if the buffer is set big enough to begin with it really shouldn't be a problem. edit I'm not necessarily suggesting to use the RtMidi library, but it looks good for learning some ALSA MIDI tricks

sadguitarius commented 6 days ago

So I thought I'd at least send what I have so far, even though it is an absolute mess. It looks like long sysex is being received to rtpmidid from Windows but I'm still having trouble with sending from Linux. I'm sure it's something obvious and I'm still figuring out how all the pieces of rtpmidid fit together, hence the million logging statements I've added. Would you mind taking a look? Let me know if I'm doing something totally stupid or this isn't the direction you want to be going. Obviously, some of the new code is redundant and could be a lot cleaner. I have a few changes to the build process in here to for my benefit, but hopefully they won't break anything on your end.

here's my working branch

Also, I think one of the things that was getting in the way was that the snd_seq_midi module by default was not setting its internal buffer size high enough. I put

options snd_seq_midi input_buffer_size=131072
options snd_seq_midi output_buffer_size=131072

in a .conf file in /etc/modprobe.d and at least at some point it looked like it was making a difference.

davidmoreno commented 4 days ago

Hi,

finally I could make some review.

Interesting about input and output sizes, but I dont think they are so important.. maybe they are.

I see a lot of unrelated changes like s/event/ev/ and so... but anyway I think the main point is that if the data from rtpmidi is bigger than the allocated buffer, you create a temporary one (with malloc) and use that one. Is that right?

I would do something different, just modify at the mididata_to_evs_f to parse better the sysex that quite probably is split into several pieces.

I will try to make some tests myself... but to use windows I will have to compile on a rpi4... and that's slow! so it may take me some time.

Another option, that I'm delaying but should be done, is give the possiblity of sidestepping totally the alsa sequencer interface, and create a direct to midi (as in /dev/midi) midipeer_t. It should not be difficult, but I have to think about the interface... as we can not use anymore the sequencer interface to create the connections. One option is be controlled purely by the ini file. I will explore this option too.

davidmoreno commented 4 days ago

Looking more into you code I see you do some event size rewrite. Try setting not exactly the size you think it is (the sysex length maybe?) but some more bytes for headers and whatever. Even set it to double. It should never be too big. But maybe check less than 16MB or something like that.

davidmoreno commented 4 days ago

Also I see I had a snd_midi_event_reset_encode.. maybe thats destroying it all? Somehow it does not feel right to me. I think maybe is needed if several rtpmidi clients connect to the same peer. But you can try to remove it, and connec tonly one client and if it's that.. then we need a mididata_to_alsaevents_t per connected client.

sadguitarius commented 4 days ago

Ah I was just in the middle of writing another message!

Yeah honestly you don't need to pay attention to my code at this point. I've mostly filled it up with a bunch of logging stuff to try to figure out what's happening to the messages, and the malloc thing isn't making much of a difference so far. No matter what I do there's still a bottleneck somewhere. I've mostly come to the same conclusion as you about piecing the segments back together in mididata_to_evs_f, although I believe there still needs to be some logic to manage the size of the intermediate MIDI event buffer no matter what, right? At some point I did have the snd_midi_event_reset_encode taken out, but I'm not sure if it made a difference.

I then tried a basic sanity test with aseqsend and aseqdump and no rtpmidid involved and found that the ALSA seq situation is just generally unreliable with sysex sizes over ~4096. I then tried using snd_virmidi and had no problem sending large messages. Would a possible solution be to add a config.ini option in rtpmidid to create a specified number of virtual raw MIDI ports to use and which network ports they would correspond to or something? They could then still be routed using the standard ALSA interface, and at least in my case, I'd only need a couple of raw MIDI ports in addition to the standard seq ports. That way it would also be possible to use the standard Linux command-line utilities for loading and sending Sysex files and they would work as intended.

Also, try the distcc thing for remote builds on the Pi if you have the patience to get it set up! I have it so that my PC is running a distcc client in WSL and all of the heavy CPU load is happening there, so it takes maybe 30 seconds or so to do a clean build.

sadguitarius commented 3 days ago

Is it possible that there is a bug in snd_seq_output_direct?? I've tracked things down a bit with my dumb fork with mallocs everywhere, and while it it totally not the best way to get things done, what I at least have is that the integrity of messages up to a certain point is maintained right up until snd_seq_output_direct, but there is a point where the message gets too large and the function gives me a "Cannot allocate memory" error.

That's totally weird, because it really shouldn't depend on any of the main ALSA buffers, right? Looking at the alsa-lib code, it looks like it's just doing a one-off temporary buffer for longer messages as needed. I might have to try doing my own build of alsa-lib so I can trace what's going on, but it's curious that the official word has been "don't use ALSA seq for long messages, it's unreliable" yet if it's possible to manually set large buffer sizes, why would there be an arbitrary point where it just doesn't work?

sadguitarius commented 2 days ago

Ok no it's actually failing at line 388 of seq_hw.c in alsa-lib, ssize_t result = write(hw->fd, buf, len); with a memory error so I guess it's an issue with the kernel module? Unless there's some other buffer setting I'm missing.

sadguitarius commented 1 day ago

Progress! Maybe... I found this code in the PipeWire source that sets attributes for the input and output snd_seq_client_pool_t along with a nice explanation here that doesn't exist even in the alsa-lib docs. The default size setting of 200 allows for a maximum message size of about 5600 bytes, which conforms with the results I was getting from tests. The kernel allows a maximum size of 2000, which I guess would allow for a 56000 byte message. More than that and you'd need to compile a custom kernel, since the maximum defined by SNDRV_SEQ_MAX_CLIENT_EVENTS is hard coded and not even available as a kernel config parameter. I still need to figure out if there's another bottleneck somewhere, but this does seem like a step in the right direction at least.