PhilippeBekaert / snd-hdspe

New linux driver and tools for RME HDSPe sound cards and extension modules
GNU General Public License v3.0

Can I set Pitch (PPM) programmatically? #18

Open alex-vyverman opened 9 months ago

alex-vyverman commented 9 months ago

Hello,

Does anyone know if the Pitch (PPM) parameter can be set programmatically from the command line? The reason is that each time my PC wakes from sleep, the pitch is set to -35596. This prevents my application (CamillaDSP) from starting. Setting the pitch manually to zero (or close to it) fixes this, and the application can start.

[Screenshot 2023-12-13 at 11 00 16]

So either I need to know why it starts up at this value and find a fix for it, or I need to create a script that changes this value on wake/startup.

Anyone out there that could help with this? Many many thanks

jimfrench commented 9 months ago

Hi, at first glance it looks like the sample rate is being unset for some reason during sleep. Maybe an application that sets the sample rate is no longer loaded after wake-up.

The pitch shows the deviation of the current sample rate from the nearest standard rate.

What is the sample rate set to prior to sleep? You can check from the command line, e.g.:

cat /proc/asound/card0/pcm0p/sub0/hw_params | grep rate

Then after wake do the above command again to see if there is a change.

You might need to keep some application loaded that will keep the sample rate at 96kHz (JACK?).

The general behaviour of a system and its apps during sleep is quite variable, but it looks like it's the sample rate that needs to be kept under control during the sleep/wake cycle by some method or another, depending on what you've got happening.
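
To compare the rate before and after a sleep cycle without eyeballing the output, the check above can be wrapped in a small helper. This is only a sketch: the function name `hw_rate` is made up for illustration, and it assumes the usual `hw_params` layout, where an open stream prints a `rate:` line and a stream that nobody has opened prints the single word `closed`.

```shell
#!/bin/sh
# Hypothetical helper (not part of snd-hdspe): read hw_params text on stdin
# and print the sample rate, "closed" when no stream is open, or "unknown"
# when neither is found.
hw_rate() {
    awk '
        $1 == "closed" { print "closed"; found = 1; exit }
        $1 == "rate:"  { print $2; found = 1; exit }
        END            { if (!found) print "unknown" }
    '
}
```

It would be called as `hw_rate < /proc/asound/card0/pcm0p/sub0/hw_params`, once before suspend and once after wake, so the two results can be compared or logged by a script.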

Schroedingers-Cat commented 9 months ago

On my HDSPe MADI, I noticed a similar behavior. After standby, the MADI connection is unable to resume and stays broken. This can only be fixed by restarting JACK.

This bug is also present in snd-hdspm, the predecessor of this driver.

jimfrench commented 9 months ago

Interesting that it's a known bug, I haven't noticed the behaviour due to tending not to use standby at all.

I suppose a "fix" then would be to script a restart of JACK after wake-up. It seems odd to me that there is no persistence of the driver parameters.

Schroedingers-Cat commented 9 months ago

I don't remember exactly what I tried but I tried a lot around the idea of restarting Jack after standby but never got the fix to work.

Schroedingers-Cat commented 9 months ago

Just found out that I can use hdspeconf to fix this state after standby without having to stop the entire JACK server. All I have to do is temporarily change the sync source to a different value and then set it back to the original value. That'll fix it. @alex-vyverman the state of your hdspeconf looks exactly like over here (two yellow exclamation mark signs in the top right) so you might try if that also helps in your situation.
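
The same "switch the sync source away and back" trick could in principle be scripted with amixer instead of clicking in hdspeconf. A minimal sketch, assuming the relevant setting is exposed as an enumerated ALSA control: the helper names and the control name used in the usage line are assumptions for illustration, so check `amixer -c 0 controls` for what the card really offers. Whether this recovers the card after standby is untested.

```shell
#!/bin/sh
# Sketch only: bounce an enumerated ALSA control to another item and back.

# Extract the current (first) value from `amixer cget` output read on stdin.
cget_value() {
    sed -n 's/^ *: values=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# bounce_control CARD NAME TEMP_INDEX
# NAME is the control name, TEMP_INDEX the item index to switch to briefly.
bounce_control() {
    card="$1"; name="$2"; temp="$3"
    current=$(amixer -c "$card" cget "name='$name'" | cget_value)
    [ -n "$current" ] || { echo "cannot read '$name'" >&2; return 1; }
    amixer -c "$card" cset "name='$name'" "$temp" >/dev/null || return 1
    sleep 1
    # Put the original item back.
    amixer -c "$card" cset "name='$name'" "$current" >/dev/null
}
```

A call might look like `bounce_control 0 'Clock Mode' 1`, where `Clock Mode` is a guess at the control name and the temporary index should differ from the current one.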

Schroedingers-Cat commented 9 months ago

Hm, while changing the clock values fixes the driver's state (it produces a valid MADI stream after changing the clock source), the JACK server still seems to be in a weird state where no client is able to send samples to JACK.

alex-vyverman commented 9 months ago

Yes, the behavior is very strange. What I actually want to achieve is to slave to AES1. As you can see in the screenshot, I have a valid lock there coming from my Mac. So my first instinct was to use alsactl to switch to external sync on startup.

But when I do so, or do it manually, after sleep the PPM goes to near 0, but my application will not start. It shows a resource busy error when trying to initialise ALSA.

The remedy is to stay in internal sync, slide the PPM slider to near zero, start the application (it will start at that point), and then switch to external sync.

I have no clue where to start debugging this, but if any of you can help that would be immensely appreciated.

Alex

alex-vyverman commented 9 months ago

@jimfrench This is the output after wake:

access: MMAP_NONINTERLEAVED
format: S32_LE
subformat: STD
channels: 16
rate: 96000 (96000/1)
period_size: 256
buffer_size: 512

Seems to be unchanged from what it was before.

edit: I was wrong, after a wake I get:

alex@dsp-ubuntu:~$ cat /proc/asound/card0/pcm0p/sub0/hw_params
closed

Which would explain why in that state, it's impossible to get the device to open, no?

jimfrench commented 9 months ago

Yes, this is starting to make some sense, especially alongside @Schroedingers-Cat's comments.

Unfortunately, at present I have difficulty reproducing this; my system does all sorts of unrelated horrible things during suspend.

Also I don't use hdspeconf, due to experiencing issue https://github.com/PhilippeBekaert/hdspeconf/issues/2#issue-1118573293

I use amixer -c0 cset ... commands instead.

The findings by @Schroedingers-Cat about changing the clock source seem to relate to the findings in the above issue during the attempted fix.

There is also some interesting information here about some oddities in general. Especially with regard to needing to set the internal clock manually so ALSA and JACK pick up on that, if the external clock is above 48kHz.

https://www.jrigg.co.uk/linuxaudio/hdspe-madi.html

If I were trying to debug this, I would simplify and rule out known issues: to be on the safe side, set the clock to 48kHz. Then I would use amixer commands, so as not to rely on hdspeconf, to find out whether the bug is in the driver, for instance by trial and error with asound.state, testing different configurations via the alsactl store and restore commands before and after suspend.
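
The store/restore experiment could be automated with a systemd sleep hook. A rough sketch, assuming an executable script dropped into /usr/lib/systemd/system-sleep/, which systemd runs with "pre" before sleeping and "post" after waking. The state file path, the card index and the function name are illustrative choices, not anything the driver prescribes.

```shell
#!/bin/sh
# Sketch of a systemd sleep hook: store the card's ALSA control state
# before sleep and restore it after wake.
# CARD and STATE_FILE are illustrative defaults; ALSACTL can be overridden.
ALSACTL="${ALSACTL:-alsactl}"
CARD="${CARD:-0}"
STATE_FILE="${STATE_FILE:-/var/tmp/hdspe-asound.state}"

hdspe_sleep_hook() {
    case "$1" in
        pre)  "$ALSACTL" --file "$STATE_FILE" store "$CARD" ;;
        post) "$ALSACTL" --file "$STATE_FILE" restore "$CARD" ;;
        *)    : ;;  # any other phase: do nothing
    esac
}

# systemd passes the phase ("pre" or "post") as the first argument.
hdspe_sleep_hook "${1:-}"
```

For pure debugging, storing to two different files (one in "pre", one in "post") and diffing them would show which controls changed across the suspend, without restoring anything.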

If I can get this machine suspending I'll try to do some testing.

jimfrench commented 9 months ago

@alex-vyverman

I managed to get suspend working. Unfortunately, this is looking like a bug in the snd-hdspe driver specifically:

[ 1105.877036] WARNING: CPU: 0 PID: 5159 at /var/lib/dkms/alsa-hdspe/0.0/build/sound/pci/hdsp/hdspe/hdspe_pcm.c:546 snd_hdspe_trigger+0x131/0x1f0 [snd_hdspe]

After suspend, a restart of JACK seems to restore basic functionality but much is broken, some settings are lost, etc.

The driver hardware parameters do persist on my machine, but there are clearly multiple issues here to resolve, to make everything ok after suspend, probably not just one issue.

EDIT: Additionally the way JACK fails before restart is logged as ERROR: ALSA: poll time out, polled for 63999148 usecs, Retrying with a recovery, retry cnt = 1

jimfrench commented 9 months ago

A further look at the hardware parameters shows the state is not being saved by the driver, i.e., suspend/resume is basically not yet implemented at the driver level. A quick look at the source code confirms this.

The hardware parameters can be shown not to persist. For example, this is a diff between the two attached files, one taken before suspend and the other after. There would be more differences if there was more than default behaviour happening (e.g., sync sources and suchlike).

This is where hdspeconf is getting the erroneous sample rate from, among other things:

diff hdspe.0 hdspe.1
4c4
< System sample rate    : 96000
---
> System sample rate    : 92582
37,39c37,39
< BUF_PTR   : 49152
< BUF_ID    : 1
< ID_PTR    : 16384
---
> BUF_PTR   : 00000
> BUF_ID    : 0
> ID_PTR    : 00000
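
The 92582 figure lines up with the pitch value from the first post. Assuming the pitch is simply the relative deviation from the nominal rate in parts per million (an assumption, not something confirmed from the driver source), 92582 Hz against 96000 Hz works out to about -35604 ppm, close to the -35596 reported there. The helper name below is made up for illustration.

```shell
#!/bin/sh
# pitch_ppm ACTUAL_HZ NOMINAL_HZ
# Prints the deviation of ACTUAL from NOMINAL in parts per million,
# rounded to the nearest integer. The formula is an assumption about
# how the "Pitch (PPM)" figure relates to the sample rate.
pitch_ppm() {
    awk -v actual="$1" -v nominal="$2" \
        'BEGIN { printf "%.0f\n", (actual - nominal) / nominal * 1000000 }'
}
```

Running `pitch_ppm 92582 96000` prints -35604; the small gap to -35596 would be consistent with the rate in the file being shown as a whole number of Hz.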

There could be a way to circumvent this by setting its power management flags manually, but initial testing shows the driver is currently ignoring pm_runtime_forbid(), e.g.:

sudo sh -c "echo on > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/sound/card0/power/control"

makes no difference to the behaviour
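
To see what the kernel thinks the setting is after writing to it, the files in the same sysfs power directory can be read back. A small sketch with a made-up helper name; it only reads `control` and `runtime_status`. As far as the kernel documentation goes, `power/control` governs runtime power management only, so it may simply not apply to a system-wide suspend.

```shell
#!/bin/sh
# Sketch: print the runtime-PM setting and status found in a sysfs
# "power" directory. Files that are missing are reported as "n/a".
pm_summary() {
    dir="$1"
    for f in control runtime_status; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s=n/a\n' "$f"
        fi
    done
}
```

For example, `pm_summary /sys/class/sound/card0/device/power` would look at the PCI device behind card0, assuming that is where the card sits on the system in question.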

So I think a fix would be needed at the driver level to at least respect the power control setting, and possibly then avoid the bug, even if code for fully saving and restoring the driver state wasn't present.

hdspe.0.txt hdspe.1.txt

alex-vyverman commented 9 months ago

Thanks for all the very valuable input so far.

Since I don't possess the skills to fix this issue in the driver, I am resorting to a very lame but effective fix:

reload_hdspe.sh:

#!/bin/bash
/usr/sbin/rmmod snd_hdspe
/usr/bin/sleep 1
/usr/sbin/modprobe snd_hdspe

and

hdspe_suspend.service:

[Unit]
Description=reload HDSPe Driver
After=suspend.target hibernate.target hybrid-sleep.target suspend-then-hibernate.target

[Service]
ExecStart=/home/user/reload_hdspe.sh

[Install]
WantedBy=suspend.target hibernate.target hybrid-sleep.target suspend-then-hibernate.target

This way the driver gets reloaded after suspend, and it seems to work out fine so far.
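
One thing to watch with this approach: rmmod refuses to unload a module that is still in use, for instance while JACK or CamillaDSP holds the device open. A hedged sketch of a guard around the reload, reading the use count from /proc/modules (third column). The function names are hypothetical, and this is not what the script above does.

```shell
#!/bin/sh
# module_refcount NAME [FILE]
# Prints the use count of a loaded module as listed in /proc/modules
# (third column), or nothing if the module is not loaded.
module_refcount() {
    awk -v mod="$1" '$1 == mod { print $3; exit }' "${2:-/proc/modules}"
}

# Hypothetical wrapper: refuse to reload while something holds the module.
safe_reload() {
    count=$(module_refcount snd_hdspe)
    if [ -n "$count" ] && [ "$count" -gt 0 ]; then
        echo "snd_hdspe still in use (count $count); stop audio clients first" >&2
        return 1
    fi
    # Unload only if the module is currently loaded, then load it again.
    [ -z "$count" ] || rmmod snd_hdspe || return 1
    sleep 1
    modprobe snd_hdspe
}
```

Calling `safe_reload` from the service instead of the bare rmmod/modprobe pair would at least leave a clear message in the journal when the unload is not possible.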

Alex

jimfrench commented 9 months ago

Well done, if this works for you then that's a great improvement. I wouldn't call it lame, perhaps sledgehammer.

It is a lot cleaner than the current behaviour because removing the module before suspend prevents the execution of the actual trapped bug at hdspe_pcm.c:546

If the hardware parameters persist between module reload then that's bonus behaviour. If not, you could maybe script that to reload too, it would be in general a good workaround

Writing the suspend implementation into the driver is an amount of work and was not considered high priority when the driver was initially written, it was on the TODO list ...

Schroedingers-Cat commented 9 months ago

I tried to implement the suspend/resume functions over here: https://github.com/Schroedingers-Cat/snd-hdspe/tree/feature/suspend-resume

What works is bringing the card back into the state before suspend. It'll resume with the correct settings and I can see that the outgoing MADI stream has the correct properties. This should work for all HDSPe cards, not just the MADI card.
What doesn't work yet is getting rid of the requirement to restart JACK, otherwise no audio will actually reach the card after suspend. I checked that the interrupts are still being called and also tried various things that are done during the initialization phase of the driver. Also setting the SNDRV_PCM_INFO_RESUME flag didn't help.

Maybe somebody else has an idea of what is missing? The changes can be viewed here: https://github.com/Schroedingers-Cat/snd-hdspe/commit/e098c25759afe804c3c2cb0e4075f348f1876e11

jimfrench commented 9 months ago

@Schroedingers-Cat this is excellent, I'll checkout your changes and report back test results...

jimfrench commented 9 months ago

@Schroedingers-Cat

I have been testing this, unfortunately time is against me but it looks like there is an issue which prevents the system resuming at all after your changes are applied.

I have double-checked both by reverting and also using a live boot Debian 12 to try to rule out any local configuration blame.

Both times and in both configurations, the system does not seem to resume with your changes applied (or fails to suspend cleanly?)

I understand the code you have written alongside the ALSA docs, but am unsure how to approach debugging why this is causing this behaviour, could you provide some guidance for debugging please and I might be able to help further?

EDIT: I've in general been attempting without much success: https://wiki.ubuntu.com/DebuggingKernelSuspend

Schroedingers-Cat commented 9 months ago

@jimfrench thanks a lot for looking into this! Maybe whatever is happening on your system can point us to what's missing for picking up the audio after resume.

I've pushed a new commit to my fork repo, which adds some additional debug logs: https://github.com/Schroedingers-Cat/snd-hdspe/commit/a526f9449cd1bacdf0da6219f94663aa934714fe

You can do a checkout, run sudo make uninstall && sudo make install in the repo's root directory and then reboot. After that, you should be able to see more logs from the driver via sudo dmesg | grep snd_hdspe, like the interrupts increasing. Does that work?

How does the failed resume manifest? Are you able to switch to a terminal via CTRL+ALT+F-Keys and run sudo dmesg | grep snd_hdspe in there?

Also, what HDSPe card are you using?

jimfrench commented 9 months ago

@Schroedingers-Cat - No problem, hopefully this info will help. I don't have much experience debugging suspend/resume issues in general, but from what I've seen it can be quite complex.

For reference, first here is the output of sudo journalctl -b | grep snd_hdspe before your suspend-resume implementation; you can see the initial errors etc.: before.log

Here is a summary of the 'successful' suspend and resume in the above case (i.e. successful for the system) success.log

After your latest commits for comparison, no errors now: after.log

And here is how the failed suspend manifests: no console, and it requires a hard reset by holding down the power switch. Output of the previous boot, i.e. sudo journalctl -b-1 | grep suspend: fail.log

The log of the previous boot doesn't show any snd-hdspe specific errors - they happen after the console is lost (?)

Here is what was giving me a headache earlier today by following the debug guidelines (sorry this log isn't parsed): pm-suspend.log

The behaviour can be reproduced on a clean system. The card is AES...

Schroedingers-Cat commented 9 months ago

@jimfrench it seems that the driver was having a problem even before the fix (see lines 210 and 215 in your before.log). I'm not sure what that could cause, though. Maybe it doesn't matter.

The after.log does not seem to contain any interrupt message. Did you remove those? For every 1000 interrupts, there should be a message. If that is missing, something weird is going on already before your system even enters suspend.

Nothing interesting in fail.log. Maybe try creating a systemd file /etc/systemd/system/capture_dmesg.service with the following content:

[Unit]
Description=Capture dmesg before suspend

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c 'dmesg > /var/log/before_suspend_log.txt'

[Install]
WantedBy=sleep.target

Then sudo systemctl enable capture_dmesg.service, then restart your system. After that, enter suspend by sudo systemctl suspend or graphically. Next time your system is up (after resume or a hard reset), there should be a file /var/log/before_suspend_log.txt with the output of dmesg right before the system entered suspend.

jimfrench commented 9 months ago

@Schroedingers-Cat

> it seems that the driver was having a problem even before the fix (see lines 210 and 215 in your before.log). I'm not sure what that could cause, though. Maybe it doesn't matter.

This is to be expected - the first log shows the unfixed errors I mentioned in an earlier comment in this issue, during a suspend/resume attempt, in this case after reverting to your support-v5.18 branch as a starting point before testing. These two errors are fixed in your suspend-resume branch because they are now handled by SNDRV_PCM_TRIGGER_SUSPEND.

> The after.log does not seem to contain any interrupt message. Did you remove those? For every 1000 interrupts, there should be a message. If that is missing, something weird is going on already before your system even enters suspend.

Actually I was too quick on the buzzer here, I didn't give the driver a chance to give that output before saving the log. Waiting for a minute gives the expected interrupt: before-suspend.log

> Nothing interesting in fail.log. Maybe try creating a systemd file /etc/systemd/system/capture_dmesg.service with the following content:
>
> [Unit]
> Description=Capture dmesg before suspend
>
> [Service]
> Type=oneshot
> RemainAfterExit=true
> ExecStart=/bin/true
> ExecStop=/bin/bash -c 'dmesg > /var/log/before_suspend_log.txt'
>
> [Install]
> WantedBy=sleep.target
>
> Then sudo systemctl enable capture_dmesg.service, then restart your system. After that, enter suspend by sudo systemctl suspend or graphically. Next time your system is up (after resume or a hard reset), there should be a file /var/log/before_suspend_log.txt with the output of dmesg right before the system entered suspend.

Unfortunately this is exactly why I was having trouble debugging this end, there are no debug messages of interest. A slightly extended tail of the dmesg log before failed suspend shows the driver interrupts again, but this is way before the suspend is called: before-suspend-tail.log

The sleep.target is not reached in suspend-resume branch and no log is created.

Reverting to support-v5.18 once again brings the system back to be able to suspend and the sleep.target is reached and the extra log created with its contents as to be expected in that previous state as previous logs show...

EDIT: Sorry sleep.target is reached in both cases but no driver debug messages after

jimfrench commented 9 months ago

@Schroedingers-Cat

It looks like a suspend is not triggering the callback function at all and therefore SNDRV_PCM_TRIGGER_SUSPEND is never reached.

I tried putting more debug messages in the code and they don't appear, I increased the frequency of the interrupt logging too but as you can see here there is no Trigger received with 5 logged.

I also tried setting the SNDRV_PCM_INFO_RESUME flag in .info , but this is only relevant during resume, and I'm not getting there yet.

before-suspend.2.log.txt

jimfrench commented 9 months ago

@Schroedingers-Cat

I have tracked this down to the ALSA power state command, it is this that locks my machine during the suspend. snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);

I achieved this by stepping through the code in snd_hdspe_suspend()

It appears that the console doesn't get a chance to output debug messages that are printed just before this function is called and fails, hence the confusion about where it was failing.

Schroedingers-Cat commented 9 months ago

@jimfrench interesting findings!

> I have tracked this down to the ALSA power state command, it is this that locks my machine during the suspend. snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);

What happens if you use one of the other states from here? Maybe we need card-specific power states.

Also, is your AES on the latest firmware version?

> I achieved this by stepping through the code in snd_hdspe_suspend()

You mean like stepping through the code with a debugger while the module is actively running in your kernel? How are you doing that?

EDIT: I was unable to find official documentation of the function snd_power_change_state - anybody has an idea where I could find that?

jimfrench commented 9 months ago

@Schroedingers-Cat

Yes it's interesting, I'll try to do some more testing tomorrow, it's getting late this side of the Atlantic :)

I'm not using a debugger; by stepping through the code I mean I manually entered a few breakpoints in the function and made several tests locally.

I did quite a lot of C programming years ago... but I'm not really set up for a development workflow nowadays. But I can try and help. This is currently the only project I'm actively following / commenting on etc. on GitHub, due to using the out-of-tree kernel module (your support branch).

I'll try your suggestions, I can probably also try the card in a different machine quite easily.

jimfrench commented 8 months ago

The system suspends and resumes with SNDRV_CTL_POWER_D1 but not SNDRV_CTL_POWER_D2 or above.

I have been trying to ignore this for the minute by not issuing the power state commands, i.e., commenting out both the snd_power_change_state() calls in snd_hdspe_suspend() and also its counterpart in snd_hdspe_resume().

This way, the behaviour of the rest of the code can be looked at.

If I had more time I would maybe try to ensure clean operation of the work during resume, as currently it doesn't seem to like it very much. Disregarding JACK etc. and simply using aplay, the stream fails to resume: aplay reports failure to resume, then reports a successful restart of the stream, but the hardware does not actually restart the stream until a new stream is started manually. I think this means ALSA is seeing unexpected behaviour in the driver.

It makes sense that the stream is unable to resume, because the code is not supporting that yet, but it doesn't make sense that the stream is unable to restart?

Unfortunately I've run out of time on this for a few days, but maybe try watching the behaviour of aplay verbosely during a suspend/resume cycle, but without the power management getting in the way yet?

Schroedingers-Cat commented 8 months ago

Thanks a lot, will look into this soon.

Did you check what firmware version your AES card has flashed and if that is the latest version?

jimfrench commented 8 months ago

Actually, good point: the firmware isn't up to date. I'm going to have to do that somehow; I don't have Windows installed on the machine the card is in to run the RME flash tool.

Schroedingers-Cat commented 8 months ago

@jimfrench my branch now sets the suspend power state per model meaning suspend should work based on what you found out!

> If I had more time I would maybe try to ensure clean operation of the work during resume, as currently it doesn't seem to like it very much. Disregarding JACK etc. and simply using aplay, the stream fails to resume: aplay reports failure to resume, then reports a successful restart of the stream, but the hardware does not actually restart the stream until a new stream is started manually. I think this means ALSA is seeing unexpected behaviour in the driver.
>
> It makes sense that the stream is unable to resume, because the code is not supporting that yet, but it doesn't make sense that the stream is unable to restart?

Also took a look at this one. The function snd_hdspe_trigger should handle the resume now in a very simple manner (it's like starting a stream). To my understanding, that should at least work since everything in there seems to be based on interrupts. But even with aplay, it's still failing to continue with actual sound after resume. I've also verified that the actual resume command is received and the function reached the _ok part.

It seems to me that another function needs to be called before this can work, like snd_hdspe_hw_params. At least, I see that function's output in dmesg when stopping and starting aplay again after resume. However, that function seems to be called on behalf of ALSA.

jimfrench commented 8 months ago

@Schroedingers-Cat excellent ...

I've been updating the firmware and testing the card in a different motherboard. Actually, I had to put the card in a different motherboard for the RME flash tool to even work - it would attempt and then crash. I tested this in a couple of different Windows OS and eventually swapped the card into another, much older motherboard and it flashed it straight away without errors. I'm not sure if there is a bug in the RME tool. It sure seemed a lot more difficult than it ought to have been.

This has coincided with me wanting to change systems anyway for other reasons, so I'm currently doing that before any more testing can resume. With the card now on the updated firmware, and in a different motherboard, I'm expecting to see different behaviour.

Unfortunately if testing now differs from before we may never know whether it was the firmware or the motherboard, or both, without rolling back and that would not be fun!

I'll get this new config up and running and see how it goes next week - look forward to working on this again soon...

jimfrench commented 8 months ago

> Unfortunately if testing now differs

Testing doesn't differ ... which at least makes the workflow sane this end. I'm back to the previous behaviour with up-to-date firmware and in a different mb.

I'll check out your latest changes in more detail, but I think this is progress, because we can rule out both firmware and mb

Schroedingers-Cat commented 8 months ago

> I'm not sure if there is a bug in the RME tool. It sure seemed a lot more difficult than it ought to have been.

I had the HDSP 9652, the HDSPe AiO and the HDSPe MADI across something like nearly 20 and flashing the firmware was always simple. Great that you had another MB around!

> Testing doesn't differ ... which at least makes the workflow sane this end. I'm back to the previous behaviour with up-to-date firmware and in a different mb.

Reading the changelog from the RME HDSPe firmware, I really was expecting a different result:

V 2.31:

  • HDSPe firmware updates: AIO V 13, RayDAT V 13, AES V 9, MADI V 30, PCIe Power Management changed

Looking forward to hearing back about your test results with the latest changes!

jimfrench commented 8 months ago

Yes I was expecting different results too which is why I put a lot of effort into updating the firmware. The issue was definitely something to do with the motherboard (IBASE MB870VF) which is quite an odd board, but very reliable industrial grade. The reason I bought it many years ago was due to the ISA slot which at the time of release was unheard of on a 'new' board. I previously needed the ISA for old automation hardware, working in an electronics lab.

I no longer need the ISA ... therefore just about to upgrade boards to something better than the one I had lying around.

jimfrench commented 8 months ago

@Schroedingers-Cat

Unfortunately the 'new' system here is acting the same as described previously with your most recent commits.

It must be hard guessing what this card is doing from that end even with my testing. I've looked at your edits and they make sense, but I think there are other differences that need to be taken into consideration with this AES card.

I may if time allows look more closely at the code and do some actual experimenting this end.

Feel free to throw more tests at me in the meantime if you want to.

Schroedingers-Cat commented 8 months ago

> Feel free to throw more tests at me in the meantime if you want to.

One particular test would be to use the latest commits but change the model dependent power state to the hardcoded power state that worked for you before:

> The system suspends and resumes with SNDRV_CTL_POWER_D1

jimfrench commented 8 months ago

With the older motherboard I'm temporarily using, SNDRV_CTL_POWER_D1 now causes suspend to fail.

With the previous motherboard SNDRV_CTL_POWER_D2 was required to cause suspend to fail.

Commenting out the function snd_power_change_state() causes the system to suspend and resume cleanly same as previously.

jimfrench commented 8 months ago

@Schroedingers-Cat

I put the card back in the original motherboard to double check. The behaviour is now consistent between boards.

This means the firmware update did do something to do with the power management, but unfortunately not in the way we hoped for.

SNDRV_CTL_POWER_D1 and higher now always cause suspend to fail regardless of mb/system.

Essentially this means that any call to snd_power_change_state() now fails.

I think a sensible approach might be to investigate the cards' behaviour on Windows during suspend, and find out what RME expects the cards to do with their Windows driver. This would give a starting point for what to aim for, and would also show whether there is a difference between the AES and MADI cards. Also, @Sotem123 mentioned that they have two other cards in the series (HDSPe AIO (not PRO) and HDSPe RAYDAT) and are running Windows, so they could perhaps report the suspend behaviour for those cards too.

That would give a supported feature list for each card and expected general behaviour also, so we are not guessing at what is supported by each card.

I'll try to do that in Windows here with the AES and report back next week.

Schroedingers-Cat commented 8 months ago

@jimfrench

> This means the firmware update did do something to do with the power management, but unfortunately not in the way we hoped for.
>
> SNDRV_CTL_POWER_D1 and higher now always cause suspend to fail regardless of mb/system.
>
> Essentially this means that any call to snd_power_change_state() now fails.

I wonder why RME removed support for that. Anyway, the latest changes on my branch should be able to handle this (specifically https://github.com/Schroedingers-Cat/snd-hdspe/commit/f34cc65b43fdd9af48338197d3ac8c41f94071db).

> I think a sensible approach might be to investigate the cards' behaviour on Windows during suspend, and find out what RME expects the cards to do with their Windows driver. This would give a starting point for what to aim for, and would also show whether there is a difference between the AES and MADI cards. Also, @Sotem123 mentioned that they have two other cards in the series (HDSPe AIO (not PRO) and HDSPe RAYDAT) and are running Windows, so they could perhaps report the suspend behaviour for those cards too.
>
> That would give a supported feature list for each card and expected general behaviour also, so we are not guessing at what is supported by each card.

Sounds good but I have no idea how to dig up that info since the Win driver is proprietary. What would you do to get that kind of info?

I experimented with the suspend and resume functions a bit more to find the missing bit that makes sound actually reach the cards but still nothing that caught my eye would make it work. If you have any idea what could be missing or what could be used to get more info let me know.

I'll also have no access to my MADI card over the next week.

jimfrench commented 8 months ago

@Schroedingers-Cat

Thanks, I'll checkout your changes, I'll give it a break for the next week or so too and then pick it up.

I use git locally often and have just forked your branches on GitHub. I will need to familiarise myself with using GitHub a bit more. Please let me know if I screw up or can optimise future workflow.

The Win driver is proprietary but the separate cards' behaviour can be looked at in detail from userspace in Windows, so as to know what PM features to expect the firmware to be capable of, in turn helping reproduce that same control flow.

Examples of the Windows testing I had in mind would be simple things like monitoring if the card changed power states; whether AES sync is lost; whether audio streams are resumed/restarted and suchlike.

I don't often use Windows at all, so I'm unsure if any more information can be gained. I haven't previously used the RME card in Windows other than the flashing experience, but it seems a good idea to watch its behaviour there.

Have a good break....

Sotem123 commented 8 months ago

Hi y'all!

I'll be busy this coming week as well. Afterwards I can check the cards' behaviour in Windows; I'll try to gather as much information as possible.

Currently I'm running the cards together in the same pc, and also synced together using the internal jumper wire. So that may be an interesting setup to look at.

Happy holidays!



jimfrench commented 8 months ago

@Sotem123 great, thanks. I may set out some individual tests, but in the meantime any descriptions of your two cards' general behaviour during suspend would be useful information. If you feel confident to flash the firmware to the latest versions on each card, if they are not already, that would also be good, so we are working with the latest changes.

After that if you still feel up for helping more, testing the changes being made to this repo on a Linux distro with your two cards would be really helpful going forward so we can ensure possible future compatibility in general.

Thanks!

Sotem123 commented 8 months ago

Hello, I hope you’ve had a blessed Christmas! And also, a happy New Year to you!

I’ve been a bit busy during the holidays, and besides this I’ve been busy learning Arch-Linux and getting my system running the way I like it. Aside from a bit of Ubuntu and Mint Linux, it is really new to me.

Both cards I have already have the latest firmware flashed on them. Today I’ll do some testing. Based on previous mails I’ll try to see if I can replicate suspend behaviour on Windows. I’ll test the following:

First, I’ll look for the behaviour on Windows. Perhaps when I have got the time I can investigate Linux afterwards, though I’m not 100% content yet with the configuration. Even though I’ve configured everything to realtime and have got pipewire to work without crashing, the latency is still a fair amount (10ms) higher compared to Windows using the same buffer-size. Pipewire also tends to “reset” the HDSPmixer each time Wireplumber is restarted, or a connection is established again.

A little bit of extra information which may be helpful:

Let me know if there are more things to test.

Regards!


Sotem123 commented 8 months ago

Results from my testing in Windows: when entering the suspended state (putting the computer in Sleep mode), ADAT-based sync stops (AES and internal sync also show "sync" in the control panel when resuming operation). When the computer is woken again, the lock is re-established. I also had a .WAV file playing while entering sleep mode, and it started playing again afterwards.

Another mention which may be of future interest, I also own the RME ARC USB controller.

jimfrench commented 8 months ago

Hi @Sotem123

Hope you had a good/productive holiday and are making progress with your configs.

In a nutshell, the main problem we are currently having in Linux is that the audio becomes "disconnected" from the card after a suspend/resume cycle. This happens whether there is a stream or not.

For instance, starting a new stream by playing an audio file in the most basic way on linux (aplay command) doesn't work after resume. We are trying to find out why. This is before any higher level sound servers like JACK or pipewire get involved.

The driver currently needs some kind of "reset" or callback from ALSA to begin the work again, as far as we can tell.

I am currently disregarding the power management states because they are getting in the way of testing the resume behaviour. I'm not sure this power management should be called at all; research suggests it only exists for historical reasons and that, because the device is PCIe, the D0 and D3hot states should be compliant regardless of model.

So the first test in Windows would perhaps simply be to do a suspend/resume and then see whether audio behaves normally afterwards, whether it works at all, or whether anything seems different.

I have managed to do some work on my fork of @Schroedingers-Cat suspend-resume branch here https://github.com/Schroedingers-Cat/snd-hdspe/compare/feature/suspend-resume...jimfrench:snd-hdspe:feature/suspend-resume

Adding hdspe_write_pll_freq(hdspe); seems to resolve the issue from the very beginning of this thread, whereby the system sample rate was undefined. It now returns a value (see the enclosed logs).

I've just read your updates about Windows seamlessly restarting the streams, and this is very useful info, thanks.

My other commits are work in progress; I have been fixing minor things to practise getting back into a development workflow.

I have been studying the driver in detail, but work will be slow as I'm quite out of practice. I do think I've done something useful by keeping the sample rate defined, which was the cause of the initial issue?

system_sample_rate_test_0.log.txt system_sample_rate_test_1.log.txt

Sotem123 commented 8 months ago

For now I think most of my use will be for testing. I'd love to also dual-boot a special testing distro as not to corrupt my now working environment.

For this it may be good to also install Arch-Linux as it is the distro I use right now, and it also "broadens" the testing horizon from strictly Ubuntu Studio.

I'll try to study how to write an ALSA driver and interact with the Linux kernel programmatically first so I can be of more use. Besides the previously mentioned link: https://www.kernel.org/doc/html/latest/sound/kernel-api/writing-an-alsa-driver.html Do any of you recommend steps for me to follow so I can be of more help?

jimfrench commented 8 months ago

For now I think most of my use will be for testing.

Testing is extremely important, just as much as the coding itself, so you may be a great help. You've already provided info from Windows that would have been very time-consuming for me to discover here; we now know about your cards' capabilities.

I'd love to also dual-boot a special testing distro as not to corrupt my now working environment. For this it may be good to also install Arch-Linux as it is the distro I use right now, and it also "broadens" the testing horizon from strictly Ubuntu Studio.

Distros are a matter of personal preference. If you're in the middle of setting up Arch then it makes sense to use that. Personally I prefer to do testing in a distro which doesn't require much time-consuming configuration before testing can take place. Note that the original author, who sadly passed away, was using Ubuntu Desktop. I'm currently using XUbuntu and Debian 12. Default distro configurations make it a bit easier to rule out local configuration when sharing test results, i.e., a clean system. You can even just live boot, insert the kernel module, do the testing and save the results, and then be sure the test environment is the same next time. It's good to have a few options available.

I'll try to study how to write an ALSA driver and interact with the Linux kernel programmatically first so I can be of more use.

The reference you have there is a good one. But for kernel programming, you really ought to learn about the kernel in general first before attempting an ALSA driver, because the kernel has very specific programming rules and style, which can be found on the same website.

You don't need to be able to write an ALSA driver to be of "use" in helping ... for instance I cannot write an ALSA driver from scratch, or at least not without several weeks of refresher and intensive study. I can follow the code that is being written by @Schroedingers-Cat but in general, they have a far, far greater understanding of the driver and kernel programming than I do therefore I am treading carefully at the moment even in my own fork.

The behaviour of the driver during suspend-resume is complex, and the areas being suggested to work on are at the very edge of my understanding. The code is complex, and the solution may also lie outside the driver, for example in ALSA failing to pick up the streams. It is for these reasons that suspend-resume probably wasn't implemented until now.

I intend to continue studying the driver in depth, mainly as an educational refresher. I am an audio electronics engineer and C programmer, and very familiar with Linux and Linux audio in general, but I would not call myself a kernel developer. That is a specialist area, but an interesting and rewarding one. If you have not written a driver at all before, I would recommend starting from the ground up with the general information on kernel development and a basic driver, then building up from that to working with ALSA drivers. You can do a lot by comparison with existing drivers in the mainline kernel.

Schroedingers-Cat commented 8 months ago

Happy new year to all of you!

Adding hdspe_write_pll_freq(hdspe); seems to resolve the issue from the very beginning of this thread, whereby the system sample rate was undefined. It now returns a value (see the enclosed logs).

@jimfrench That's interesting! I didn't commit that line because it never made a difference on my setup. Picking up the previous sample rate was possible exclusively by calling hdspe_write_settings() and hdspe_write_control() after making sure the struct contains the correct information after standby.

I am currently disregarding the power management states because they are getting in the way of testing the resume behaviour. I'm not sure this power management should be called at all; research suggests it only exists for historical reasons and that, because the device is PCIe, the D0 and D3hot states should be compliant regardless of model.

So even when setting the D0 state your computer wouldn't reach suspend and only skipping the call would prevent that?

What info did you find that the power management calls are there for historical reasons only?

jimfrench commented 8 months ago

Happy new year @Schroedingers-Cat !

I am slightly burnt out on this! But I can give a summary of what I've been doing:

There are clearly two separate issues here as you know and they may be interacting with each other.

With power management, the D3hot and D0 states should work on any compliant hardware, which this card and this motherboard both are. Why they are not working here I do not know.

I have compared the driver source with the drivers of two other sound cards, and I have compared the PM states: the other two sound cards in this machine both enter D3hot and D0 without problems.

I don't know if the reason the resume code is struggling is due to the power management states not being met.

The second issue, as you were debugging previously, watching aplay... I have been looking into this in detail. I have been watching the behaviour using the existing and slightly modified debug functions TIME_INTERRUPT_INTERVAL and DEBUG_FRAME_COUNT

There seems to be undefined behaviour when interrupts are stopped during suspend, usually the driver does not stop interrupts at all. I don't understand this enough to debug but have been trying.

The hardware framebuffer seems to fill with audio before the PCM_RESUME and there are undefined status bits between calls. I think, as you say, the hw_params is a good place to be looking but aplay does not call this every time, maybe it needs to be called during resume.

I have compiled a custom kernel with the ALSA and PCI debug flags and will resume work soon. I've been comparing the source with other drivers and feel like we might be missing something simple (?) but my next intention was to debug with PM_SUSPEND to try to resolve the power management issue here first.

There is some work on my fork but it may not work as-is, some of the functions are AES specific but you may be able to change those in the debug functions for the MADI card.

here

You should be able to checkout my fork of your suspend-resume branch with minor edits (status.aes etc.)

I'll get back to you soon with some more details hopefully, thanks for checking back in.

EDIT: hw_params shouldn't be called by the driver itself apparently but do we need to watch the behaviour of the interrupt handler more carefully during the suspend/resume cycle?

jimfrench commented 8 months ago

So even when setting the D0 state your computer wouldn't reach suspend and only skipping the call would prevent that?

Yes, even setting D0 in suspend would lock up, and only skipping the call would allow debugging the rest of the function. Very odd. I don't think we need card-specific PM states ... it should work due to compliance?

What info did you find that the power management calls are there for historical reasons only?

Ignore that; comparison with other drivers' code shows it's a very usual implementation.

By the way, my branch is a messy playground ... I know I shouldn't be doing certain things you may see. I intend to scrap that branch and restart from scratch with a custom kernel but got a bit frustrated with it.

EDIT: I've attached the log showing the suspend/resume cycle with the debug routines enabled, notice the behaviour of the interrupt routine and framebuffer between the PCM_TRIGGERs

snd_hdspe_debug.0.log.txt

Schroedingers-Cat commented 8 months ago

@Sotem123 here's a good primer on writing PCI drivers for Linux: https://olegkutkov.me/2021/01/07/writing-a-pci-device-driver-for-linux/

The second issue, as you were debugging previously, watching aplay... I have been looking into this in detail. I have been watching the behaviour using the existing and slightly modified debug functions TIME_INTERRUPT_INTERVAL and DEBUG_FRAME_COUNT

There seems to be undefined behaviour when interrupts are stopped during suspend, usually the driver does not stop interrupts at all. I don't understand this enough to debug but have been trying.

The hardware framebuffer seems to fill with audio before the PCM_RESUME and there are undefined status bits between calls. I think, as you say, the hw_params is a good place to be looking but aplay does not call this every time, maybe it needs to be called during resume.

@jimfrench Interesting! Just took a look at your log from the edited other post and wondered what happens if you add this to the end of the snd_hdspe_suspend function:

    if (hdspe->irq >= 0)
        free_irq(hdspe->irq, (void *) hdspe);

Also, add this to the snd_hdspe_resume function right after snd_hdspe_work_start:

    if (request_irq(hdspe->pci->irq, snd_hdspe_interrupt, IRQF_SHARED, KBUILD_MODNAME, hdspe)) {
        dev_err(card->dev, "unable to use IRQ %d\n", hdspe->pci->irq);
        return -EBUSY;
    }

    dev_dbg(hdspe->card->dev, "use IRQ %d\n", hdspe->pci->irq);

    hdspe->irq = hdspe->pci->irq;
    card->sync_irq = hdspe->irq;

The basic idea is to stop the IRQ on suspend and re-request it on resume. I tried this locally, but it didn't help with the continuation of aplay after resume, but maybe your log output brings up something new?

I didn't see the undefined status bits between the calls in your log. What exactly are you referring to?

When I used your modified interrupt logging (by enabling TIME_INTERRUPT_INTERVAL and applying these changes), dmesg output was really filled up quickly showing only the most recent lines. How did you get the log you provided from such a specific moment?

Ignore that, comparison with other drivers' code shows its very usual implementation.

I'm seeing drivers that use it and drivers that simply don't ... cannot really find out why some get around it ...

FYI, I also tried splitting up the snd_hdspe_hw_params function into reusable pieces to be also called from the resume triggers, but it didn't change the behavior of the driver (except for now having all those logs after resume I mentioned here).

jimfrench commented 8 months ago

@Schroedingers-Cat

The basic idea is to stop the IRQ on suspend and re-request it on resume. I tried this locally, but it didn't help with the continuation of aplay after resume, but maybe your log output brings up something new?

Ok thanks, I tried this here and it requests the IRQ again, but general behaviour is the same:

snd_hdspe_debug-with-request-irq.0.log.txt

I didn't see the undefined status bits between the calls in your log. What exactly are you referring to?

I was referring to the BUF_PTR being undefined, but I now realise this is not a problem. The interrupt is called when there is no audio, and this became obvious when I swapped to the generic kernel rather than low-latency: you can see the latency** (?) manifest as several interrupt calls in between when there is no audio in the framebuffer, so this is probably OK. I think the other status bits remain saved, as I printed several of them in earlier logs to make sure. The only exception was the system sample rate not being saved, which was resolved by adding hdspe_write_pll_freq(hdspe);

snd_hdspe_debug-high-latency.0.log.txt

Therefore I think the status is being saved and restored correctly, and we're going around in circles with that hw_params call...

When I used your modified interrupt logging (by enabling TIME_INTERRUPT_INTERVAL and applying these changes), dmesg output was really filled up quickly showing only the most recent lines. How did you get the log you provided from such a specific moment?

I achieved this by simply disabling all PCM streams, making sure ALSA was idle (no pulseaudio, no jack, etc.) and doing a quick suspend/resume cycle during aplay.

It is possible to capture the output like that and I have even noticed that on occasion there are no interrupts for several minutes, or even hours! But usually, they occur all the time.

This is undefined behaviour*, but I can't get it to repeat reliably. Something is causing the audio interrupts when there is no PCM stream from ALSA (silence playback?) and sometimes they stop.

Therefore, if my understanding is correct, the driver doesn't currently "expect" the interrupt routine to be stopped and started again like we're now doing during suspend, and something else needs to happen either before or after which is currently undefined.

However, the interrupts stop and start by themselves seemingly at random without causing the hardware parameters to apparently need to be reset and suchlike, as is happening during suspend. This doesn't make sense and is where my knowledge is getting thin ... I've got a feeling it's something simple that's being missed, but I don't know what.

I'll have a further play soon... hope that helps for the minute.

EDIT: *Undefined behaviour in the original code before suspend-resume code implemented ...

EDIT2: **That could be a difference in the priority of interrupt handling rather than a difference in latency; again, I'm on the edge of my understanding here.