alsa-project / snd-firewire-ctl-services

A set of server programs for audio and music units on the IEEE 1394 bus supported by the Linux sound subsystem, a.k.a. ALSA.
GNU General Public License v3.0

FW MOTU - very large buffer sizes required to avoid NMI Watchdog detected hard LOCKUP on CPU XX #138

Closed: TNAudio closed this issue 1 year ago

TNAudio commented 1 year ago

Part of my post from DIYaudio:

I have been struggling with professional use of CamillaDSP for the last 2 months. It's difficult for me to contribute in any way other than by providing feedback on many combinations of hardware.

Two months ago, when I started experimenting with CDSP on a server combined with a firewire interface, the only problems I had were:

Some buffer underruns/CDSP restarts with 96k sampling frequency and 128 sample chunksize
Rare buffer underruns/CDSP restarts with 96k sampling frequency and 256 sample chunksize
Stalling of CDSP with 96k sampling frequency and 64 sample chunksize - recoverable: I could simply change the chunksize via the web GUI and CDSP would restart itself with stable parameters

About 2-3 weeks ago something changed. On a completely new system with the same hardware and installation method (audioscience review tutorial):

No buffer underruns/CDSP restarts with 96k sampling frequency and 2048 (!) sample chunksize, at least over a 14-hour stability test
Complete freezing of whole system with 96k sampling frequency and 1024 (!) sample chunksize, within minutes of running
Complete freezing of whole system with 96k sampling frequency and 512 sample chunksize or less, right after applying changes

A system freeze means a dropped connection via both the web GUI and SSH, plus the console notification: NMI Watchdog detected hard LOCKUP on CPU XX. The only thing I can do at that point is a hard reset or a power cycle of the machine. Needless to say, this is unacceptable from a stability point of view.

Tested Hardware:

PCs:

Haswell-based Supermicro server (Xeon E3-1271 V3, X10 Supermicro MB)
Skylake-based Supermicro server (Xeon E3-1220 V6, Asus P10 MB)

Firewire PCIe cards:

TI-based (...)ZAY chip, 3x FW800 ports
TI-based (...)ZAY chip, 1x FW400 + 2x FW800 ports (recommended by the ‘Interfacing Linux’ guy)
VIA-based chip with PCI bridge chip, 3x FW400

Firewire interfaces (all of them made by MOTU):

2 different UltraLite
2 different 828 MK2
2 different 828 MK3
UltraLite MK3 FW
UltraLite MK3 Hybrid
Traveler

A lot of different FW cables, various lengths and makes

Tested Software:

Firewire drivers:

Takashi Sakamoto’s alsa-firewire, versions 4.17 and 5.19
In-kernel drivers from 5.14 upwards

Kernels:

almost every version from 4.10 to 6.06
generic
lowlatency
rt

Distros:

Ubuntu desktop 18.04; 20.04; 20.10
Ubuntu server 18.04; 20.04; 20.10
Debian 10
Debian 11
Debian 12

CDSP versions:

1.0.0; 1.0.1; 1.0.2; 1.0.3
without GUI, and with GUI 1.0.0; 1.0.1

Every combination of the above gives the same result. Even when I can make it work ‘stable’, the chunksize is so large that latency, even without FIR filters, is significant (above 30-40 ms). Instability means system lockup, as I mentioned before. It used to work until 2-3 weeks ago. My other suspicion is that the problem lies in dependencies, for example updated Python packages? The distribution, kernel, hardware, drivers and CDSP don’t seem to matter. I have invested a lot of time and money to make it work, with no success so far, and there is a big red flag that package upgrades might break this software/hardware combination.
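To put the chunk sizes above in time units, the arithmetic is simply frames divided by sample rate. The sketch below assumes one chunk is `chunksize` frames processed per cycle. The function name `chunk_duration_ms` is hypothetical. Total end-to-end latency is some multiple of this figure (capture buffering, processing, playback buffering), and that multiple depends on the CamillaDSP configuration; it is not derived here.

```python
# Sketch: time span of one processing chunk at a given sample rate.
# Assumption: a chunk is `chunksize` audio frames; end-to-end latency is
# larger than this and depends on the rest of the buffering setup.


def chunk_duration_ms(chunksize: int, samplerate: int) -> float:
    """Return the duration of one chunk in milliseconds."""
    return 1000.0 * chunksize / samplerate


if __name__ == "__main__":
    # Chunk sizes mentioned in the report, all at 96 kHz.
    for size in (64, 128, 256, 512, 1024, 2048):
        print(f"{size:5d} frames @ 96 kHz = {chunk_duration_ms(size, 96_000):6.3f} ms")
```

At 96 kHz a 2048-frame chunk alone spans about 21.3 ms, and a 64-frame chunk about 0.67 ms. Once buffering on both sides is counted, a total in the reported 30-40 ms range is therefore plausible for the largest stable chunk size.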

I'm asking you all for help; I can provide logs and configs after New Year's Eve. Even if my problem can’t be resolved, this can serve as a warning for anyone who wants to go ‘pro’ with firewire.

takaswie commented 1 year ago

I use a separate remote repository for development of the in-kernel driver implementing the IEC 61883-1/6 protocol:

Could I ask you to open a new issue in the above repository and close this one?

To be precise, it's difficult for me to solve anything from your report alone, because it contains little information about the software you tried. In general, logs captured at runtime are preferable, in addition to a description of your experience, IMHO.
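As one possible way to pull the relevant lines out of a saved kernel log before attaching it, here is a minimal sketch. The match patterns (`hard LOCKUP`, `NMI watchdog`, `firewire`) and the function name `relevant_lines` are illustrative assumptions, not an official list. Posting the full log is generally still preferable.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical filter: lines mentioning the lockup or the FireWire stack.
# Case-insensitive so "NMI Watchdog" and "NMI watchdog" both match.
PATTERNS = re.compile(r"hard LOCKUP|NMI watchdog|firewire", re.IGNORECASE)


def relevant_lines(lines: Iterable[str]) -> List[str]:
    """Return matching lines with trailing newlines stripped."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


if __name__ == "__main__":
    # Read a kernel log from stdin and print only the matching lines.
    for line in relevant_lines(sys.stdin):
        print(line)
```

For example, `journalctl -k -b -1 | python3 filter_log.py` would filter kernel messages from the previous boot, assuming persistent journal storage is enabled. After a hard lockup, the last messages may never reach disk, so a serial or network console may capture more.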

TNAudio commented 1 year ago

Thank you, I will open the issue in your other repository, with logs. I'll include them, kernel logs included, after New Year's Eve. The CamillaDSP (https://github.com/HEnquist/camilladsp) developer pointed out that his software runs in user space while my errors occur in kernel space, so they might be caused by hardware or driver bugs. There are no errors in the software logs; the kernel locks up before the software can react.

Thank you for your hard work on the drivers. Respectfully, Tom