frno7 / linux

Linux 2.2, 2.6, 3.x, 4.x and 5.x kernels for the PlayStation 2.

Fix occasionally freezing USB OHCI #2

Closed by frno7 3 years ago

frno7 commented 5 years ago

Commit 05bd9f2bfc37aebef2ecd805fdac718cf3d580bf is a workaround for a problem whereby USB OHCI interrupts occasionally seem to disappear, which results in a device driver freeze. This bug has been present since at least the 2.6.35.14 kernel from 2010. An alternative but much less efficient workaround is to assert extra interrupts using for example a 1000 Hz timer.

USB OHCI interrupts are asserted on the I/O Processor (IOP), and then forwarded to the R5900 Emotion Engine main processor. This forwarding can be done in at least two different ways: via a mailbox (SMFLAG) register or via remote procedure calls (RPC). Both methods fail similarly, which suggests that the problem is somewhere on the IOP side.

sp193 commented 5 years ago

Does Linux now have its own equivalent to USBD? If not, what USBD module did you use? If you have been using a homebrew module and if it was from before late 2014, please upgrade.

frno7 commented 5 years ago

What is the USBD? At the moment I am using intrelay-direct.irx for IOP interrupt relays. I have tested a few different variants of interrupt relays, all randomly failing in a similar manner. The critical part appears to be rather simple in the intr_usb_handler function:

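int intr_usb_handler(void *unused)
{
#ifndef RPC_IRQ_SUPPORT
    /* Mailbox variant: post the USB interrupt cause in SMFLAG. */
    _sw(TGE_SBUS_IRQ_USB, SIF_SMFLAG);

    /* Pulse bit 1 of the register at 0xbf801450: set it, clear it, read back. */
    _sw(_lw(0xbf801450) | 2, 0xbf801450);
    _sw(_lw(0xbf801450) & 0xfffffffd, 0xbf801450);
    _lw(0xbf801450);
#else
    /* RPC variant: send the USB interrupt number to the EE as a SIF command. */
    sifCmdBufferIrq.data[0] = IRQ_SBUS_USB;
    isceSifSendCmd(SIF_CMD_INTERRUPT, &sifCmdBufferIrq, 64, NULL,
                NULL, 0);
#endif

    return 1;
}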

The Linux USB OHCI driver manipulates the HCD registers directly via hcd->regs = ioremap(hcd->rsrc_start, hcd->rsrc_len) with IOP_OHCI_BASE defined to be 0x1f801600, without involving the IOP.

There is also a mysterious iop_set_dma_dpcr2(IOP_DMA_DPCR2_OHCI) in the Linux USB OHCI driver. It would be good to bring some clarity into what it actually does, and the DPCR2 register in general.

USB OHCI support is required for an initial Linux kernel submission, see #1. Unfortunately it brings significant complexity with its dependency on IOP infrastructure, including IOP memory allocations.

sp193 commented 5 years ago

USBD.IRX is the USB host controller driver (HCD) for the PlayStation 2. I don't know what PS2Linux uses because I never heard of it having its own USB driver, so I presumed that the official PS2Linux involved USBD.IRX. After all, we had no code to refer to when working on our own homebrew USBD.IRX.

So does this use the standard Linux OHCI driver? There is also a need to ensure that the hardware errata are properly handled.

DPCR2 is the DMA Primary Control Register 2, which is used to control the DMA channels of the 2nd DMA controller. The PlayStation 2 IOP has 3 DMACs. Each DMA channel has 4 bits, with the highest bit being the enable bit and the lower 3 representing the priority of the channel.
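
The field layout described above can be sketched in C. This is only an illustration of the 4-bit encoding (enable bit on top, 3-bit priority below); the function names are hypothetical, and which field index belongs to the OHCI channel is not stated here, so `slot` is left as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define DPCR_FIELD_BITS 4u    /* one 4-bit field per DMA channel */
#define DPCR_ENABLE     0x8u  /* highest bit of the field: channel enable */
#define DPCR_PRIO_MASK  0x7u  /* lower three bits: channel priority */

/* Return dpcr with the field at index slot (0..7) rewritten. */
uint32_t dpcr_set_channel(uint32_t dpcr, unsigned slot, int enable, unsigned prio)
{
    assert(slot < 8);
    unsigned shift = slot * DPCR_FIELD_BITS;
    uint32_t field = (prio & DPCR_PRIO_MASK) | (enable ? DPCR_ENABLE : 0u);

    return (dpcr & ~(UINT32_C(0xf) << shift)) | (field << shift);
}

/* Nonzero if the enable bit of the field at index slot is set. */
int dpcr_channel_enabled(uint32_t dpcr, unsigned slot)
{
    assert(slot < 8);
    return ((dpcr >> (slot * DPCR_FIELD_BITS)) & DPCR_ENABLE) != 0;
}

/* Priority (0..7) stored in the field at index slot. */
unsigned dpcr_channel_priority(uint32_t dpcr, unsigned slot)
{
    assert(slot < 8);
    return (dpcr >> (slot * DPCR_FIELD_BITS)) & DPCR_PRIO_MASK;
}
```

For example, enabling the field at index 0 with priority 3 yields the nibble 0xb in the lowest four bits of the register value.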

frno7 commented 5 years ago

So does this use the standard Linux OHCI driver?

Yes, the Linux kernel manipulates the HCD registers directly from the EE, as for all other kinds of USB OHCI hardware, and it does not use USBD.IRX. What kind of services does the USBD provide? What does its interface look like?

There is also a need to ensure that the hardware errata are properly handled.

I have been told that there is an erratum about storing unaligned USB frames that are 63 or 64 bytes in length. However, I believe that the Linux USB OHCI buffers are always aligned, so this particular problem should not be a concern. Would you be able to provide other known errata?
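
As a rough illustration, the erratum as described could be expressed as a predicate on a buffer address and frame length. The function name is hypothetical, and the required alignment is not stated in this discussion, so it is an explicit parameter rather than a fixed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: does a frame match the erratum as described,
 * i.e. an unaligned buffer holding a 63- or 64-byte frame?
 * align is an assumption supplied by the caller (power of two).
 */
int ohci_frame_hits_erratum(uintptr_t addr, size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    int unaligned = (addr & (align - 1)) != 0;

    return unaligned && (len == 63 || len == 64);
}
```

A driver could use such a check in a debug build to confirm the belief that its buffers never fall into the affected case.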

Also, the USB OHCI bounce buffer is a grave limitation, as noted in #17.

Thanks for explaining the DPCR2! The DMA channels eventually need to be documented in the Linux kernel source code.

sp193 commented 5 years ago

USBD provides APIs for registering Logical Device Drivers (LDD), managing transfers to USB devices, managing USB devices and linking with a USB LDD Autoloader. For more information, you should refer to the official documentation.

Yes, the difficulty with transferring 63/64-byte frames from an unaligned buffer is the only known erratum. You need to ensure that the buffers on the IOP side are aligned.

The available documentation of all the DMA channels is not very complete as the controllers are proprietary. But most of the channels are known. As for the lesser known ones:

There is a large topic on the busses and the SSBUSC service as well: https://assemblergames.com/threads/the-playstation-2-busses-dev9.67961/

frno7 commented 5 years ago

USBD provides APIs for registering Logical Device Drivers (LDD), managing transfers to USB devices, managing USB devices and linking with a USB LDD Autoloader.

Hmm... To proceed with the USBD we need to somehow adapt its services to the Linux kernel USB host driver stack, at a suitable level of abstraction.

Does the USBD transfer data via SIF DMA, thereby avoiding IOP memory constraints?

Ethernet support, as in #19, would simplify development of a Linux kernel USBD adapter.

The question remains, though, why the current interrupt relay method fails. Of course, there may be an underlying problem, perhaps related to the EE manipulating USB OHCI registers directly. It would be interesting to figure out what kind of state the OHCI hardware gets stuck in, and trace the series of register manipulations leading up to that point. The failure is easily reproducible. However, I am not an expert on the details of the USB OHCI.

For more information, you should refer to the official documentation.

Do you have links, code or other sources describing the USBD?

sp193 commented 5 years ago

No, it does not involve the SIF on its own, as it is an IOP module. 2MB is more than sufficient, if used properly. The 256KB limit you mentioned before is a Linux thing.

You can refer to the USBD module source from the homebrew PS2SDK. However, the Sony documentation is the most complete.

You might want to establish whether it is a problem with the USB driver itself or if it is a problem with relaying the interrupt. The latter is less related to USB.

frno7 commented 5 years ago

2MB is more than sufficient, if used properly.

USB wireless networking devices want huge buffers to operate efficiently. I assume that additional IOP drivers for hard disks, sound, i.LINK, Ethernet etc. will claim a significant amount of IOP memory for code if not for their buffers. :grin:

You might want to establish whether it is a problem with the USB driver itself or if it is a problem with relaying the interrupt.

The Linux USB OHCI subsystem is used by many millions of computers, so presumably it works well unless there is a problem with the hardware, or in this particular case the interrupt relay.

sp193 commented 5 years ago

Unless you do all your processing on the IOP, you probably would not need large buffers. I know that you could possibly reach higher speeds with larger buffers and some devices, but the PS2 was not a console with a lot of memory. Data is read and kept in RAM, only as needed.

What I have done (and what @rickgaiser did) for the SMAP, was to maintain a ring buffer on the IOP for transfers to the EE.
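
A minimal sketch of such a ring buffer, assuming fixed-size slots and a single producer and single consumer. The slot count, slot size and all names are illustrative and are not taken from the SMAP drivers; a real IOP/EE implementation would also need to handle memory ordering and the actual transfer to the EE, which this single-threaded model leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u     /* must be a power of two */
#define SLOT_BYTES 64u    /* illustrative frame size limit */

struct ring {
    uint32_t head;                        /* advanced by the producer only */
    uint32_t tail;                        /* advanced by the consumer only */
    uint16_t len[RING_SLOTS];             /* length of the frame in each slot */
    uint8_t data[RING_SLOTS][SLOT_BYTES];
};

/* Producer side: queue one frame; returns 0, or -1 if full or too large. */
int ring_push(struct ring *r, const void *frame, size_t n)
{
    assert(r && frame);
    if (n > SLOT_BYTES || r->head - r->tail == RING_SLOTS)
        return -1;
    uint32_t slot = r->head % RING_SLOTS;
    memcpy(r->data[slot], frame, n);
    r->len[slot] = (uint16_t)n;
    r->head++;
    return 0;
}

/* Consumer side: dequeue one frame into out; returns its length or -1. */
int ring_pop(struct ring *r, void *out, size_t cap)
{
    assert(r && out);
    if (r->head == r->tail)
        return -1;                        /* empty */
    uint32_t slot = r->tail % RING_SLOTS;
    size_t n = r->len[slot];
    if (n > cap)
        return -1;                        /* caller's buffer is too small */
    memcpy(out, r->data[slot], n);
    r->tail++;
    return (int)n;
}
```

Because the head and tail counters are free-running and only reduced modulo the slot count on access, the full and empty states are distinguishable without wasting a slot.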

If you are convinced that the problem is with how interrupts are routed, then you should not implement workarounds that involve changing the way the USB OHCI driver works. You could be just masking the true problem.

If it is not already done and you would like to, it is also possible to route interrupts by getting the IOP to trigger the EE-side SBUS interrupt from the IOP, after setting the MSFLAG/SMFLAG bits. The SBUS SMFLAG (sub to main)/MSFLAG (main to sub) register is set by the remote CPU, but cleared when the local CPU writes. These SIF registers can be used to indicate the interrupt cause.
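
The set/clear behaviour described above can be modelled in plain C. This is a software model under one reading of that description (writes from the sending CPU set bits, writes from the receiving CPU clear them), not code for the real registers; the cause bits and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_USB   (UINT32_C(1) << 0)   /* hypothetical cause bits */
#define CAUSE_OTHER (UINT32_C(1) << 1)

/* Software model of one SIF flag register such as SMFLAG. */
struct sif_flag {
    uint32_t bits;
};

/* Write from the sending CPU (the IOP for SMFLAG): sets the written bits. */
void sif_flag_post(struct sif_flag *f, uint32_t causes)
{
    assert(f);
    f->bits |= causes;
}

/* Write from the receiving CPU (the EE for SMFLAG): clears the written bits. */
void sif_flag_ack(struct sif_flag *f, uint32_t causes)
{
    assert(f);
    f->bits &= ~causes;
}

/* Receiver-side handler: read the pending causes and acknowledge only those. */
uint32_t sif_flag_take(struct sif_flag *f)
{
    assert(f);
    uint32_t pending = f->bits;

    sif_flag_ack(f, pending);
    return pending;
}
```

Acknowledging only the bits that were actually read means a cause posted after the read stays pending for the next handler run.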

If Linux will not expose or share the SBUS registers and interrupt with other software, then you are free to use it as desired. This could be an even lighter implementation than using SIFCMD.

frno7 commented 5 years ago

I know that you could possibly reach higher speeds with larger buffers and some devices, but the PS2 was not a console with a lot of memory.

That is true. For the PlayStation 2, I think, the problem is not primarily loss of speed but the fact that many useful Linux USB devices will not work unless their drivers are patched to consume much less memory. As an example, commit 0b3a20caae720fc746cda7420f1a226074f17b56 reduces the (hard-coded) memory requirements for wireless rt2x00 devices. They refuse to operate otherwise, which is less than ideal for users plugging in their favourite devices expecting them to work. The mass-storage device driver apparently busy-waits to claim the memory it desires. Of course, attaching several USB devices increases memory pressure.

What I have done (and what @rickgaiser did) for the SMAP, was to maintain a ring buffer on the IOP for transfers to the EE.

That sounds like a reasonable thing to do for PlayStation 2 specific devices. :smile:

If you are convinced that the problem is with how interrupts are routed, then you should not implement workarounds that involve changing the way the USB OHCI driver works. You could be just masking the true problem.

Hopefully the true root cause will be identified, eventually, and a proper fix can be implemented. Meanwhile, the current USB OHCI workaround is provisional and designed for system usability.

If it is not already done and you would like to, it is also possible to route interrupts by getting the IOP to trigger the EE-side SBUS interrupt from the IOP, after setting the MSFLAG/SMFLAG bits. The SBUS SMFLAG (sub to main)/MSFLAG (main to sub) register is set by the remote CPU, but cleared when the local CPU writes. These SIF registers can be used to indicate the interrupt cause.

If Linux will not expose or share the SBUS registers and interrupt with other software, then you are free to use it as desired. This could be an even lighter implementation than using SIFCMD.

Hmm... these two approaches sound exactly like ones already described and tested negative in this issue? Maybe I have misunderstood something? Also, Ethernet, for example, seemingly continues to operate perfectly even when USB fails, suggesting that the problem is not a failure of the relay channel in the case of SIF RPC. I suspect that the OHCI interrupt is lost or perhaps not even asserted on the IOP side, for whatever reason.

sp193 commented 5 years ago

It seems that your buffer allocation allocates IOP memory. Why not allocate memory on the EE (which Linux deals with anyway), and use your own code to copy data from the IOP into it? The real sizes of the buffers you use on the IOP do not have to be revealed to Linux.

frno7 commented 5 years ago

It seems that your buffer allocation allocates IOP memory. Why not allocate memory on the EE (which Linux deals with anyway), and use your own code to copy data from the IOP into it?

That is essentially already implemented. The current Linux USB OHCI driver allocates both IOP and EE memory. When OHCI DMA transfers to IOP memory finish, the EE copies the transferred data from IOP memory to EE memory, and vice versa. This is the bounce buffer part of the driver.
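
The bounce buffer idea can be sketched as follows. This is a simplified model, not the driver's code: IOP memory is represented by an ordinary array, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum xfer_dir { XFER_TO_DEVICE, XFER_FROM_DEVICE };

/* Stand-in for a region of IOP memory that the OHCI controller can reach. */
struct bounce {
    uint8_t *iop;
    size_t size;
};

/* Before the transfer starts: stage outgoing data in IOP memory. */
int bounce_prepare(struct bounce *b, const void *ee, size_t n, enum xfer_dir dir)
{
    assert(b && b->iop && ee);
    if (n > b->size)
        return -1;              /* does not fit in the bounce buffer */
    if (dir == XFER_TO_DEVICE)
        memcpy(b->iop, ee, n);
    return 0;
}

/* After the transfer completes: copy incoming data back to EE memory. */
void bounce_complete(const struct bounce *b, void *ee, size_t n, enum xfer_dir dir)
{
    assert(b && b->iop && ee && n <= b->size);
    if (dir == XFER_FROM_DEVICE)
        memcpy(ee, b->iop, n);
}
```

The size check in `bounce_prepare` is where the IOP memory limit becomes visible to drivers: a request larger than the bounce region cannot be staged at all.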

The real sizes of the buffers you use on the IOP do not have to be revealed to Linux.

This seems to require that large USB transactions are split into multiple smaller transactions? That appears to be nontrivial to me, since it seems to require an intermediate USB OHCI subsystem layer to somehow arbitrate between the different buffer sizes. The OHCI DMA registers can only handle small transfers, for example, which implies that the Linux USB OHCI subsystem can no longer write these registers directly. A whole new set of transfer queues need to be managed too?
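
For illustration only, the arithmetic of splitting one large transfer into smaller pieces looks like this. The names are hypothetical, and the sketch deliberately ignores the hard part raised above, namely managing the OHCI transfer queues for the pieces.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of the piece starting at offset when a transfer of total bytes
 * is cut into pieces of at most max_chunk bytes; 0 once finished.
 */
size_t next_chunk_len(size_t total, size_t offset, size_t max_chunk)
{
    assert(max_chunk > 0 && offset <= total);
    size_t left = total - offset;

    return left < max_chunk ? left : max_chunk;
}

/* Number of pieces needed for the whole transfer. */
size_t chunk_count(size_t total, size_t max_chunk)
{
    assert(max_chunk > 0);
    return total / max_chunk + (total % max_chunk != 0);
}
```

With an illustrative piece size of 4096 bytes, a 10000-byte transfer becomes three pieces of 4096, 4096 and 1808 bytes.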

As an alternative, I have noticed that the PlayStation 2 hardware has a concept of chaining several DMA controllers, especially via the SIF, avoiding IOP and EE processing overhead for common use cases. This is described in the manuals, but it is not entirely clear to me whether the OHCI DMA controller can chain with the SIF, as suggested in #17. This would not only solve the memory problems, it would also be much more efficient.

sp193 commented 5 years ago

Technically, USB transfers are limited to 4KB. This is because we use USB 1.1.

I never heard of a hardware function of the PS2 that allowed the output of one channel to be connected to another. If we could do that, then it would be much easier to achieve full 100Mbit Ethernet performance.

Another thing you can check is whether the buffers on the EE side are aligned properly and the cache lines are flushed as required. If the wrong data is sent or data is corrupted because of a cache coherency problem, then that could explain the rather uncommon crash. I mention this because the EE's DMA channels require 16-byte alignment and data must be transferred in units of 16 bytes. However, the cache lines are 64 bytes long, so DMA addresses should have 64-byte alignment if the cache lines are written back. It is also better practice not to mix cached and uncached accesses.
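
The alignment rules mentioned above can be written down as small helpers. This is a sketch using only the figures given here (16-byte transfer units, 64-byte cache lines); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EE_DMA_UNIT    16u  /* transfer size granularity stated above */
#define EE_CACHE_LINE  64u  /* cache line length stated above */

/* Round x up to the next multiple of align, which must be a power of two. */
uintptr_t align_up(uintptr_t x, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Nonzero if a buffer starts on a cache line boundary and its length
 * is a whole number of 16-byte transfer units.
 */
int dma_buffer_ok(uintptr_t addr, size_t len)
{
    return (addr % EE_CACHE_LINE) == 0 && (len % EE_DMA_UNIT) == 0;
}
```

An allocator could use `align_up` to place buffers on cache line boundaries, and `dma_buffer_ok` as a sanity check before starting a transfer.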

frno7 commented 5 years ago

Technically, USB transfers are limited to 4KB. This is because we use USB 1.1.

Perhaps it is somehow possible to proceed with this, although using bounce buffers in the Linux USB OHCI driver is already frowned upon, as this approach to handling these buffers appears to be unmaintained. Hmm. Special things need to be done anyway, eventually, especially to achieve anything that is low overhead and high performance.

For an initial Linux kernel submission, as in #1, it is best to keep things as simple as possible. Kernel maintainers are not happy to review large pieces of complex code. :grin:

I never heard of a hardware function of the PS2 that allowed the output of one channel to be connected to another. If we could do that, then it would be much easier to achieve full 100Mbit Ethernet performance.

Indeed. This involves a DMATag concept. Sony EE Overview, version 6.0, section 2.6, SIF: Sub-CPU Interface, page 47, says that: The IOP-DMAC reads the IOP memory address and data size from the tag, and transmits the packet with its tag to the SIF. The EE-DMAC reads the packet from the SIF, interprets the first word as a tag, reads the EE memory address and data size from the tag, and decompresses the data to the specified memory address. These transfer operations are performed by the DMACs to avoid generating unnecessary interrupts of the CPU.

Furthermore, Sony EE User's Manual, version 6.0, chapter 5, DMAC: DMA Controller, page 41, says that: In some of the channels, Chain mode is available. This mode performs processing such as transfer address switching according to the tag in the transfer data. This allows data to be exchanged between two or more processors through the mediation of the main memory, not the CPU.

I was unable to recall the specific example I was looking for, with chained DMA transfers from the IOP to the GS via the SIF and the IPU, or somesuch, but I have the impression that these DMA chaining modes are flexible at enabling fairly long and complex chains, including stall-control. Mastering DMA chaining is obviously crucial to achieving low overhead and high performance data transfers between the various peripherals and processors of the PlayStation 2. :smile:

Another thing you can check is whether the buffers on the EE side are aligned properly and the cache lines are flushed as required.

DMA is not used on the EE side. IOP memory is copied to/from EE memory using the R5900. This is obviously very inefficient, but appears to work as intended.

If the wrong data is sent or data is corrupted because of a cache coherency problem, then that could explain the rather uncommon crash.

The problem is a freeze rather than a crash. An expected interrupt is occasionally lost, in the order of one interrupt in a million. As explained in the issue description, the freeze vanishes if the Linux OHCI driver is hammered with an additional excessive 1000 Hz timer interrupt.
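
Why the extra timer hides the freeze can be shown with a toy model. This is not the Linux driver: the controller is reduced to one status word, the names are hypothetical, and the timer tick simply reruns the interrupt handler so that an event whose interrupt never arrived is still picked up.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a controller with an interrupt status register. */
struct fake_hc {
    uint32_t intrstatus;   /* pending events, set by the hardware */
    unsigned handled;      /* number of handler runs that found work */
};

/* Interrupt handler: acknowledge and process whatever is pending. */
int fake_hc_irq(struct fake_hc *hc)
{
    assert(hc);
    uint32_t pending = hc->intrstatus;

    if (!pending)
        return 0;                 /* nothing to do */
    hc->intrstatus &= ~pending;   /* acknowledge the events just read */
    hc->handled++;
    return 1;
}

/* Timer callback, assumed to run at 1000 Hz: simply rerun the handler. */
int fake_hc_timer_tick(struct fake_hc *hc)
{
    return fake_hc_irq(hc);
}
```

In this model a lost interrupt delays processing by at most one timer period, at the cost of running the handler a thousand times per second even when nothing is pending.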

sp193 commented 5 years ago

You seem to be referring to the DMA tag function of the SIF. The EE's DMAC is connected to various devices, which includes the IOP's DMAC via SIF0, SIF1 and SIF2. SIF2 is also the PS GPU DMA channel. The EE and IOP software communicate via the three SIF DMA channels.

Not all DMA channels support tags and it does not seem like there is a function to connect the output of one to the input of another in software.

Even if you copy data to/from the EE via the IOP RAM window, you still need to write back the cache in my experience. Anyway, all the best to you with finding the root cause. It is starting to sound like some timing problem.

frno7 commented 5 years ago

It is starting to sound like some timing problem.

Yes, a timing issue is a plausible cause. Another one could be that a series of USB OHCI register writes does not come through as intended, or suchlike.

frno7 commented 5 years ago

You seem to be referring to the DMA tag function of the SIF. The EE's DMAC is connected to various devices, which includes the IOP's DMAC via SIF0, SIF1 and SIF2. SIF2 is also the PS GPU DMA channel. The EE and IOP software communicate via the three SIF DMA channels.

Precisely. The best approach to low overhead and high performance USB OHCI data transfers is a topic of #17.

Even if you copy data to/from the EE via the IOP RAM window, you still need to write back the cache in my experience.

Indeed. I should mention that I am not aware of any Linux USB OHCI data corruption. I have transferred hundreds of gigabytes to verify this.

Anyway, all the best to you with finding the root cause.

Many thanks!

sp193 commented 5 years ago

I do not mean that you will surely get corruption of the payload. Although OHCI meant that most processing will be done by the hardware, the software driver will still generate data for the hardware to use, in the form of commands. So if the wrong data is made available to the hardware, you can get undefined behaviour. But no matter, this might not be the problem after all.

Have you taken a look at the source code for the homebrew USBD copy? In 2014, I attempted to fix some of its problems by comparing various parts against the late Sony version. The interrupt handler already clears MIE and MIE is enabled during initialization, but I also added code that toggles MIE off and back on. It is not entirely impossible that the hardware has a problem, which Sony worked around. But we cannot tell because they do not document all problems with the hardware and we do not have access to all their SDK revisions (particularly those that were released in response to this hardware bug). Link to commit: https://github.com/ps2dev/ps2sdk/commit/6aa81f738aabfd478f5ed34596a2127c6ea479be

frno7 commented 5 years ago

Have you taken a look at the source code for the homebrew USBD copy?

I have taken a brief look at the USBD. Admittedly, I am not very familiar with HCD details in general.

It is not entirely impossible that the hardware has a problem, which Sony worked around.

Unexpected fiddling with the MIE is a strong indication of hardware problems, in my view. This workaround was apparently ported to the Linux HCD many years ago. The precise details of the problem seem to be lost.

Link to commit: ps2dev/ps2sdk@6aa81f7

Thanks, that was very helpful!

Perhaps this MIE workaround is the best we can hope for, and so we should conclude this issue? It does seem to work well in practice.

sp193 commented 5 years ago

The fix, if it was one, was made by Sony, which is why there is no explanation for what it does.

If you are going to add these fixes, you might as well do it completely. You should add all their fixes, even though we do not know why they were done.

The inline assembly code loads a word from the boot ROM segment. Although the IOP is 32-bit, I wrote a 64-bit sign-extended address to work around a bug in our IOP assembler.

frno7 commented 5 years ago

If you are going to add these fixes, you might as well do it completely.

Are you aware of additional fixes for the USB OHCI, apart from the MIE fix and the frame alignment fix already mentioned?

You should add all their fixes, even though we do not know why they were done.

I will attempt to explain the MIE fixes below. We can double-check this with Linux kernel developers later.

The inline assembly code loads a word from the boot ROM segment. Although the IOP is 32-bit, I wrote a 64-bit sign-extended address to work around a bug in our IOP assembler.

I take it you refer to this piece of code in ps2dev/ps2sdk@6aa81f7:

    memPool.ohciRegs->HcInterruptDisable = OHCI_INT_MIE;
    asm volatile("lw $zero, 0xffffffffbfc00000\n");
    memPool.ohciRegs->HcInterruptEnable = OHCI_INT_MIE;

I suspect that the LW instruction above acts as a crude barrier, perhaps for ordering.

The corresponding Linux HCD fix is simply:

    ohci_writel(ohci, OHCI_INTR_MIE, &regs->intrdisable);

The MIE will apparently be enabled at a later point, which is probably why it is not toggled here. Also, I suppose that ohci_writel is implemented in such a way that it does not need additional barriers.
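
The effect of the two writes can be modelled in C. In OHCI, writing a 1 to a bit of HcInterruptEnable sets that enable bit and writing a 1 to the same bit of HcInterruptDisable clears it, with MIE in bit 31 acting as the master gate. The register model and function names below are hypothetical and only illustrate that behaviour; they say nothing about the hardware problem itself.

```c
#include <assert.h>
#include <stdint.h>

#define OHCI_INTR_WDH (UINT32_C(1) << 1)    /* writeback of done head */
#define OHCI_INTR_MIE (UINT32_C(1) << 31)   /* master interrupt enable */

/* Toy model of the interrupt enable state and status of a controller. */
struct fake_ohci_regs {
    uint32_t enabled;
    uint32_t intrstatus;
};

/* Model of a write to HcInterruptEnable: 1 bits set enables. */
void fake_write_intrenable(struct fake_ohci_regs *r, uint32_t bits)
{
    assert(r);
    r->enabled |= bits;
}

/* Model of a write to HcInterruptDisable: 1 bits clear enables. */
void fake_write_intrdisable(struct fake_ohci_regs *r, uint32_t bits)
{
    assert(r);
    r->enabled &= ~bits;
}

/* An interrupt is signalled only if MIE and an enabled event are both set. */
int fake_irq_asserted(const struct fake_ohci_regs *r)
{
    assert(r);
    if (!(r->enabled & OHCI_INTR_MIE))
        return 0;
    return (r->intrstatus & r->enabled & ~OHCI_INTR_MIE) != 0;
}
```

In this model, disabling MIE leaves the individual event enables untouched, so re-enabling MIE immediately signals any event that became pending in between.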

sp193 commented 5 years ago

The problem is that we cannot tell why Sony did some things. Personally, I believe they found some hardware flaw with the IOP (which the USB OHCI controller is part of), hence it is necessary to make an extra word load between the toggling of the MIE interrupt mask. But because Linux controls the USB OHCI controller from the EE, it may or may not be applicable here.

For example, the SMAP's EMAC3 is documented within the PS2 Linux source code to require double-reads, for reading of the EMAC3 registers to be correct. Such information was not available in other generic manuals.

I guess the bug might be the MIE interrupt causing further interrupts to fail to be signaled, much like the EE INTC's GS HSINT and VSINT causes.

Regardless, I still think you should confirm that the MIE interrupt mask is actually toggled whenever the interrupt needs to be re-enabled. Just so the patch will always work.

frno7 commented 4 years ago

I think we can conclude that the reported problem of USB freezes is fixed with the MIE workaround in commit bedacfd1b9c5461cafe38029657eea7e8eda74b1. It works in practice, and has been tested for several years, but it remains unsatisfactory not to know precisely the root cause of the problem that the workaround addresses.

The other topic discussed here that remains problematic for USB devices is driver memory consumption, and the total 256 KiB memory limit, but issue #17 is more relevant for that. It’s nontrivial to manually rework hundreds of USB drivers (for example wireless drivers etc.) to reduce their memory usage, although it can be done in specific cases such as the rt2x00 driver in commit 844ad983a20ea374589ffc033933c0dedd65bc70.

Does anyone want to add anything to the discussion so far? If not, then I suggest proceeding with #17!

frno7 commented 4 years ago

I take it you refer to this piece of code in ps2dev/ps2sdk@6aa81f7:

    memPool.ohciRegs->HcInterruptDisable = OHCI_INT_MIE;
    asm volatile("lw $zero, 0xffffffffbfc00000\n");
    memPool.ohciRegs->HcInterruptEnable = OHCI_INT_MIE;

I suspect that the LW instruction above acts as a crude barrier, perhaps for ordering.

The problem is that we cannot tell why Sony did some things. Personally, I believe they found some hardware flaw with the IOP (which the USB OHCI controller is part of), hence it is necessary to make an extra word load between the toggling of the MIE interrupt mask. But because Linux controls the USB OHCI controller from the EE, it may or may not be applicable here.

@sp193, the LW instruction very much looks like a standard implementation of wbflush via __fast_iob. This isn’t a hardware flaw but rather a standard MIPS procedure defined to halt until all pending writes have completed by reading uncached memory. This flushes the write FIFO of any MIPS implementation to date.
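
A small sketch of the addressing involved, assuming the usual 32-bit MIPS layout where KSEG1 is the uncached window at 0xa0000000. The helper names are hypothetical. It shows that 0xbfc00000 is the uncached alias of physical address 0x1fc00000, and that the 64-bit constant in the USBD code is just the sign extension of that 32-bit address. The flush itself is modelled as a volatile read through a pointer supplied by the caller, since the real uncached address is only meaningful on the target.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG1_BASE UINT32_C(0xa0000000)   /* uncached, unmapped segment */

/* Uncached KSEG1 alias of a physical address below 512 MiB. */
uint32_t kseg1_addr(uint32_t phys)
{
    assert(phys < UINT32_C(0x20000000));
    return phys | KSEG1_BASE;
}

/* 64-bit sign extension of a 32-bit address, as written in the assembly. */
int64_t sign_extend32(uint32_t addr)
{
    return (int64_t)(int32_t)addr;
}

/*
 * Model of the flush: a read from an uncached location. On the target,
 * the read stalls until writes queued before it have drained.
 */
void wbflush_via(const volatile uint32_t *uncached)
{
    assert(uncached);
    (void)*uncached;
}
```

The same mapping explains the other constants in this issue: the OHCI base at physical 0x1f801600 appears at 0xbf801600 when accessed uncached.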

NB. Some MIPS implementations also do partial write gathering and can have reads overtake writes, but I don’t think that’s the case with the IOP. Edit: Assuming the IOP is similar to the MIPS R3051, sources appear inconclusive on partial write gathering: the book See MIPS Run seems to suggest it doesn’t (p. 97) but the IDT R3051 manual says it does (p. 40).

sp193 commented 4 years ago

Ah okay. Thanks for sharing.