Closed DvdBoon closed 4 years ago
The PPC now starts up on the A1200 without the use of the MMU routines from the mediator (ENV:Mediator/MMU = No). It sets up its memory block of 128MB and reports it to the system. The next step is to use Thor's MMU library to shift the Z2 window around within this memory block when needed. I foresee a problem with 68K code from Z2-Window1 trying to access data from Z2-Window2. Both code and data needed by that code need to be located in 1 window. Any ideas welcome.
As we know, AmigaOS exec will by default allocate from the memory with the highest priority first. Typically, this is the memory on the turbo board, unless we are talking about a hunk explicitly marked as Sonnet PPC/Mediator DMA. If all hunks in the executable are marked correctly, the chances of such problems are minimal. But then, if we use some automated method to patch the binary and just add the Sonnet memory extended attribute to every hunk... then of course 68k code might also end up in Sonnet memory.
Second thing... I was under the impression that the MMU can handle the situation where some service routine has to be executed to fetch both the code and data needed by the running application. If some 68k task is running on top of this virtual 128M address space, why does it care how often we are changing the window position? It only sees the virtual addresses. Maybe I'm missing something here?
Besides, Mediator's memory management routines do seem to work in this exact situation?
Hi folks,
> As we know, AmigaOS exec will by default allocate the memory with highest priority first. Typically, this is the memory on turbo board. Unless we are talking about hunk marked explicitly as Sonnet PPC/Mediator DMA. If we have all hunks in the executable marked correctly, chances of such problems are minimal. But then, if we use some automated method to patch the binary and just add the Sonnet memory extended attribute to every hunk... Of course 68k code might also end up in Sonnet memory.
Again, placing memory that is mapped by the MMU is a bad idea, and I would highly recommend not doing so. The problem is DMA. Such memory cannot be reached by DMA safely, even with all MMU library precautions on exactly this matter. The problem is that many DMA devices do not use OS functions correctly to translate logical addresses (as seen by the CPU) to physical addresses (as seen by the DMA device).
The problem becomes even worse in case the memory can go away any time, even under the feet of a device performing active DMA.
> Second thing... I was under impression that MMU can handle the situation where some service routine has to be executed to fetch both the code and data needed by the running application.
The MMU is not the problem, the processor is. As far as I remember, the 68060 uses a restart model, i.e. in case it detects that an instruction may create a bus error on its execution, it will first trigger the access fault, and when returning, will execute the instruction again. As far as I remember, there is no internal state in the 68060 (unlike earlier models), so it has to re-fetch the instruction again. The cache will, of course, be flushed as part of the window swap.
IOW, in such a case, you end up with a ping-pong of bus-errors, and hence a dead-lock.
The 68030 is much more microcode driven than the 68060, and there it may actually work because the instruction pipeline is stored as part of the exception stackframe, but on the 68060, this is not the case - it's much more simplified and streamlined.
> If some 68k task is running on top of this virtual 128M address space, why does it care how often are we changing the window position? It only sees the virtual addresses. Maybe I'm missing something here?
You're probably missing the situation where the data an instruction addresses is in a window different from the one where the instruction itself is.
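The ping-pong can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a single movable 8MB window, a pure restart model in which the instruction fetch and the data access of one instruction must both succeed without a window move in between, and made-up names (`run_instruction`, `move_window`). It is not 68060 or mmu.library code.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_SIZE 0x800000u   /* 8MB, the Z2 window size discussed in this thread */

/* Hypothetical model state: the base address the single window currently maps. */
static uint32_t window_base;

static bool in_window(uint32_t addr)
{
    /* Unsigned wrap-around makes addresses below the base fail the test too. */
    return addr - window_base < WINDOW_SIZE;
}

/* Fault handler of the model: move the window so that addr becomes reachable. */
static void move_window(uint32_t addr)
{
    window_base = addr & ~(WINDOW_SIZE - 1u);
}

/*
 * Try to execute one instruction located at code_addr that reads data_addr.
 * On a fault the window moves and the instruction restarts from the fetch.
 * Returns the number of faults taken, or -1 if max_faults is reached
 * without the instruction ever completing.
 */
int run_instruction(uint32_t code_addr, uint32_t data_addr, int max_faults)
{
    int faults = 0;
    while (faults < max_faults) {
        if (!in_window(code_addr)) {   /* instruction fetch faults */
            move_window(code_addr);
            faults++;
            continue;                  /* restart the instruction */
        }
        if (!in_window(data_addr)) {   /* data access faults */
            move_window(data_addr);
            faults++;
            continue;                  /* restart: now the fetch faults again */
        }
        return faults;                 /* both accesses hit the window */
    }
    return -1;
}
```

With code and data inside the same 8MB block the model completes after at most one fault. With code at 0x70000000 and data at 0x71000000 it never completes, no matter how large `max_faults` is.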
> Besides, Mediator's memory management routines do seem to work in this exact situation?
I highly doubt this. I would rather believe that they simply ignore the problem.
Greetings, Thomas
I'm open for any other ideas. The way it is set up now on the A3000/A4000:
PPC sets up physical memory from 0x0 - 0x10000000.
PPC MMU remaps this to logical 0x70000000 - 0x80000000.
Any PPC access to logical 0x70000000 is relocated to physical 0x0.
Sonnet ATU (address translating unit) maps 0x70000000 PCI memory to physical 0x0 Sonnet memory. Sonnet memory is now visible in PCI memory starting from 0x70000000. Any access from the 68K to 0x70000000 actually gets through to Sonnet memory 0x0 thanks to the Sonnet ATU. Any access from the PPC to 0x70000000 gets remapped to 0x0 thanks to the PPC MMU. So both the PPC and the 68K actually access 0x0 when addressing 0x70000000.
The Z3 window is from 0x60000000-0x80000000 and by default it also points to the same addresses. (So 0x70000000 is actually 0x70000000.) The MMU is not in use by the pci.library (not needed).
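As a small illustration of why pointers can be shared between the two processors in this setup, here is a sketch of the two translations. The constants come from the description above; the function names are made up for this example and are not part of the sonnet.library.

```c
#include <stdbool.h>
#include <stdint.h>

#define SONNET_LOGICAL_BASE 0x70000000u  /* base seen by both CPUs */
#define SONNET_SIZE         0x10000000u  /* physical 0x0 - 0x10000000 */

/* True if addr lies in the shared 0x70000000 - 0x80000000 range. */
bool is_sonnet_address(uint32_t addr)
{
    return addr - SONNET_LOGICAL_BASE < SONNET_SIZE;
}

/* PPC side: the PPC MMU relocates a logical address to physical memory. */
uint32_t ppc_mmu_translate(uint32_t logical)
{
    return logical - SONNET_LOGICAL_BASE;
}

/* 68K side: the Sonnet ATU maps a PCI address to the same physical memory. */
uint32_t atu_translate(uint32_t pci_addr)
{
    return pci_addr - SONNET_LOGICAL_BASE;
}
```

Both functions return the same physical address for the same input, which is the property that lets a pointer written by one processor be used unchanged by the other.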
The Z2 window is from 0x200000-0xa00000. It is mostly in a state where it points to gfx mem/gfx hardware registers courtesy of the gfx driver. (0x200000 points mostly to either 0x80000000 or 0x90000000; addresses are a little bit different than on Z3.) It can move to other PCI addresses if need be. Let's for the sake of argument say that, again, Sonnet memory is at 0x70000000. So if we want to access 0x70000000 as in our previous example, we need to shift the Z2 window to this address and access 0x200000 OR access 0x70000000 with the 68K MMU redirecting 0x70000000 to 0x200000. The latter keeps addressing consistent for both processors (pointers etc.). That is what I want to achieve with the mmu.library.
WarpOS programs get loaded inside this 0x70000000-0x80000000 area as we cannot readily distinguish 68K code from PPC code (except maybe for the first code hunk, which is always 68K). So the Z2 window needs to shift to different spots within 0x70000000-0x80000000 for the 68K to execute code or fetch data if need be.
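The arithmetic behind the redirect can be sketched as follows. This assumes the window can only be positioned on multiples of its own size (8MB), which is an assumption for the example and not something confirmed here; `Z2Mapping` and `map_pci_to_z2` are hypothetical names.

```c
#include <stdint.h>

#define Z2_WINDOW_BASE 0x00200000u  /* where the 68K sees the window */
#define Z2_WINDOW_SIZE 0x00800000u  /* 0x200000 - 0xa00000 = 8MB */

typedef struct {
    uint32_t window_target;  /* PCI address the window has to point to */
    uint32_t z2_address;     /* address the 68K then uses inside the window */
} Z2Mapping;

/* Split a PCI address into a window position and an offset in the window.
 * Assumes window positions are aligned to the window size. */
Z2Mapping map_pci_to_z2(uint32_t pci_addr)
{
    Z2Mapping m;
    m.window_target = pci_addr & ~(Z2_WINDOW_SIZE - 1u);
    m.z2_address    = Z2_WINDOW_BASE + (pci_addr & (Z2_WINDOW_SIZE - 1u));
    return m;
}
```

For 0x70000000 this yields a window target of 0x70000000 and a 68K address of 0x200000, matching the example in the text. An MMU-based redirect would apply the same split inside the access fault handler so that the 68K program keeps using 0x70000000.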
Indeed, maybe loading data from disk by WarpOS programs would give problems....But it normally would go through 68K DOS/Read(). I don't know how DMA is in this case.
A couple of things:
Data is mostly manipulated by the PPC (for speed, why else have a PPC) and not the 68K. 68K code is kept to a minimum (mostly startup). Mixed 68K/PPC libraries could be a problem.
I'll know after trying :-)
I'm sure the issues Thor is mentioning are valid, but this discussion is in context of Mediator 1200 series only (we don't need to mess with MMU setup on big box Mediators). In case of Mediator 1200 we don't have to worry much about DMA. Nothing besides the CPU will access Mediator 1200. Even if some A1200 turbo boards have DMA-capable devices (like SCSI controllers), they typically only DMA to on-board memory on turbo card (no one ever expected that something will sit between the turbo board and main board, which has only chip RAM). Of course I could imagine some badly written driver might try that, but I think the risk is minimal. It would be a problem if we were talking about A3000/A4000, where DMA can happen to memory located anywhere.
In context of Mediator itself, DMA can happen on the PCI bus only. Since PCI bus has separate address space (shared only through the window with 68k space), it shouldn't bother us. Such DMA-capable driver has to be specially written for the Mediator and would never use 68k addresses (since PCI device can't access 68k side at all).
Btw. On a side note, I'm sure that you know but I wanted to stress that Z2 window address can change (it's not hardcoded to be at 0x200000, but is subject to normal AutoConfig mechanism). Of course when the window is 8MB there's just no more space available so it does have to fit at 0x200000 or it won't work at all.
Just wanted to show (hopefully) how it works on the Z3 Amigas and what the problems are when transposing this on the A1200.
Also, concerning AutoConfig, only the last few updates were regarding correct error handling and giving out error messages ;-) I still need to add window size checking, for example.
Letting some time pass between various attempts to get it working on the A1200 led to the realization that maybe the 8MB window is actually made up of 2x 4MB windows.
And indeed, some testing showed that this actually is the case. I successfully pointed the first 4MB window to 0x98000000-0x983fffff and the second 4MB window to 0x99000000-0x993fffff (so 0x200000 pointed to 0x98000000 and 0x600000 to 0x99000000). @rkujawa It was indeed bit 12 which is the selector.
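To make the resulting address map concrete, here is a sketch of the 68K-to-PCI translation with two independent 4MB halves. The targets are the ones from the test above; `z2_to_pci` and `half_target` are names invented for this example, and how the hardware selects the halves is not modelled.

```c
#include <stdint.h>

#define Z2_BASE   0x00200000u
#define HALF_SIZE 0x00400000u  /* each half of the window is 4MB */

/* PCI targets of the two halves, as set up in the test described above. */
static uint32_t half_target[2] = { 0x98000000u, 0x99000000u };

/* Returns the PCI address behind a 68K address in 0x200000 - 0x9fffff,
 * or 0 if the address is outside the Z2 window. */
uint32_t z2_to_pci(uint32_t z2_addr)
{
    uint32_t offset = z2_addr - Z2_BASE;   /* wraps for addresses below the base */
    if (offset >= 2u * HALF_SIZE)
        return 0;
    return half_target[offset / HALF_SIZE] + (offset % HALF_SIZE);
}
```

With code in one half and data in the other, a single instruction can reach both without any window move in between.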
This makes it a lot easier regarding code and data in different windows.
It's certainly possible. I must have never noticed the problem on NetBSD, as if I remember correctly, I mostly tested with 4MB window.
However, I am a bit worried about trying to change the window position while bypassing pci.library. It might lead to some breakage of Mediator drivers. Or these drivers might try to change the window position when we least expect it.
Drivers change the window by:

    Disable()
    origwindow=GetZorroWindow()
    SetZorroWindow(driverspace)
    Stuff()
    SetZorroWindow(origwindow)
    Enable()

The voodoo driver does this using some sort of VBlank interrupt, as far as I can see. Even if the Disable() / Enable() pair is missing, I plan on patching SetZorroWindow to track changes. If SetZorroWindow is called, the patch should invalidate all Sonnet memory in the 68K MMU. Something like that. I'm not there yet in development.
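A rough model of the driver sequence and the planned tracking patch, to show what the patch would observe. Everything here is a stand-in: the `mock_` functions only imitate the calls named above and are not the real exec or pci.library API, and the "invalidate" step is reduced to a counter.

```c
#include <stdint.h>

/* Stand-ins for the calls named in the post; not the real library API. */
static uint32_t zorro_window = 0x80000000u;  /* assumed current target: gfx memory */
static int      disable_depth;               /* models Disable()/Enable() nesting */
static int      invalidate_count;            /* times Sonnet mappings were dropped */

static void     mock_Disable(void)        { disable_depth++; }
static void     mock_Enable(void)         { disable_depth--; }
static uint32_t mock_GetZorroWindow(void) { return zorro_window; }

/* Hypothetical tracking patch: every real window change invalidates the
 * 68K MMU mappings of Sonnet memory, so the next access faults and the
 * fault handler can put the window back where it is needed. */
static void patched_SetZorroWindow(uint32_t target)
{
    if (target != zorro_window) {
        zorro_window = target;
        invalidate_count++;
    }
}

/* The driver sequence from the post, routed through the patch. */
void driver_access(uint32_t driver_space, void (*stuff)(void))
{
    mock_Disable();
    uint32_t orig = mock_GetZorroWindow();
    patched_SetZorroWindow(driver_space);
    if (stuff)
        stuff();                      /* the driver's own work */
    patched_SetZorroWindow(orig);
    mock_Enable();
}

/* Accessors so a caller can inspect the model state. */
uint32_t current_window(void)      { return zorro_window; }
int      invalidations(void)       { return invalidate_count; }
int      interrupts_disabled(void) { return disable_depth > 0; }
```

One well-behaved driver call produces two invalidations (window away, window back) and leaves the window where it started. A driver that skips the restore would leave `current_window()` changed, which is the case the patch is meant to catch.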
As far as I can see, the changes to the mmu.library do what they are supposed to do. I can set up a memory range, for example 192MB at 0x40000000 for the Sonnet, and redirect it dynamically to the Mediator Z2 window at 0x200000. So that is good news. With FileX I can look at an arbitrary address range within that window. I can see the PPC debug output and also see the sonnet libbase and functions within this range. So the library is also correctly set up in Sonnet space.
The bad news is that shifting the Mediator window seems to interfere with the Voodoo driver. Gadgets and icons are not updated, for example. Other things like the FileX window contents are updated.
Looking a bit closer at the pci library I noticed that when the MMU is used by the pci library, the drivers only use a 4MB window, leaving the second 4MB window (at 0x600000) probably for the MMU.
Needs some more investigation.
I wonder if in this situation it would be better to use pci.library function for changing window position? At least that should solve the problem with other drivers using pci.library?
I am using the functions of the pci.library at the moment. I'm not sure if I am using them correctly, though, because I lack the documentation. I can have a look at other drivers. And have another look at the bus exception handler of the pci.library.
What I meant with the above post is to let the Mediator think it has a 4MB window and directly manipulate the second window with the sonnet.library. I think that is what happens when the option MMU=yes is set. Also, with that option the pci.library installs its own bus exception handler and manipulates the MMU directly. We don't want that. Like I said, I am now manipulating the full 8MB window using the pci.library functions with MMU=no in ENV:Mediator.
Understood. I thought you were now messing with the window register directly. I hoped that with the new version of pci.library we'd be able to avoid patching it.
I agree with the idea to use 4MB window and try to manipulate the second one with sonnet.library. Using the pci.library's MMU mode is problematic, also agreed about that. Just that when MMU=no, library assumes that it can present the whole 8MB window space to the drivers... Which again is problematic, if we want to steal half of that.
I'm going to revisit this soon. As it stands now, there are two options. One through the mmu.library (with MMU=no) and one through the pci.library (with MMU=yes).
A recent development confirmed some of the above assumptions regarding the pci.library, so I want to retry that. I think that I now know why the earlier attempts were not successful.
During the Sonnet interrupt, I have to move the window myself. Normally this is done by the pci.library when the MMU option is enabled, but during interrupts the programmer has to do it.
Looking back to the earliest code the sonnet interrupt did move the window but not always when needed. The interrupt calls functions like putmsg, getmsg and replymsg and these functions can address memory ranges which are not within the same Z2 window thus getting the wrong info.
The solution would be to write those functions as part of the interrupt, such that the Z2 window gets moved to the right position (address of the message, address of the msglist, address of predecessor, address of successor and address of msgport for example).
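The idea can be outlined like this. The sketch only tracks which addresses a PutMsg-style append has to touch and how often the window would have to move; it does not manipulate real lists or hardware. The 4MB window size, the order of accesses and all names (`ensure_window`, `PutMsgAddrs`, `put_msg_window_moves`) are assumptions for illustration.

```c
#include <stdint.h>

#define WIN_SIZE 0x00400000u  /* assumed 4MB window */

static uint32_t win_base = 0xffffffffu;  /* no valid position yet */
static int      win_moves;

/* Make sure addr is reachable through the window before it is touched. */
static void ensure_window(uint32_t addr)
{
    uint32_t base = addr & ~(WIN_SIZE - 1u);
    if (base != win_base) {
        win_base = base;   /* on real hardware: reposition the Z2 window here */
        win_moves++;
    }
}

/* Addresses involved in appending one message to a port's list. */
typedef struct {
    uint32_t msg;   /* the message node */
    uint32_t port;  /* the message port */
    uint32_t list;  /* the port's message list header */
    uint32_t pred;  /* current last node, becomes the predecessor */
} PutMsgAddrs;

/* Window-aware outline of a PutMsg-style append: each structure is brought
 * into the window right before it would be read or written.
 * Returns how many window moves this one append needed. */
int put_msg_window_moves(const PutMsgAddrs *a)
{
    int before = win_moves;
    ensure_window(a->port);  /* locate the list via the port */
    ensure_window(a->list);  /* read the list's tail information */
    ensure_window(a->pred);  /* link the old last node to the message */
    ensure_window(a->msg);   /* fill in the message's own links */
    ensure_window(a->list);  /* update the list header */
    return win_moves - before;
}
```

If all four structures sit in the same 4MB block, at most one move is needed. A message that lives in a different block than its port costs two extra moves per append, which is the overhead of doing this inside the interrupt.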
Fingers crossed.
I already noticed that some WarpOS software depends on some output being negative when a failure occurred. With the address range above 0x80000000 (and thus negative when signed), some of the programs don't work correctly. CyberPi, for example.
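The effect is easy to reproduce in isolation. A minimal sketch, assuming the program treats any 32-bit result with the sign bit set as a failure (`looks_like_failure` is a made-up name):

```c
#include <stdbool.h>
#include <stdint.h>

/* How a program that treats negative results as failure sees a 32-bit value.
 * Bit 31 is the sign bit of a two's-complement 32-bit integer, so testing it
 * is equivalent to "value < 0" on a signed 32-bit interpretation. */
bool looks_like_failure(uint32_t result)
{
    return (result & 0x80000000u) != 0;
}
```

A perfectly valid pointer such as 0x89000000 is reported as a failure by this check, while the same allocation placed at 0x20000000 passes.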
wXR at EAB added a bounty for this issue: https://www.bountysource.com/issues/28897993-get-the-sonnet-library-working-on-the-1200tx-mediator
PCI memory seems to be copyback. I need it to be cache inhibited like on the A3000/A4000. Maybe in the future this will change. Also, the address at which the WarpOS programs are running on the A1200 (>2GB range) does not help. I'm trying to get something in place that will bring it <1GB range (the max range the sonnet can install its memory into).
I have a pci.library now which sets PCI memory as cache inhibited. I'll try to move all stuff that conflicts with the Zorro window out of the interrupt and into the master control process (supervisor versus user mode) and see if that improves things.
The >2 gb addresses issue still needs to be resolved too.
Again, no luck. Programs just halt after a while. Can be immediately, can be after 10 seconds. There is no crash. Both the 68K and the PPC task go to a wait status.
Seems that it is related to the time between putting data in the queue that needs to be transferred to the PPC. When a message pointer is put in a certain register of the mpc107 this pointer is automatically transferred to a circular FIFO on the PPC side and an interrupt is raised. When writing two values into this register close to each other time-wise, the first message gets lost as it is not transferred to the ppc.
I saw the same kind of behavior with the big box Mediator while reading from this register. It should give unique pointers, but sometimes the PPC side was not updated quickly enough and two consecutive pointers were equal. A simple compare and re-read sufficed. Here it is a bit more difficult, as I cannot test whether the message was updated correctly while writing. I might have to organize the FIFO manually and just use a simple interrupt when done.
There are more latency problems. Every message that goes through the ports (the 2 PCI registers on the PCI memory side, OFQPR and IFQPR) has a chance of being missed on the A1200 Mediator.
Looks like both the reading of free messages to send to the PPC and the reading of messages sent from the PPC (so in both cases the 68K reading from the EUMB registers in PCI space) are affected. Now I also re-read when a duplicate is found among messages sent from the PPC, and Voxelspace has been running for 2 hours now.
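The compare and re-read on the reading side can be sketched as below. The register is abstracted behind a callback so the helper can be tried without hardware; `RegReader`, `read_unique_pointer` and the scripted register are hypothetical, and treating 0 as "no valid pointer" is an assumption of this example.

```c
#include <stdint.h>

/* Hypothetical register accessor; on hardware this would read the queue port. */
typedef uint32_t (*RegReader)(void *ctx);

/* Read the next message pointer, re-reading while the value is still equal
 * to the previous one. Returns 0 if it never changes within max_retries
 * re-reads (assumes 0 is never a valid message pointer). */
uint32_t read_unique_pointer(RegReader read_reg, void *ctx,
                             uint32_t previous, int max_retries)
{
    for (int i = 0; i <= max_retries; i++) {
        uint32_t value = read_reg(ctx);
        if (value != previous)
            return value;
    }
    return 0;
}

/* Scripted stand-in for the register: replays a fixed sequence of values
 * and keeps returning the last one once the script is exhausted. */
typedef struct {
    const uint32_t *values;
    int count;
    int pos;
} ScriptedReg;

uint32_t scripted_read(void *ctx)
{
    ScriptedReg *r = (ScriptedReg *)ctx;
    uint32_t v = r->values[r->pos];
    if (r->pos < r->count - 1)
        r->pos++;
    return v;
}
```

Fed with the sequence 0x1000, 0x1000, 0x2000, the helper returns 0x1000 and then 0x2000, silently skipping the stale duplicate. The bounded retry count keeps a stuck register from hanging the caller.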
Not sure if writing to the registers is affected. I already rewrote the sending of messages to PPC, but releasing used messages is also done by writing to these EUMB registers by the 68K. However, there is a pool of 4K messages so any problems here will show up MUCH later.
Received an updated pci.library where the mediator automatically adjusts the zorro window also in the 0x10000000-0x20000000 range. Strange behaviour of some programs (Quake quitting on memory allocation error, cyberpi not printing any printf text, voxelspace not showing info window) seem to be resolved.
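    move.l  #$89000000,a0     ; first PCI address
    move.l  #$8B000000,a1     ; second PCI address, 32MB further up
    move.l  #$ABCDABCD,(a0)   ; write a test pattern to the first address
    move.l  (a0),(a1)         ; copy it to the second address in one instruction
    rts

does not work with the current pci.library version. Versions 11.0 and 12.0 worked with the above program. Back to Elbox.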
Got a fixed library from Elbox and on the A1200 the Sonnet memory is now placed at 0x20000000 and up. This has solved some of the issues like no text output in WarpOS programs due to addresses being negative when >0x80000000.
Simple stuff (the demos and tools from the WarpOS distribution) seems to work now on my Amiga with the latest build. So the hanging which happened intermittently was also caused by the previous version of the pci.library.
Voxelspace however seems twice as slow as normal. QuakeWOS (the software 3D version) crashes after loading AHI. There seems to be some memory trashing somewhere.
Fixed the memory error. Now another error popped up while starting QuakeWOS. Some jump to zero page.
If the Amiga OS handles anything which is in Sonnet memory, there is a chance of a crash. For example, the Wait() function inside the library crashes when entered with the 68K stack pointer pointing to Sonnet memory. I think this has something to do with the Supervisor context of the 68K MMU. The Wait() function itself calls Switch(), which enters supervisor mode.
Placing a StackSwap() with a normal Amiga memory stack pointer before and after the Wait() fixes this issue, but any switch to Supervisor mode with for example the 68K stack pointing to Sonnet Memory or the task structure itself residing in Sonnet memory can (not always) lead to a crash.
I have to contact Elbox to see if their library uses the Supervisor context MMU (correctly). I know it is fully functional in mmu.library. Maybe consider again to try to implement the mmu.library.
A complete rewrite of the AllocMem() patch is also a possibility. The 68K stack and 68K task structures need to go to 68K memory; PPC segments and PPC library/device bases need to go to Sonnet memory.
Seems more and more difficult to implement, though.
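The routing rule itself is simple; the hard part is classifying each allocation, which this sketch does not attempt. It only writes down the policy stated above, with hypothetical names (`AllocKind`, `MemPool`, `choose_pool`) and one added assumption: anything unclassified stays in 68K memory.

```c
typedef enum { POOL_68K, POOL_SONNET } MemPool;

typedef enum {
    ALLOC_68K_STACK,     /* stack of a 68K task */
    ALLOC_68K_TASK,      /* 68K task structure */
    ALLOC_PPC_SEGMENT,   /* loaded PPC code/data segment */
    ALLOC_PPC_LIBBASE,   /* PPC library or device base */
    ALLOC_OTHER          /* not classified by the patch */
} AllocKind;

/* Decide which memory pool an allocation of the given kind should come from. */
MemPool choose_pool(AllocKind kind)
{
    switch (kind) {
    case ALLOC_68K_STACK:
    case ALLOC_68K_TASK:
        return POOL_68K;      /* touched by the OS in supervisor mode */
    case ALLOC_PPC_SEGMENT:
    case ALLOC_PPC_LIBBASE:
        return POOL_SONNET;   /* must be reachable by the PPC */
    default:
        return POOL_68K;      /* assumption: unclassified stays in 68K memory */
    }
}
```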
Latest build adds more compatibility. Stack and task structures are now forced to 68K memory. All the WarpOS v4 packages demos now work. FlashMandel works. WarpRace works. Voxelspace increased in speed from 80 to 100FPS at standard 320x240. Context switches are being measured at 400us. (In comparison the A3000 does 125FPS/200us).
QuakeWOS crashes or hangs inside input.device. Same with Quake2 and ADoomWOS. Input device tries to traverse a list and data on it is from Sonnet memory. Somehow the memory is not there and it loads $FFFFFFFF into address registers.
Looking into Supervisor mode and/or an interrupt being installed by those games. Asking Elbox to fix the Supervisor MMU context is also still an option.
QuakeWOS was hanging as data was not loaded correctly. The first 16 bytes of files are skipped if the buffer of Read() is in PPC memory. This has been fixed for now by intercepting the Read(), putting the buffer in FAST RAM and then copying it to PPC memory. Now QuakeWOS runs with sound off.
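The workaround amounts to a bounce buffer. Below is a sketch of that idea with stand-ins throughout: `ReadFn` imitates the original Read(), `malloc` stands in for a FAST RAM allocation, and the `fake_` helpers only mimic the reported fault so the wrapper can be tried off-target. None of this is the actual patch code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical predicate: is this buffer inside Sonnet (PPC) memory? */
typedef bool (*InPpcMemory)(const void *p);
/* Stand-in for the original Read(): returns bytes read or -1 on error. */
typedef long (*ReadFn)(void *file, void *buffer, long length);

/* Read into a temporary buffer outside PPC memory, then copy the data over. */
long bounced_read(ReadFn orig_read, InPpcMemory in_ppc,
                  void *file, void *buffer, long length)
{
    if (length <= 0 || !in_ppc(buffer))
        return orig_read(file, buffer, length);  /* nothing to work around */

    void *bounce = malloc((size_t)length);       /* stands in for FAST RAM */
    if (!bounce)
        return -1;
    long got = orig_read(file, bounce, length);
    if (got > 0)
        memcpy(buffer, bounce, (size_t)got);
    free(bounce);
    return got;
}

/* --- Stand-ins for trying the wrapper without an Amiga --- */
static char fake_ppc_memory[64];
static char fake_file[] = "0123456789abcdefGHIJKLMNOPQRSTUV";

bool fake_in_ppc(const void *p)
{
    uintptr_t base = (uintptr_t)fake_ppc_memory;
    return (uintptr_t)p - base < sizeof fake_ppc_memory;
}

/* Mimics the reported fault: the first 16 bytes are skipped for PPC buffers. */
long fake_read(void *file, void *buffer, long length)
{
    const char *src = (const char *)file;
    if (fake_in_ppc(buffer))
        src += 16;
    memcpy(buffer, src, (size_t)length);
    return length;
}
```

Calling `fake_read` directly on `fake_ppc_memory` delivers the data shifted by 16 bytes, while `bounced_read` with the same arguments delivers the file contents from the start. The cost is one extra copy per Read().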
With sound on it still hangs during sound initialization.
Most apps work now. Audio does not work. Warp3D needs testing.
The assembly version of the library will not support the 1200TX mediator beyond what is implemented now. Refer to https://github.com/Sakura-IT/PowerPCAmiga for possible support in the new C version in the future.
At the moment, the sonnet.library only supports the A3000Di mediator. This is a mediator which has the needed 3.3V line. The only other mediator which has the 3.3V line (and with a much larger userbase) is the mediator TX (when connected to an ATX PSU).
The current state is that the PPC is initialized by the sonnet library. It sets up its memory and communicates this with the 68K. The 68K however cannot properly address the sonnet memory. This is probably due to the Z2 window.
I suspect there are functions inside the pci.library to fix this. The memory of the sonnet should be initiated the same way as graphics memory (using the pci.library) and the relevant pci.library functions will probably contain MMU code.
I will investigate this further.