Sakura-IT / SonnetAmiga

Reimplementation of WarpOS supporting Sonnet Crescendo 7200 and other PowerPC PCI cards (mirror of CVS development repository).
MIT License

Get the sonnet.library working on the 1200TX mediator #21

Closed DvdBoon closed 4 years ago

DvdBoon commented 8 years ago

At the moment, the sonnet.library only supports the A3000Di mediator. This is a mediator which has the needed 3.3V line. The only other mediator which has the 3.3V line (and with a much larger userbase) is the mediator TX (when connected to an ATX PSU).

The current state is that the PPC is initialized by the sonnet library. It sets up its memory and communicates this with the 68K. The 68K however cannot properly address the sonnet memory. This is probably due to the Z2 window.

I suspect there are functions inside the pci.library to fix this. The memory of the sonnet should be set up the same way as graphics memory (using the pci.library), and the relevant pci.library functions will probably contain MMU code.

I will investigate this further.

DvdBoon commented 8 years ago

Got some first results. Using the default MMU window of $80000000-$A0000000 did the trick. See http://amigafun.blogspot.nl/2015/12/picture-of-day-dualppc.html

Will push the updated code later.

rkujawa commented 8 years ago

Congrats on that, I saw the code, looks good to me. I really need to put back my A1200T together, now that I can test Sonnet with it.

DvdBoon commented 8 years ago

A (number of) function(s) appear(s) to be broken on the 1200TX. The only other program I tried (Quake2) didn't work. It exits gracefully, complaining about the soft_rend.dll not being available. It loads fine, however. I also see that the main program probes for a certain 68K port. I expect Run68K to be broken. Looking into it.

DvdBoon commented 8 years ago

Investigation into the problem is hindered by debug tools not correctly working in the environment created by the PCI library. Monam crashes and COP cannot read the gfx or sonnet memory (put into place using the MMU by the pci.library).

Next to that, some strange stuff is happening. The CyberPI program, for example, only outputs the pi result. The actual text before and after it is not displayed. This is actually 68K code (Output() and Write()) called from the 68K part of the CyberPI program, indicating that calling 68K library functions from gfx or sonnet memory is impaired, probably in combination with Z2 window shifting.

I don't think we'll see a working Sonnet card on the A1200 this year.

DvdBoon commented 8 years ago

At the moment I am writing my own MMU setup code to replace the one in pci.library. I am chopping up the 512MB from $80000000-$A0000000 into 8MB pieces. Every 8MB piece will point to the Z2 memory at $200000-$A00000. Only one 8MB piece will be marked valid, though. The rest will be marked invalid. An access to an invalid 8MB piece will invoke the bus error handler. Inside this handler we'll shift the Z2 window and mark the new 8MB piece valid and the old one invalid. I will be using indirect page descriptors for VERY fast MMU switching. Both supervisor and user MMU trees must be addressed.
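The valid/invalid bookkeeping described here can be modelled in plain C without touching any MMU. The sketch below is a simulation only: `slot_valid`, `on_bus_error` and `access_pci` are hypothetical names made up for illustration, not mmu.library or pci.library calls, and the place where real code would move the Mediator window is just a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_BASE    0x80000000u  /* start of the 512 MB logical PCI range */
#define PCI_SIZE    0x20000000u  /* 512 MB */
#define WINDOW_SIZE 0x00800000u  /* 8 MB Mediator window */
#define NUM_SLOTS   (PCI_SIZE / WINDOW_SIZE)  /* 64 pieces of 8 MB */

static bool     slot_valid[NUM_SLOTS];  /* models the valid/invalid state per piece */
static unsigned current_slot;           /* the single piece that is currently valid */
static unsigned slide_count;            /* how often the window had to move */

/* Start with only the first 8 MB piece marked valid. */
static void window_reset(void)
{
    for (unsigned i = 0; i < NUM_SLOTS; i++)
        slot_valid[i] = false;
    current_slot = 0;
    slot_valid[0] = true;
    slide_count = 0;
}

/* Models the bus error handler: invalidate the old piece, validate the new one. */
static void on_bus_error(unsigned faulting_slot)
{
    slot_valid[current_slot] = false;
    /* Real code would reprogram the Mediator window here. */
    current_slot = faulting_slot;
    slot_valid[current_slot] = true;
    slide_count++;
}

/* Models one access; returns true if it caused the window to slide. */
static bool access_pci(uint32_t addr)
{
    assert(addr >= PCI_BASE && addr - PCI_BASE < PCI_SIZE);
    unsigned slot = (addr - PCI_BASE) / WINDOW_SIZE;
    if (slot_valid[slot])
        return false;
    on_bus_error(slot);
    return true;
}
```

After `window_reset()`, an access to $80000010 returns false (no slide), while a following access to $90000004 returns true and leaves piece 32 as the only valid one.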

In theory, this all sounds good, but I don't know yet if this will interfere with the workings of the pci.library. Even with MMU=no as option the pci.library still installs a bus error hook on the A1200, probably to catch PCI memory accesses. I'm optimistic, though :-)

rkujawa commented 8 years ago

I wonder, would it be possible to use mmu.library for this? It is still actively developed and seems to be the standard system-friendly way for manipulating the MMU.

http://aminet.net/package/util/libs/MMULib

rkujawa commented 8 years ago

Also note that you should support 4MB pieces too, after all Mediator 1200 can be switched into 4MB window mode by jumper.

DvdBoon commented 8 years ago

Yes, I am using the mmu.library for this. Maybe I'll pester the author about being able to set multiple indirect pages at once. I now have to do a SetProperties call for every page (4k... and that for 512MB)... Only during setup, though.

The 4MB support will follow if this works for the 8MB window

rkujawa commented 8 years ago

As I understand, @thorfdbg is the author of mmu.library, maybe he can comment here and give his opinion, since he has an account on GitHub :wink: .

DvdBoon commented 8 years ago

Oh, well, I've sent him a mail already :-) It's a nice-to-have really, as I can just program a loop. It is needed during setup only, but it will shorten the setup time. Most of the setup time already goes into setting up the PPC MMU.

thorfdbg commented 8 years ago

On 01.02.2016 at 13:06, Radosław Kujawa wrote:

As I understand, @thorfdbg https://github.com/thorfdbg is the author of |mmu.library|, maybe he can comment here and give his opinion, since he has an account on GitHub :wink: .

It's probably easier just to mail to this account. Not sure whether github will reach me. But yes, thorfdbg is my github account.

Greetings, Thomas

DvdBoon commented 8 years ago

Hello Thomas,

Thank you for your answer. I was under the impression that using the INDIRECT approach was faster than the SINGLE approach. I was also under the impression for SetIndirectArray to work, you have to install the pages first using SetProperties. As INDIRECT only takes one DESCRIPTOR I need to call it for all the pages during setup. There is no magic code inside SetProperties which updates the DESCRIPTOR per page used when size > 1 page.

I think I know enough to first try this all with the INDIRECT approach. I just need to call the SetProperties function 131071 times :-)

Let's see how that goes. Otherwise I'll do the single approach.

I see that the docs don't explain what 'lower' (a1) is for SetPageProperties.

The end goal is to make PCI memory access seem transparent to programs on the Mediator 1200TX. The whole 32bit range can only be accessed using a sliding window residing at $200000. In effect, I need to replace a similar function now executed within the pci.library but which is buggy and not friendly to other programs using the MMU.

Thanks again,

Dennis.

I am developing a library which needs to redirect memory calls from a 512MB block starting from $80000000 to Zorro 2 memory space. So I want to point multiple blocks of 8MB inside that 512MB block to one 8MB block at $200000.

I want to do this with MAPP_INDIRECT as I need to change the properties of those 8MB blocks quickly. (One of them is valid, the rest will be invalid. If invalid, bus error is handled -> mediator window is set on invalid block -> block is made valid -> Previous 8MB block is made invalid -> rte).

That's one possibility. MAPP_INDIRECT has one drawback, however, and that is that the page modes cannot and will not be automatically adjusted when a DMA transfer is running from such pages. A better alternative is usually to map the pages as MAPP_SINGLE, which creates a single descriptor per page, i.e. it disables the early termination page descriptors (for the 030 and the 68851, of course).

Then, you can rather quickly change the modes by SetPageProperties().

There is one drawback, however, namely that your properties will be overwritten as soon as the MMU table in this area is rewritten, so it's usually best to set up the high-level page properties with SetProperties() to something useful, and then alter the page properties one by one. SetPageProperties is pretty quick and goes directly to the page descriptor, via a per-CPU type dispatcher that avoids CPU-dependencies.

If this is not acceptable, you can also install a page-table access handler with

AddContextHook(MADTAG_TYPE,MMUEH_PAGEACCESS,...)

and you will be called as soon as the high-level function leaks through to the low-level and changes a setting there. That's the strategy MuGuardianAngel uses, namely quickly changing the page access to INVALID for those pages that should not be reached because they only contain free memory.

In case you want to recover from page faults, you should also set MAPP_REPAIRABLE. It tells the bus-error handler to collect additional information for you that is otherwise not available.

Is it correct I can only set 1 page at the time with INDIRECT and MAPTAG_DESCRIPTOR? using SetProperties? I would like it that when I set logical address and size that when I set size at 8MB that it increases MAPTAG_DESCRIPTOR by 4 (or 16, depending on tag or flag?) for every page (so for like 8MB/4k pages) so I don't need to call SetProperties for every page during setup (=8MB/4k calls).

MAPP_INDIRECT works also on larger page sets, though it then acts similar to MAPP_BUNDLED, i.e. the entire page block will go to the same (single) descriptor. The Lib does not try to "smartly" adjust the position of the indirect descriptor.

If you need to adjust an entire array, you have to use...

I want to use SetIndirectArray in the bus error hook.

..exactly that. (-: It's a low-level function, i.e. on the same level as SetPageProperties().

Or is this already possible and did I overlook it in the documents? Or is there another method of quickly just mark pages invalid/valid.

As suggested above, I would go for SetPageProperties() which is usually the safer way of getting it done. However, not knowing your constraints and design goals, it's a bit hard to answer.


thorfdbg commented 8 years ago

On 01.02.2016 at 20:17, DvdBoon wrote:

Hello Thomas,

Thank you for your answer. I was under the impression that using the INDIRECT approach was faster than the SINGLE approach.

The question is "how fast" is "fast enough". It would probably be helpful to know what you are attempting and what the expected latency is. So for example, how often do you need to adjust pages? SetPageProperties is not overly slow (unlike the high-level functions).

A second question you need to answer yourself is whether the area you are going to remap is ever touched by a DMA transfer. If so, then indirect descriptors will cause trouble.

I was also under the impression for SetIndirectArray to work, you have to install the pages first using SetProperties.

Yes, as always. You need to set the MAPP_INDIRECT flag for that.

As INDIRECT only takes one DESCRIPTOR I need to call it for all the pages during setup. There is no magic code inside SetProperties which updates the DESCRIPTOR per page used when size > 1 page.

It will always point to the same descriptor for all pages in the area, this is correct.

I think I know enough to first try this all with the INDIRECT approach. I just need to call the SetProperties function 131071 times :-)

Not necessarily. You can also first set the pages to point all to the same indirect page and then call SetPagePropertiesA() to adjust the pointer. MAPTAG_DESCRIPTOR is the tag that defines the target descriptor.

Let's see how that goes. Otherwise I'll do the single approach.

I see that the docs don't explain what 'lower' (a1) is for SetPageProperties.

It's the logical address for which the mapping has to be modified.

The end goal is to make PCI memory access seem transparent to programs on the Mediator 1200TX. The whole 32bit range can only be accessed using a sliding window residing at $200000. In effect, I need to replace a similar function now executed within the pci.library but which is buggy and not friendly to other programs using the MMU.

I see. So in essence, a write into the PCI area (logical address) has to be remapped to go into the window at $200000 instead, and for every potential PCI access you have to perform a remap? However, how do you decide which window of the PCI memory to map in? And, forgive me the silly question, why is the memory not accessed in the window directly with its hardware address, thus why the back-and-forth of physical and logical mapping? After all, a program can only access a single PCI device at a time due to windowing in first place, do I get this right?

If I may: It is probably easier not to use descriptors in first place. The library can give you access to the CPU pipeline, i.e. you can get access to the data the code has written, and the data the code has just read. You can define an exception hook that picks up the data that has just been written, its size, and perform the transfer to the target address within the window manually.

The downside is that this does not work for movem, i.e. it can only catch up byte, word or long-word reads or writes.

If nothing helps, I can probably add another flag to SetPageProperties() to have "non-bundled" indirect descriptor setup, though this might take a while until I have an implementation.

Let me know how it goes and whether I can do something for you to support you further.

Greetings, Thomas

DvdBoon commented 8 years ago

Hello,

2016-02-01 21:04 GMT+01:00 Thomas Richter notifications@github.com:

On 01.02.2016 at 20:17, DvdBoon wrote:

I see. So in essence, a write into the PCI area (logical address) has to be remapped to go into the window at $200000 instead, and for every

For every access, that is correct.

potential PCI access you have to perform a remap? However, how do you

I need to do a remap if the access is outside the current 8MB window. The Mediator can hardware remap/mirror an 8MB window of PCI memory to $200000. No remap is needed as long as consecutive accesses are within this window. As soon as there is an access outside this window, I need to remap.

decide which window of the PCI memory to map in? And, forgive me the

By looking at the upper bits of the logical address. These bits are used to move the Mediator hardware remap window to the correct PCI 32 bit address. Then I need to remap the PCI addresses to this window at $200000 using the 68K MMU.

Example:

Say I access $80000000. The mediator window needs to move to this address so that the window at $200000 reflects PCI memory at $80000000-$80800000 (it's how the mediator works). Then I remap, using the MMU, accesses to $80000000-$80800000 to $200000-$a00000. Every access within $80000000-$80800000 is now transparently handled.

Then an access to $90000000 happens. Now I need to configure the mediator to show/mirror $90000000-$90800000 at the $200000 window and tell the MMU to translate these addresses to $200000. All this needs to be as fast as possible. I hope I'm a bit clear :-)
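The address arithmetic in this example can be written down as a small C sketch. The function names (`window_base`, `z2_address`, `same_window`) are made up for illustration; the only facts taken from the thread are the $200000 window base and the 8 MB (or 4 MB) window size.

```c
#include <assert.h>
#include <stdint.h>

#define Z2_WINDOW 0x00200000u  /* fixed Mediator window in Zorro II space */

/* window_size must be a power of two (8 MB or 4 MB in this thread). */
static int is_pow2(uint32_t v) { return v != 0 && (v & (v - 1u)) == 0; }

/* PCI address the Mediator window has to show so that addr becomes reachable. */
static uint32_t window_base(uint32_t addr, uint32_t window_size)
{
    assert(is_pow2(window_size));
    return addr & ~(window_size - 1u);
}

/* Address inside the Z2 window that the MMU has to translate addr to. */
static uint32_t z2_address(uint32_t addr, uint32_t window_size)
{
    assert(is_pow2(window_size));
    return Z2_WINDOW + (addr & (window_size - 1u));
}

/* Nonzero if both addresses are reachable without moving the window. */
static int same_window(uint32_t a, uint32_t b, uint32_t window_size)
{
    return window_base(a, window_size) == window_base(b, window_size);
}
```

With an 8 MB window, `window_base(0x90123456, 0x800000)` gives $90000000 and `z2_address(0x90123456, 0x800000)` gives $323456, while `same_window(0x80000000, 0x90000000, 0x800000)` is 0, which is the case where the window has to slide.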

silly question, why is the memory not accessed in the window directly with its hardware address, thus why the back-and-forth of physical and logical mapping? After all, a program can only access a single PCI device at a time due to windowing in first place, do I get this right?

I need to run 68K code from within PCI memory. As Amiga code is mostly small, I don't expect a lot of switching of the MMU and Mediator window, but that's why I need it to be as fast as possible.

If I may: It is probably easier not to use descriptors in first place. The library can give you access to the CPU pipeline, i.e. you can get access to the data the code has written, and the data the code has just read. You can define an exception hook that picks up the data that has just been written, its size, and perform the transfer to the target address within the window manually.

The downside is that this does not work for movem, i.e. it can only catch up byte, word or long-word reads or writes.

I need to run code inside the PCI memory. I think the above method is indeed the best if you just shovel data around.

If nothing helps, I can probably add another flag to SetPageProperties() to have "non-bundled" indirect descriptor setup, though this might take a while until I have an implementation.

Let me know how it goes and whether I can do something for you to support you further.

I will, thanks for all your comments so far!

Greetings, Thomas


DvdBoon commented 8 years ago

Using loads and loads of indirect pages using 1 page at the time with SetProperties takes too long, unfortunately (each access to SetProperties becomes slower and slower for every new page added).

I am guessing doing a range using SetProperties is MUCH more efficient. I still like the indirect approach a lot. Maybe just set up with a single indirect page descriptor using the mmu.library and then adjust the table with all the needed indirect page descriptors outside of the mmu.library.

DvdBoon commented 8 years ago

Looking at the documentation of the 040/060, it is maybe easier to just do a normal remap for each 8 MB block. Set them all invalid except for the initial one and just change the 16 pointer level table descriptors which make up the 8MB blocks to mark them either valid or invalid when needed.

I think I see that the mmu.library does not (directly) support modification of tables.

Probably won't work on other MMUs.

rkujawa commented 8 years ago

If you want my opinion, I think doing anything that would rule out 68030 usage is not very wise. The 68030 is still hugely popular with Mediator owners (including me...). Maybe we could have separate methods of solving this problem for 68030 and 68040/68060.

I have to admit I just don't know enough about 68k MMUs to be of any help here. I only dug a bit in NetBSD's 68k MMU code but it is very complex (over 66kB) and in NetBSD every process is living in a separate virtual address space - it's a design completely different than AmigaOS.

Again, maybe @thorfdbg can add the necessary functions to mmu.library, since in one of the above comments he expressed the will to help with this.

I'd prefer to avoid messing with the MMU directly, I'm 100% sure that would cause further problems. Note that some people already have mmu.library installations, anything we do here would create a conflict. Additionally, various CPU-board specific libraries mess with the MMU (68040.library etc.), that's why mmu.library is providing own implementation of CPU libraries...

thorfdbg commented 8 years ago

On 01.02.2016 22:29, DvdBoon wrote:

Using loads and loads of indirect pages using 1 page at the time with SetProperties takes too long, unfortunately (each access to SetProperties becomes slower and slower for every new page added).

Try to setup the properties in inverse order, i.e. start with the highest page. This should give you linear performance instead of quadratic running time.
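Why the order matters is not spelled out in the thread. One plausible explanation, and it is only an assumption about the library's internals, is that mapped ranges are kept in an address-sorted list that is searched from the low end. The self-contained C sketch below models just that assumption with a hypothetical linked list and counts how many nodes each insertion has to walk past.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 1024

/* Hypothetical stand-in for one mapped page in an address-sorted list. */
struct node {
    unsigned long addr;
    struct node  *next;
};

static struct node pool[MAX_NODES];
static size_t      pool_used;

/* Insert addr into the ascending list; return how many nodes were walked past. */
static unsigned long insert_sorted(struct node **head, unsigned long addr)
{
    unsigned long steps = 0;
    struct node **link = head;

    while (*link != NULL && (*link)->addr < addr) {
        link = &(*link)->next;
        steps++;
    }
    assert(pool_used < MAX_NODES);
    struct node *n = &pool[pool_used++];
    n->addr = addr;
    n->next = *link;
    *link = n;
    return steps;
}

/* Total walk length for inserting n 4 KB pages in ascending or descending order. */
static unsigned long total_steps(unsigned n, int descending)
{
    struct node *head = NULL;
    unsigned long total = 0;

    pool_used = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned page = descending ? (n - 1u - i) : i;
        total += insert_sorted(&head, 0x80000000ul + page * 0x1000ul);
    }
    return total;
}
```

Under this model, inserting n pages from the lowest address upwards walks n(n-1)/2 nodes in total, while starting from the highest page always inserts at the head and walks none.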

I am guessing doing a range using SetProperties is MUCH more efficient. I still like the indirect approach a lot. Maybe just set up with a single indirect page descriptor using the mmu.library and then adjust the table with all the needed indirect page descriptors outside of the mmu.library.

I believe I still have to think of a good design for all this. The current design is not exactly suitable for the use case you have. There are a couple of issues, however: First, you cannot really control the page size(s), that's up to the environment to select, and also hardware dependent. Thus, even if I give you access to a higher level in the page table, it is still unclear whether the page size is sufficient for you. Or whether such a descriptor level exists in first place. The Apollo/Vampire core will probably only have a linear (one-level) page addressing model, so there will be no higher level at all.

Second, what happens if a context switch occurs and the tables are exchanged. I need to carry modifications over. In principle possible if you administrate the level yourself.

Third, what happens on DMA transfer. I need to cache-inhibit the boundary pages due to the Amiga bus design, so one way or another the library needs to know where your pages are and how to modify them.

Last, how to handle top-level modifications of the page layout.

I do not have an immediate answer to these questions at this point, and coming up with a good design will probably take a while - even more so as I'm busy with a lot of other things.

But anyhow, thanks for looking into this. I'm confident we'll come up with something, even though it might probably take a bit longer than you might have planned - sorry for that.

Greetings, Thomas

DvdBoon commented 8 years ago

All the help is appreciated :-). In the end, the only thing I really want is that I get notified when an access occurs outside the current Mediator window (being 8MB or 4MB depending on jumper) so I can slide this Mediator window to reflect the correct PCI memory range.

I constrain this to only PCI memory addresses $80000000-$A0000000 to make life easier. 512MB should be enough for now.

You say something about page sizes? I don't want to control them. I just noticed that I can also mark the pointer tables as being invalid. As I see it, there are 128 root table entries controlling 32MB each (which is too big); the next level is the pointer tables, whose entries control 256kb each. So I need to mark 32 (not 16, I miscalculated...) pointer table entries as invalid to invalidate all the pages in the next level (at least on the 040; I assume that the 060 is the same. Don't know about the 030). This also assumes a page size of 4k; I haven't looked at what happens with a page size of 8k.
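The counts can be checked with a few lines of C. The sketch assumes the 68040's three-level layout with a 7-bit root index and a 7-bit pointer index (taken from the 68040 manual, not from this thread); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROOT_INDEX_BITS    7u  /* 128 root-level descriptors */
#define POINTER_INDEX_BITS 7u  /* 128 pointer-level descriptors per root entry */

/* Bytes of logical address space covered by one root-level descriptor. */
static uint32_t root_span(void)
{
    return 1u << (32u - ROOT_INDEX_BITS);                       /* 32 MB */
}

/* Bytes covered by one pointer-level descriptor (one whole page table). */
static uint32_t pointer_span(void)
{
    return 1u << (32u - ROOT_INDEX_BITS - POINTER_INDEX_BITS);  /* 256 KB */
}

/* Pointer-level descriptors that make up one Mediator window. */
static uint32_t descriptors_per_window(uint32_t window_size)
{
    assert(window_size % pointer_span() == 0);
    return window_size / pointer_span();
}
```

This gives 32 pointer-level descriptors for an 8 MB window and 16 for a 4 MB window. The span of a pointer-level descriptor stays at 256 KB for 8k pages as well, because the page table then holds half as many entries.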

Or am I wrong here?

I'll try the reverse addressing you mentioned first to stay mmu.library compliant.

DvdBoon commented 8 years ago

As an addition: (if the reverse addressing does not work)

So I am thinking about doing 64 (in the case of 8MB) remaps of 8MB each with every one pointing to $200000 and marked as being valid. The level above it (the 32 table pointers for each 8MB window) I'll mark as invalid.

When a hit occurs, I set the correct window (setting the (U)DT bits) on valid and slide the mediator window to the correct PCI memory address. The window we slide away from we mark as invalid.

No need to use INDIRECT. Just need access to the pointer tables.

thorfdbg commented 8 years ago

On 02.02.2016 10:44, DvdBoon wrote:

As an addition: (if the reverse addressing does not work)

So I am thinking about doing 64 (in the case of 8MB) remaps of 8MB each with every one pointing to $200000 and marked as being valid. The level above it (the 32 table pointers for each 8MB window) I'll mark as invalid.

When a hit occurs, I set the correct window (setting the (U)DT bits) on valid and slide the mediator window to the correct PCI memory address. The window we slide away from we mark as invalid.

No need to use INDIRECT. Just need access to the pointer tables.

Which you don't. (-: The problem is really that I cannot even ensure that there are pointer tables in first place. The current abstraction is really a list of pages, not a tree, and it might really happen that the MMU table is not a tree at all. As said, I need to think about it how to abstract this and come back to you.

Greetings, Thomas

DvdBoon commented 8 years ago

For my information, you mean that on the mmu.library level, there is no tree just a list of pages? Ok, I get it now, I think :-) I could set it all up using the mmu.library and then directly poke at the pointer tables using the hardware, but with an outside rebuildtree everything gets lost.

I'll come back to here when I tried the INDIRECT approach with starting at the top of the address range.

DvdBoon commented 8 years ago

Setting the properties in reverse order worked. Only the RebuildTree takes forever now.

thorfdbg commented 8 years ago

On 02.02.2016 at 11:14, DvdBoon wrote:

I'll come back to here when I tried the INDIRECT approach with starting at the top of the address range.

As a related question: How large is the window you want to map in? Does it change in size? And how many different configurations would you need to support?

Thanks, Thomas

rkujawa commented 8 years ago

The Mediator 1200 window into PCI memory space is 4MB or 8MB, depending on hardware configuration (a jumper, actually). The size does not change.

If you want, I can give you a detailed description of how it works.

DvdBoon commented 8 years ago

I would like to map 64 8 MB windows or 128 4 MB windows to 1 static 8 or 4 MB window. If all windows are in place, just switching them on and off (valid/invalid) would be the fastest, I think.

OR

I would like to map 1 8 MB window or 1 4 MB window to 1 static 8 MB or 4 MB window, if creating and destroying a single remap on the fly is fast enough. In this case the whole 512 MB is invalid except for the 8 or 4 MB remap which is created on the fly. The previous one is destroyed (made invalid again).

These 8 MB or 4 MB windows are on a 8 or 4 MB boundary starting from $80000000 up to $A0000000

The static window is $200000-$A00000 (Mediator Z2/PCI memory window, can also be $200000-$600000 in case of 4 MB)..

The window size is determined at power up (using a jumper on the mediator) and does not change during operation.

Extra info as to why the 512 MB:

512 MB is wanted as to address 256 MB of Radeon GFX memory and 128 MB Sonnet 7200 memory (or 128 Radeon / 256 Sonnet or 32 Voodoo / 256 Sonnet etc etc). These must be on their own 256 MB and 128 MB etc boundary so actually at least 128 MB is wasted. (small part in use by PCI config registers). This gfx/sonnet memory (~360 MB) has to hold PPC and 68K code.

So for example:

$80000000-xxx PCI config space (BARs etc.), not much else, mostly within the first 128k
$88000000-$90000000 Sonnet memory 128 MB
$90000000-$A0000000 Radeon Gfx 256 MB
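The layout above follows from placing each region on a boundary equal to its own size. A minimal C sketch of that placement rule, with hypothetical helper names and the assumption that the small register blocks end after the first 128k:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of size; size must be a power of two. */
static uint32_t align_up(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1u)) == 0);
    return (addr + size - 1u) & ~(size - 1u);
}

/* Place a region of the given size at the next naturally aligned address
   at or above *cursor, advance the cursor past it, and return its base. */
static uint32_t place_region(uint32_t *cursor, uint32_t size)
{
    uint32_t base = align_up(*cursor, size);
    *cursor = base + size;
    return base;
}
```

Starting with a cursor at $80020000, placing 128 MB yields $88000000 and placing 256 MB after that yields $90000000, ending exactly at $A0000000. The gap between $80020000 and $88000000 is the roughly 128 MB that goes unused.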

thorfdbg commented 8 years ago

On 02.02.2016 at 22:34, Radosław Kujawa wrote:

The Mediator 1200 window into PCI memory space is 4MB or 8MB, depending on hardware configuration (a jumper, actually). The size does not change.

If you want, I can give you a detailed description of how it works.

I believe I understand how it works. I'm just trying to get the operating parameters straight to come up with a useful design for it.

Part of the problem is that you would need to do the switch under uncontrolled conditions. The mmu.library can catch an invalid access just fine, and react on that by calling user level code to perform some action on that. The call into user code from the bus error recovery is completely transparent and does not require any higher level Os magic.

However, most high-level mmulib interface functions use semaphore locking to be thread (or rather "task"-)safe. That is not a problem - the problem is that a task that runs into a semaphore will necessarily break the Forbid() or Disable() state of exec (naturally, what else can it do), and this implies that the task that tried to make the invalid page access, and fixing that by a higher level accessor function would potentially need to be halted because some other task is working on the mmu tables at the very same time. And this "halting" might potentially break a protocol the first task is implementing.

IOWs, the only chance to get the page swap done is by low-level functions, and that's the part which is a bit "touchy" because there is no abstraction of "page level" in the library, and because high-level definitions can overwrite low-level definitions at any time. (And there cannot be a page-level abstraction because it is quite likely that some future extensions or developments will not have a tree-based MMU).

Greetings, Thomas

rkujawa commented 8 years ago

@DvdBoon

I'd highly suggest using the method discussed in this thread to access PCI memory space only. Remember that the window within Z2 memory space (0x200000-0xA00000) is only for accessing the PCI memory space.

To access the PCI configuration space through this MMU mapping, we'd also need to mess with the second Mediator board, the one within Z2 I/O space. This board is internally divided into two 64kB spaces. However, the first 64kB is used to access Mediator bridge setup registers. The second 64kB space is shared between PCI configuration and PCI I/O space. Whether you see config space or I/O space there depends on register at offset 0x7 within the bridge setup space.

Long story short, it would make things even more complex. I'm afraid touching it will create further problems, as only pci.library knows the current state of this area.

On a side note, I really should write an article about low level Mediator programming...

DvdBoon commented 8 years ago

I think I made things confusing with saying config space. I meant special registers set up by the PCI BARs like the configuration block (EUMB) of the Sonnet which is placed in PCI memory (MEMSPACE, ROMSPACE), not config memory (IOSPACE).

Anyway, I don't need to mess with that if the pci.library is running and sets it all up.

DvdBoon commented 8 years ago

Question @rkujawa

0xAddress >> 0x10 & 0xFF80 = value to store at $EA0002 to slide window (byte-swapped). But if I read $EA0002 I see a value 0x2f90, which gives 0x902f after byte-swapping. This value is not 8 MB or 4 MB aligned. Does pci.library not do & 0xFF80 (or & 0xFFC0 for 4 MB)? Or do I also need to AND again after reading and byte-swapping?
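For reference, the calculation in question written out as C. This is only a sketch of the arithmetic: the function names are made up, nothing here touches $EA0002, and whether the low bits read back from the register carry any meaning is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_8MB 0xFF80u  /* keeps address bits 31..23 after the shift */
#define MASK_4MB 0xFFC0u  /* keeps address bits 31..22 after the shift */

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Value to store in the window register for a given PCI address. */
static uint16_t window_reg_encode(uint32_t addr, uint16_t mask)
{
    return swap16((uint16_t)((addr >> 16) & mask));
}

/* PCI window base implied by a raw value read back from the register,
   masking off everything below the window alignment. */
static uint32_t window_reg_decode(uint16_t raw, uint16_t mask)
{
    return (uint32_t)(swap16(raw) & mask) << 16;
}
```

With these helpers, `window_reg_encode(0x90000000, MASK_8MB)` is 0x0090, and `window_reg_decode(0x2f90, MASK_8MB)` is $90000000, so masking after the read would at least give an aligned window base.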

rkujawa commented 8 years ago

0xAddress >> 0x10 & 0xFF80 = value to store at $EA0002 to slide window (byte-swapped)

That's what I did in the Mediator 1200 NetBSD driver and it worked, but...

But if I read $EA0002 I see a value 0x2f90. which gives after byte-swapping 0x902f. This value is not 8 MB or 4 MB aligned. pci.library does not do & 0xFF80 (or & 0xFFC0 for 4 MB)?

To be honest, I reverse engineered pci.library back in 2012 and I don't remember now what does it exactly do. I suspect two possibilities:

Or do I also need to AND again after reading and byteswapping?

Most likely, but you should probably do some experiments with cleanly booted system (i.e. no pci.library running). Write some values to this register, read them back, see what you get.

DvdBoon commented 8 years ago

I hope to have some proof of concept code ready later this evening just to test whether this approach will work. It is a hybrid now of using the mmu.library (detecting mmu type, page-size, setting up the MMU, adding a segfault handler and activating it) and direct poking in the table-pointers to set stuff invalid/valid very quickly (bit 1 of the UDT bits).

Preliminary tests are promising so far. I haven't written a complete segfault handler yet, but the set-up is complete.

If the pci.library does not do 4 MB/8 MB alignment I also need to take over the gfx window handling by the pci.library (getting a step closer to eliminating its use altogether almost).

I hope that it will be 100% mmu library compliant in the future. For now, I probably need a hook on RebuildTree to detect tree rebuilds which will reset the UDT bits probably.

thorfdbg commented 8 years ago

On 03.02.2016 at 13:16, DvdBoon wrote:

I hope to have some proof of concept code ready later this evening just to test whether this approach will work. It is a hybrid now of using the mmu.library (detecting mmu type, page-size, setting up the MMU, adding a segfault handler and activating it) and direct poking in the table-pointers to set stuff invalid/valid very quickly (bit 1 of the UDT bits).

Please don't do that. It might seem to work, but it will not, and it will cause a lot of problems. The bottom page table layout of a mmu.library generated mmu table is more than just the table descriptors. If you allocate your page table yourself, the additional information the mmu.library keeps at the page level is not there, and hence the library will overwrite innocent memory.

Just to let you know what is there in addition:

*) The page level contains for each page a DMA cache disable counter. This counter is incremented each time a DMA transfer is initiated, and then causes the corresponding page to go into cache-inhibit. (more or less). It is decremented when the page goes out of DMA. If the counter reaches zero, the original cache state is restored.

*) The page level also includes (besides the native MMU descriptor) the abstract descriptor you read by GetPagePropertiesA(). This abstract descriptor also keeps the cache state the page should have according to the library, but which it cannot have due to pending DMA transfers.

*) An additional user-pointer that can be deployed for any type of VM application.
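
The DMA counter behaviour described in the list above can be modelled roughly as follows. This is a toy sketch only: the struct `page_info` and both functions are hypothetical names, and the real page-level layout is undocumented and may look nothing like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-page bookkeeping; NOT the real mmu.library layout. */
struct page_info {
    uint32_t dma_count;      /* number of pending DMA transfers on this page */
    int      wanted_cached;  /* cache state the page should have */
    int      cached_now;     /* cache state currently in effect */
    void    *user;           /* free-use pointer, e.g. for a VM application */
};

/* A DMA transfer starts: count it and inhibit caching for the page. */
static void page_dma_begin(struct page_info *p)
{
    p->dma_count++;
    p->cached_now = 0;
}

/* A DMA transfer ends: restore the intended cache state only once
   no transfer is pending any more. */
static void page_dma_end(struct page_info *p)
{
    assert(p->dma_count > 0);
    if (--p->dma_count == 0)
        p->cached_now = p->wanted_cached;
}
```

The point of the model is that the intended cache state has to be stored next to the descriptor, because the descriptor itself cannot hold it while DMA is pending.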

So, in other words: No, you cannot simply replace branches of the MMU table with your own branches and hope this works. It will not.

Please do not attempt this.

How exactly the page table is populated and where this information is stored is an implementation detail. It is not documented, and might change at any time.

As said, I need to come up with a design for your problem, but replacing table pointers does not work.

Greetings, Thomas

DvdBoon commented 8 years ago

In short, as I am on the move: I am not replacing table pointers. I set them up as valid using the mmu.library and then directly (using URP, SRP) toggle a bit in the UDT to mark all windows except one as invalid. But for now I guess I'll give it a rest :-)

DvdBoon commented 8 years ago

I meant I build the whole tree using mmu library with the PAGES as being valid and toggle bits in the TABLE POINTERS (a level above the pages). But directly IN the tree.

Just to make things clear (I hope).
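
For readers unfamiliar with the descriptor bits being discussed, the toggle itself is plain bit arithmetic. The sketch below assumes the 68040/68060 table descriptor layout, in which the low two bits are the UDT field and a set bit 1 means "resident". It operates on an ordinary integer, not on a live MMU table, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 68040/68060 table descriptor, UDT field in bits 1-0,
   bit 1 set = resident (valid), bit 1 clear = invalid. */
#define UDT_RESIDENT 0x2u

/* Hypothetical helper: does the descriptor mark its subtree as valid? */
static int udt_is_resident(uint32_t desc)
{
    return (desc & UDT_RESIDENT) != 0;
}

/* Hypothetical helper: clear bit 1 so any access below this
   descriptor would fault. The table address bits are untouched. */
static uint32_t udt_invalidate(uint32_t desc)
{
    return desc & ~UDT_RESIDENT;
}

/* Hypothetical helper: set bit 1 again to make the subtree valid. */
static uint32_t udt_validate(uint32_t desc)
{
    return desc | UDT_RESIDENT;
}
```

This shows only why the toggle is cheap; it says nothing about whether doing it inside a tree owned by the mmu.library is safe.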

thorfdbg commented 8 years ago

On 03.02.2016 at 15:31, DvdBoon wrote:

I meant I build the whole tree using mmu library with the PAGES as being valid and toggle bits in the TABLE POINTERS (a level above the pages). But directly IN the tree.

Yes, same problem. The page level needs to be built by the library or you're going to be in deep trouble. A "quick and dirty trick"(tm) would be to create not one context, but eight MMU contexts and exchange their upper-level pointers between the contexts. This still does not keep the upper and the lower abstraction level in sync, but at least it gives the library a valid MMU table to allow switching.

Allocating parts of the MMU table itself is a bad idea(tm) as I already said.

Anyhow, I don't see the need for any rush at this point, so I would avoid any hacky approach at all. There needs to be a solution that is cleanly integrated into the overall design and not a work-around.

Greetings, Thomas

rkujawa commented 8 years ago

Anyhow, I don't see the need for any rush at this point, so I would avoid any hacky approach at all.

Agreed about that, we already have enough hacks and workarounds.

rkujawa commented 8 years ago

Hey @thorfdbg did you think of a clean solution that could be implemented using mmu.library? A month has passed and the discussion has kind of stalled ;).

DvdBoon commented 8 years ago

Had a quick look today and found more complications. Because the Mediator by default places the extra memory above $80000000, some of the WarpOS programs fail in general. For example, CyberPi loads an address to be used in a DOS Write() and does a bmi check on it. That branch is now always taken, so nothing is printed when the string lies in an area above $80000000.
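
The effect can be reproduced in a few lines. A 68k `bmi` branches when the N flag is set, which for a 32-bit test means bit 31 of the value is set. The function name `bmi_taken` is hypothetical and only models that check; it is not taken from CyberPi.

```c
#include <stdint.h>

/* Models a 68k "tst.l / bmi" check: the branch is taken when bit 31
   is set, i.e. when the value is negative if read as signed 32-bit. */
static int bmi_taken(uint32_t value)
{
    return (value & 0x80000000u) != 0;
}
```

With this model, `bmi_taken(0x40001000u)` is 0, while `bmi_taken(0x80000000u)` and `bmi_taken(0x90000000u)` are both 1, so every pointer into the default Mediator range looks like a negative value.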

thorfdbg commented 8 years ago

Hi folks,

no, I haven't forgotten you. I'm unfortunately just very busy these days and have only rarely enough time to work on the Amiga.

So yes, I did some work on the mmu.library and I believe that I now have a design that might work. At this time, it is only implemented, not tested. I will do some elementary tests this week and next, though development may continue at the same glacial speed as in the past. Sorry again for this.

So here is how it will work:

1) Build N additional MMU contexts where each defines the physical to logical mapping for the memory window of the Mediator target window(s).

2) Reserve an additional four byte (one long word) pointer that will keep the active window. Set this to NULL initially. This is a "MMUContext **".

3) In the default MMU context, mark the range into which the windows will appear as MAPP_WINDOW. This will be an MMU property flag in the next version. Additionally, one tag has to be set to the pointer above to identify the window (there can be an arbitrary number of windows, and this pointer identifies the window).

4) Call "RebuildTree()". This will build a new MMU tree with the "window" area as invalid.

5) When you need to change the memory layout, install one of the N additional contexts with MapWindow(). This call is interrupt- and supervisor-callable. So you probably want to install a page access handler in the default context that registers the access and then calls MapWindow() in response.
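
The control flow of the steps above can be sketched as a plain C model. Nothing here calls the mmu.library: `window_ctx`, `window_state`, `map_window` and `on_access` are hypothetical stand-ins, N is assumed to be 8, and the real MapWindow() signature is not known from this thread.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WINDOWS 8  /* assumption: N = 8 */

/* Stand-in for one additional MMU context: just a physical base. */
struct window_ctx { unsigned long phys_base; };

struct window_state {
    struct window_ctx  ctx[NUM_WINDOWS]; /* the N additional contexts */
    struct window_ctx *active;           /* active window, NULL at start */
};

/* Build the contexts and leave the window range unmapped (invalid). */
static void window_init(struct window_state *s,
                        unsigned long base, unsigned long size)
{
    for (size_t i = 0; i < NUM_WINDOWS; i++)
        s->ctx[i].phys_base = base + i * size;
    s->active = NULL;
}

/* Stand-in for the planned MapWindow() call. */
static void map_window(struct window_state *s, size_t index)
{
    assert(index < NUM_WINDOWS);
    s->active = &s->ctx[index];
}

/* Models the page access handler: on an access to a window that is
   not mapped, switch to it, then return the physical address. */
static unsigned long on_access(struct window_state *s,
                               size_t index, unsigned long offset)
{
    if (s->active != &s->ctx[index])
        map_window(s, index);
    return s->active->phys_base + offset;
}
```

In this model every access to a window other than the active one costs one `map_window` call, which is why the speed of the real call matters so much.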

There are a couple of restrictions, though. The additional contexts and the mapping of the default context must "fit" together to make this possible, i.e. the separation of the contexts into "nodes" must be the same, at the same addresses. This is something I probably still have to think about.

Nor do I know whether "MapWindow" will be fast enough. It is not exactly a long function, but it might be possible to come up with an additional call that pre-computes some of the internal values and stores them in an opaque structure for easier and faster reuse.

DMA transfers, however, will interact fine even when the mapping changes, so the system remains consistent. The system should hence interact nicely with the rest of the MMU system.

I'll probably think about the quirks again, and it will still take a while until I have a workable version for testing, and probably a demo, but I just wanted to let you know approximately where I am, and give you a chance to give some feedback if this design does not fit your application.

Happy Easter!

Greetings, Thomas

DvdBoon commented 8 years ago

Happy Easter Thomas!

Thanks for the update. Speed will indeed be crucial in the end. I still suspect that the Mediator 8 MB window can be split into 2x4 MB, with one half kept on the gfx memory, because when I am in COP I can read the addresses (after envi.m) at $80000000 (default window) and at $90000000 (gfx memory), but not, say, $88000000. It would make things a bit easier. (COP is the only MMU program that works with mmu=yes for the Mediator on my system.)

As said, one of the things I thought was the result of pci.library/sonnet.library not working properly was actually an artifact of the code/data being in the negative address range (>$80000000, if you look at addresses as signed values). I'm not sure how this affects some of the other WarpOS programs; I have to look further into this. I would rather use, say, the $40000000-$60000000 range.

Focus on my side is to get the virtual signal pool going (PPC and 68K sharing the same signals) before I shift my attention back to the A1200 (I've contacted Sam Jordan for this). So take all the time you need; you're not slower than my speed of coding ;-)

Regards,

Dennis.

thorfdbg commented 8 years ago

Hi David,

short question: Did my update of the mmu.library from last week reach you? I'm not sure whether this email address of yours allows the attachment of binaries.

Greetings, Thomas

DvdBoon commented 8 years ago

Hello,

If by David you mean me, then no. I have not seen any binaries. Which e-mail address did you use?

Regards,

Dennis

DvdBoon commented 8 years ago

Coincidentally, I tried some stuff yesterday with manually moving the window while in an interrupt (I am guessing it is moved automatically when running normal code, by another interrupt) and got some strange results, like crashing when just letting the program run, while it finishes correctly when a certain amount of delay is inserted... (all without mmu.library, btw).

thorfdbg commented 8 years ago

On 10.05.2016 at 13:57, DvdBoon wrote:

Hello,

If by David you mean me, then no. I have not seen any binaries. Which e-mail address did you use?

This one - on github. You probably want to send me your private mail (or reply by that) so we don't have to go through github for communication.

Greetings, Thomas

DvdBoon commented 8 years ago

It's dennsvdboon at gmail

thorfdbg commented 8 years ago

On 10.05.2016 at 14:34, DvdBoon wrote:

You should have mail now. Let me know whether this worked.

Greetings, Thomas

DvdBoon commented 8 years ago

Sorry Thomas, I made an error. It's dennisvdboon. The 'i' was missing.

thorfdbg commented 8 years ago

On 10.05.2016 at 15:24, DvdBoon wrote:

Sorry Thomas, I made an error. It's dennisvdboon. The 'i' was missing.

Ok, third try. (-: Please check.

Greetings, Thomas

DvdBoon commented 8 years ago

Received an update of the mmu.library from Thomas. I'll be trying to get some results on my A1200 this weekend.