More "real CPU friendly" component API bus?

MightyPirates / OpenComputers

Home of the OpenComputers mod for Minecraft.

https://oc.cil.li

More "real CPU friendly" component API bus? #1730

Closed iamgreaser closed 3 years ago

iamgreaser commented 8 years ago

Friendly reminder that the spec here is out of date - see this URL for up-to-date spec (namely main.md): https://github.com/iamgreaser/hardbus-oc

LAST UPDATE: 2016-04-05 22:50 UTC

Dropping this here as I'm not sure who lurks on the forums and those who would actually understand what I'm asking for would already be over here. Still lots of room for discussion.

Because of the current date in a lot of places around the world, I have to also point out that this is a serious request.

I don't remember what the OC ARM project uses to communicate with the bus. OCMIPS does a call that looks a bit like this:

*(volatile const char **)0xBFF00280 = cmd_fill;
*(volatile int32_t *)0xBFF00300 = 1; *(volatile int32_t *)0xBFF00304 = 6;
*(volatile int32_t *)0xBFF00308 = 1; *(volatile int32_t *)0xBFF0030C = 6;
*(volatile int32_t *)0xBFF00310 = gpu_w; *(volatile int32_t *)0xBFF00314 = 6;
*(volatile int32_t *)0xBFF00318 = gpu_h; *(volatile int32_t *)0xBFF0031C = 6;
*(volatile const char **)0xBFF00320 = ascii_set_bg; *(volatile int32_t *)0xBFF00324 = 4;
*(volatile uint8_t *)0xBFF00286 = 5;

Translated it's more like this:

ocbus_cmd = "fill";
ocbus_arg[0].val = 1; ocbus_arg[0].typ = OCTYP_INT;
ocbus_arg[1].val = 1; ocbus_arg[1].typ = OCTYP_INT;
ocbus_arg[2].val = gpu_w; ocbus_arg[2].typ = OCTYP_INT;
ocbus_arg[3].val = gpu_h; ocbus_arg[3].typ = OCTYP_INT;
ocbus_arg[4].val = " "; ocbus_arg[4].typ = OCTYP_STR;
ocbus_strobe_call_function = 5;

Not pictured: the code required to find the gpu, starring strncmp and memcpy.

Basically it's a pain in the arse to work with, from both the MIPS end and the Java end, and it gets even worse when one of the things that gets returned is a floating point number and you don't have floating point support available because, for a totally made-up example, you're writing code for the Linux kernel. (Project shelved for the time being, but that screenshot does make people think you're pulling their leg seeing as it's April Fools Day somewhere in the world. And yes, it still builds.)

It would be a lot more sensible to do something like this:

ocdev_gpu->dx = 1;
ocdev_gpu->dy = 1;
ocdev_gpu->rw = gpu_w;
ocdev_gpu->rh = gpu_h;
ocdev_gpu->chr = 0x0020; // space
ocdev_gpu->strobe_imm = OCGPU_FILL;

These are of course volatile, uncached hardware registers.

Which is fine for immediate, non-DMA data. A "set" command, on the other hand, does need DMA for maximum efficiency, but a non-DMA "setch" command would work just fine for architectures that don't have DMA at the time, especially if it autoincrements dx.

DMA facilities should be provided by the architecture, not by the hardware. There are notably fewer architecture types than component types.

Draft Proposal

This is based on how the MIPS data bus typically works, because it's reasonably simple and very flexible. Or at least how I think it works. This is also reminding me that I really need to refactor OCMIPS.

If there's something that sucks about it, please leave it in the comments and I can change it.

I'm pretty sure the ARM data bus works in a similar fashion. Probably the same for the 68000.

MMIO bus

Components shall provide these two functions to architectures (names are preliminary and subject to change):

void mmioWriteMask32(int addr, int mask, int data);
int   mmioReadMask32(int addr, int mask);

It is REQUIRED that every byte of mask provided by the architecture is either 0x00 or 0xFF. Real hardware would use a byte enable.

It is NOT REQUIRED that the lower 2 bits of addr are 00. However, they can be ignored. The mask can also be ignored. Architecture authors MUST pass the appropriate address to addr and mask to mask, even if it's unaligned.
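To make the mask rule concrete, here is a minimal sketch of a component that honours it. Only the two method names and signatures come from the draft; the `ScratchRegs` class, its four registers and the way it decodes the address are assumptions made up for illustration.

```java
// Hypothetical component exposing four 32-bit scratch registers over the draft MMIO bus.
final class ScratchRegs {
    private final int[] regs = new int[4];

    // Word index from a byte address; the low 2 bits are ignored, as the draft permits.
    private int index(int addr) {
        return (addr >>> 2) & 3;
    }

    // Only byte lanes whose mask byte is 0xFF are updated.
    void mmioWriteMask32(int addr, int mask, int data) {
        int i = index(addr);
        regs[i] = (regs[i] & ~mask) | (data & mask);
    }

    // Lanes outside the mask read back as zero in this sketch.
    int mmioReadMask32(int addr, int mask) {
        return regs[index(addr)] & mask;
    }
}
```

A byte store to address 5 would arrive as `mmioWriteMask32(5, 0x0000FF00, value << 8)`, leaving the other three bytes of that word untouched.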

This may get chucked out like that section that was marked as "this may get chucked out"

Architectures should have access to a utilities API to convert addresses to masks:

int getMask8(int addr);
int getMask16(int addr);
int getMask32(int addr);
int getHiMask8(int addr);
int getHiMask16(int addr);
int getHiMask32(int addr);
int getData8(int addr, int data);
int getData16(int addr, int data);
int getData32(int addr, int data);

These are completely optional and to be blunt very easy to implement yourself:

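return 0xFF<<((addr&3)*8);                        // getMask8
return 0xFFFF<<((addr&3)*8);                      // getMask16
return 0xFFFFFFFF<<((addr&3)*8);                  // getMask32
// Hi masks are computed in long: an int shift by 32 is a shift by 0 in Java,
// which would return the full mask for an aligned address instead of 0.
return (int)((0x000000FFL<<((addr&3)*8))>>>32);   // getHiMask8 (always 0)
return (int)((0x0000FFFFL<<((addr&3)*8))>>>32);   // getHiMask16
return (int)((0xFFFFFFFFL<<((addr&3)*8))>>>32);   // getHiMask32
return data<<((addr&3)*8);                        // getData8
return data<<((addr&3)*8);                        // getData16
return data<<((addr&3)*8);                        // getData32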

And of course getHiMask8 is completely useless, but provided for completeness.
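As a worked example of what the Hi masks are for, the sketch below splits an unaligned 16-bit store across two bus words. It assumes a little-endian bus and uses a two-word array as a stand-in for a component. `getHiData16` is a hypothetical helper that is not in the list above, and the Hi mask is computed in `long` because Java reduces `int` shift counts modulo 32.

```java
// Sketch: an unaligned 16-bit store split across two 32-bit bus words.
final class UnalignedStore {
    // Stand-in for a component's MMIO space: two 32-bit words.
    static final int[] words = new int[2];

    static int getMask16(int addr) { return 0xFFFF << ((addr & 3) * 8); }
    static int getData16(int addr, int data) { return data << ((addr & 3) * 8); }

    // Byte lanes that spill into the next word; 0 when nothing spills.
    static int getHiMask16(int addr) {
        return (int) ((0xFFFFL << ((addr & 3) * 8)) >>> 32);
    }

    // Hypothetical helper: the data bits that spill into the next word.
    static int getHiData16(int addr, int data) {
        return (int) (((data & 0xFFFFL) << ((addr & 3) * 8)) >>> 32);
    }

    static void mmioWriteMask32(int addr, int mask, int data) {
        int i = (addr >>> 2) & 1;
        words[i] = (words[i] & ~mask) | (data & mask);
    }

    // Architecture-side store: one bus write, plus a second if the halfword spills.
    static void store16(int addr, int value) {
        mmioWriteMask32(addr, getMask16(addr), getData16(addr, value & 0xFFFF));
        int hiMask = getHiMask16(addr);
        if (hiMask != 0) {
            mmioWriteMask32(addr + 4, hiMask, getHiData16(addr, value));
        }
    }
}
```

Storing `0xBEEF` at address 3 puts `0xEF` in the top byte of word 0 and `0xBE` in the bottom byte of word 1.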

DMA

Components shall provide these functions to architectures:

int dmaChannelCount();
void dmaSetChannel(int cmp_chn, int arc_chn);
void dmaAlert(int cmp_chn/dst_chn, int arc_chn/src_chn);
boolean dmaWrite(int cmp_chn/dst_chn, int data, int size);

Architectures shall provide these functions to components:

void dmaAlert(int arc_chn/dst_chn, int cmp_chn/src_chn);
boolean dmaWrite(int arc_chn/dst_chn, int data, int size);

dmaChannelCount returns the number of channels this component has. This function may be unnecessary.

For dmaSetChannel, an arc_chn of -1 disables DMA for that channel.

Either end can call dmaAlert when they are ready to receive data.

For dmaWrite, size is either 8, 16, or 32. Returns true if the data was accepted. The recipient can ignore the size, but the sender MUST provide it.

It is good practice to keep writing to dmaWrite until it returns false.

If DMA is not supported by a component, this will be an acceptable implementation:

public int dmaChannelCount() { return 0; }
public void dmaSetChannel(int cmp_chn, int arc_chn) { }
public void dmaAlert(int cmp_chn, int arc_chn) { }
public boolean dmaWrite(int cmp_chn, int data, int size) { return false; }

If DMA is not supported by an architecture, this will be an acceptable implementation:

public void dmaAlert(int arc_chn, int cmp_chn) { }
public boolean dmaWrite(int arc_chn, int data, int size) { return false; }

A component MUST NOT attempt DMA on an architecture channel that was not granted to it.

With that said, an architecture SHOULD handle such a case without catching fire. Ignoring it is the best outcome. Unless your aim is to make an easily exploitable system.
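The "keep writing until it returns false" rule can be sketched as follows. This assumes a receiver with a small bounded FIFO. The `Sink` class, its capacity and the `drain` helper are invented for illustration, and only the `dmaWrite` signature follows the draft.

```java
// Sketch: a DMA receiver with bounded buffering, plus a sender loop.
final class DmaFifoSketch {
    // Hypothetical receiving end: accepts data until its FIFO is full.
    static final class Sink {
        private final int[] fifo;
        private int count = 0;

        Sink(int capacity) { fifo = new int[capacity]; }

        // Mirrors the draft's dmaWrite: true if the data was accepted.
        // The size is ignored here, which the draft allows for recipients.
        boolean dmaWrite(int channel, int data, int size) {
            if (count == fifo.length) return false; // full: sender must stop
            fifo[count++] = data;
            return true;
        }

        int accepted() { return count; }
    }

    // Sender side: push until the sink refuses, and report how far we got.
    static int drain(Sink sink, int channel, int[] data) {
        int sent = 0;
        while (sent < data.length && sink.dmaWrite(channel, data[sent], 32)) {
            sent++;
        }
        return sent; // the sender would resume from here after the next dmaAlert
    }
}
```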

Interrupts

Components shall provide these two functions to architectures:

int interruptPinCount();
void interruptSetToken(int pin, int token);

Architectures shall provide this function to components:

void interrupt(int token);

Interrupts from a component MUST only be fired from a valid token.

For interruptSetToken, pin refers to a pin on the component, not on the architecture. If token is -1, this interrupt is disabled.
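A rough sketch of the token scheme, assuming a hypothetical two-pin `Timer` component and an `InterruptSink` interface standing in for the architecture. Neither name is part of the draft.

```java
// Sketch: a component that only fires interrupts on pins holding a valid token.
final class InterruptSketch {
    // Architecture side of the draft: receives a token the architecture chose.
    interface InterruptSink { void interrupt(int token); }

    // Hypothetical component with a fixed number of interrupt pins.
    static final class Timer {
        private final int[] tokens = { -1, -1 }; // all pins start disabled
        private final InterruptSink sink;

        Timer(InterruptSink sink) { this.sink = sink; }

        int interruptPinCount() { return tokens.length; }

        // A token of -1 disables the pin again.
        void interruptSetToken(int pin, int token) {
            if (pin >= 0 && pin < tokens.length) tokens[pin] = token;
        }

        // Internal event on a pin; a disabled pin stays silent.
        void raise(int pin) {
            if (pin >= 0 && pin < tokens.length && tokens[pin] != -1) {
                sink.interrupt(tokens[pin]);
            }
        }
    }
}
```

Because the component only ever passes back the token it was handed, the architecture is free to encode whatever it likes in it, such as an IRQ line number or a vector.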

Sidenote: If you are implementing a Z80 or 8080, the data that gets chucked on the data bus should be determined in the architecture implementation, not the component implementation.

Component headers for Plug 'n' Play bus

NOTE: This may actually be dropped from the spec and implemented in an architecture-dependent way.

Each component has a 256-byte structure as follows:

00 30 = zero-padded component address string (0x30 bytes)
30 0C = RESERVED
3C 04 = CRC-32 of address
40 30 = zero-padded component type string (0x30 bytes)
70 0C = RESERVED
7C 04 = CRC-32 of type
80 04 = HW API revision for this component (bump every time it changes, please!)
84 04 = MMIO address space size shift amount (size in bytes = 1<<[0x84])
88 04 = DMA channel count
8C 04 = IRQ pin count
90 70 = RESERVED

CRC-32 is as per zlib (0xEDB88320 right-shift Galois LFSR) and does not include the padding in its calculation.

OpenComputers should handle the header, all a component needs to do is provide the relevant fields - this does not include the CRC-32s, which will be handled by OC itself.

A component MUST NOT provide an address or type with NUL bytes ("\x00" / 0x00) within it.
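The header layout could be assembled roughly like this. It is a sketch that assumes the integer fields are stored little-endian, matching the bus, and that the strings are plain ASCII. `PnpHeader.build` and its parameters are hypothetical names. `java.util.zip.CRC32` implements the same CRC-32 as zlib.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: assembling the 256-byte header from the fields a component supplies.
final class PnpHeader {
    // Writes a zero-padded string at `offset` and its CRC-32 at `crcOffset`.
    private static void putString(ByteBuffer buf, int offset, int crcOffset, String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > 0x30) throw new IllegalArgumentException("string too long");
        for (byte b : raw) {
            if (b == 0) throw new IllegalArgumentException("NUL byte in string");
        }
        for (int i = 0; i < raw.length; i++) buf.put(offset + i, raw[i]);
        CRC32 crc = new CRC32();
        crc.update(raw, 0, raw.length); // the zero padding is not included
        buf.putInt(crcOffset, (int) crc.getValue());
    }

    static byte[] build(String address, String type, int apiRev,
                        int sizeShift, int dmaChannels, int irqPins) {
        ByteBuffer buf = ByteBuffer.allocate(256).order(ByteOrder.LITTLE_ENDIAN);
        putString(buf, 0x00, 0x3C, address);
        putString(buf, 0x40, 0x7C, type);
        buf.putInt(0x80, apiRev);
        buf.putInt(0x84, sizeShift);
        buf.putInt(0x88, dmaChannels);
        buf.putInt(0x8C, irqPins);
        return buf.array(); // reserved ranges stay zero
    }
}
```

An architecture scanning for a GPU could then compare the 4-byte CRC at 0x7C first and only fall back to a string compare on a match.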

Endianness

The bus will be in little-endian.

If this is to be used by a big-endian architecture, it is up to the architecture to decide how this will even work. However, it will most likely require translating the mask field, and possibly the data field as well if the data value is not repeated over the bus.

Ideally the address SHOULD match the address fed to the component documentation. If this is the case, then the mask (and possibly the data) MUST be adjusted to suit.

ds84182 commented 8 years ago

So here is how OpenArms does it:

struct value_s;

struct type_none {};
struct type_array {
    int length;
    struct value_s** values;
};
struct type_string {
    int length;
    const char *chars;
};
struct type_float {
    float number;
};
struct type_integer {
    int integer;
};

struct value_s {
    int type;
    union {
        struct type_none none;
        struct type_array array;
        struct type_string string;
        struct type_float number;
        struct type_integer integer;
    };
};

typedef struct value_s value;

All values to and from OC use this structure (signals and components). For example, code to ~~find and~~ use a GPU is:

static void GPU_Set(char* address, int x, int y, char* str) {
    value arguments;
    arguments.type = TYPE_ARRAY;
    arguments.array.length = 3;

    value vx;
    vx.type = TYPE_INTEGER;
    vx.integer.integer = x;

    value vy;
    vy.type = TYPE_INTEGER;
    vy.integer.integer = y;

    value string;
    string.type = TYPE_STRING;
    string.string.chars = str;
    string.string.length = strlen(str);

    value* valuelist[3] = {&vx,&vy,&string};

    arguments.array.values = valuelist;

    // Inlines a call to the OC coprocessor which handles unboxing on the Java side
    // Requests no output (NULL buffer with a size of 0)
    COM_Invoke(address, "set", &arguments, NULL, 0);
}

All these structures are located in main memory, and privilege mode checks are done by the coprocessor. OpenARMs used to use a MMIO interface combined with SVC calls, but it was quite hard to balance when making an OS with privileged execution modes. It's not the most efficient, but it gets the job done :sweat_smile:

Anyways...

This is a great idea because OpenGX uses a FIFO queue system to submit GPU commands, so this could allow for super efficient writing to its FIFO buffer on low level architectures.

fnuecke commented 8 years ago

Just to make sure, in the Interrupts section, the first sentence should be, as in all others,

Components shall provide these two functions to the CPU:

correct?

I absolutely agree that the current system is supremely painful for real CPU architectures. It was from the ground up shaped to accommodate Lua (and other scripting / high level languages), because that was (and for now is) the main use case, after all. So yes, I think that an API like this would be a huge step forward for low-level architectures, and I'd be glad to support this. One fear I do have, however, is that this will lead to somewhat of a split in the ecosystem, where some components support this low-level API and some don't, and it'll be hard to know which do without clear documentation. This may be less terrible than I fear since ideally at least all components in OC itself would support it, but still.

That said, I'm still definitely in favor of this being done, I see it as an overall (big) plus. I will however say that I'm quite rusty on the low level stuff. So at least some of this being done as a PR would be a massive help, and avoid unnecessary derping around. Now, speaking from the Java/OC perspective, I imagine there'll be (at least) two new interfaces, for components and CPUs (architectures in OC lingo)? E.g. DMA might be its own interface, and it not being present meaning there's no support?

One thing I'm unclear on is how mapping of components to an actual memory address would happen, and if that could be generalized or if that would always be up to the CPU. Say there are two GPUs, would the CPU just map the first one to 0xWHATEVER or if this is software controlled, how would the software reflect on the component types? Would the CPU expose a "virtual" component that allows querying hardware information (and could that be provided by OC e.g.)? (I guess I could just read up on how this happens in real hardware, anyway, but I'm lazy, so enlighten me please! :P)

Final note, just from this comment, the OpenARM approach to component interaction looks very flexible. Would it be sensible to integrate something like that as a "fallback" for components that do not support the low-level API? That would do away with my abovementioned worry of some components working and some not.

Anyway, I'll give this a few days to allow other potentially interested parties to comment and to let it sink in, then let me know how I can help with this. I'm in full support of the idea :)

iamgreaser commented 8 years ago

Components shall provide these two functions to components

DAMMIT HOW DID I END UP WITH THAT. Yes, you're right. Fixed.

Interfaces: It could be better to provide them as interfaces that can then be checked with the oh-so-wonderfully-typesafe instanceof keyword. Not sure if even a dummy MMIO interface should be compulsory, though - I'm mostly thinking for speed reasons, but to be honest I don't know how often it will be accessed; one thing that is almost for certain, you're probably not going to use it as often as main system RAM.

At this stage I'm not quite sure how to deal with address errors (HardbusAddressException, perhaps?) - this might be a bit too architecture-specific, so it might have to be omitted. It should be fine for both MMIO functions to throw a LimitReachedException, however - in which case, the CPU has to redo the read or write in the appropriate mode.

DMA and interrupts are going to be pretty much useless without the MMIO interface. In fact, they are extensions to it, and are recommended where appropriate, but you are really only implementing up to two interfaces.

Hardbus vs softbus: Or the low-level (this one) and high-level (current one) interfaces respectively. Softbus support should be provided as a fallback, and current "real" CPU implementations actually do provide such an interface.
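The interface idea might look something like the following from the architecture's side. This is a sketch only: `HardbusMmio` and `HardbusDma` are placeholder names, and making DMA extend MMIO is one possible reading of "they are extensions to it".

```java
// Sketch: optional capabilities as separate interfaces, probed with instanceof.
final class HardbusProbe {
    interface HardbusMmio {
        void mmioWriteMask32(int addr, int mask, int data);
        int mmioReadMask32(int addr, int mask);
    }

    // DMA is modelled as an extension of MMIO rather than a standalone interface.
    interface HardbusDma extends HardbusMmio {
        boolean dmaWrite(int channel, int data, int size);
    }

    // Tells an architecture which bus to use for a given component object.
    static String describe(Object component) {
        if (component instanceof HardbusDma) return "hardbus+dma";
        if (component instanceof HardbusMmio) return "hardbus";
        return "softbus"; // fall back to the existing high-level API
    }
}
```

With this shape a component that does not care about DMA never has to write the empty stub methods.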

Mapping into an architecture's address space: This is entirely up to the architecture. The PnP-header proposal which is yet to be written should hopefully make it easier to deal with. I'm thinking of having not only the address + type strings in there as-is, but also having a CRC32 of each for a much faster lookup.

It may end up being CPU-specific anyway, but these things should be provided to the CPU somehow:

Component UUID/address
Component type
MMIO bus width
Number of DMA channels
Number of interrupts

Not sure what else.

For reference, here's how a lot of actual computers actually do it: http://wiki.osdev.org/PCI

Memory mapping: You have these "base address registers" which map the component into memory space or I/O space. Unfortunately to find the width of the device's address space you have to basically set it to 0xFFFFFFFF and read back, then restore the damn pointer. I intend to make it so that the header provides the length, and the CPU provides an interface to address the given component.
Device lookup: Each device has a manufacturer ID, a device ID, and a device class. If you want to you can look things up by class.

I won't go into details of how ISA PnP works, because as far as I care it doesn't. (I tried coding for it once: while it's "clever", it's also bloody awful to actually use. Basically, delaying stuff in software for ever-changing hardware is an invention of Satan.)

EDIT: I should probably add, I'm drafting up a possible hardbus interface for the OC GPU. Poke me on IRC if you want the latest draft. Until then, I'll just keep chipping away at it.

iamgreaser commented 8 years ago

I've set up a git repo so this proposal can be extended and potential component APIs can be drafted: https://github.com/iamgreaser/hardbus-oc

Let me know on #oc or here or there if you want write access. If you know what a pointer is and I can trust you to not shit all over the repo with terrible ideas you should apply.

fnuecke commented 8 years ago

Thanks for the write-up. Also good idea to make an extra repo for this, given the scope. Will browse through it in the evening!

MaHuJa commented 8 years ago

I'm replying here because I have a more general comment. I agree about the clunkiness of the current method and the need for a better interface. However, I believe the approach in the OP is going to fail.

Quite frankly, looking at this proposal and thinking about asking a component maker to implement it makes me cringe.

Imagine I make or maintain a mod which provides components. I'm probably only using lua archs myself. If you're ever going to have this native-like access to my component, you'll need to sell me on it.

Why would I ever want to supply less than math.huge dma channels? What should I as a component maker set it to and why? Why should I have to care? Why should I care about the hardbushandle stuff, and why is it not honestly called a lock? And why the hell is that lock my responsibility?

Why should I have to do anything beyond, say...

//component side
int memsize();  // size of memory block exposed to mmio
bytearray memread(int address, int size);
void memwrite (int address, bytearray data);

With every change you do from this, you have to answer a couple questions. Why is that not sufficient, and why do I have to deal with it?

Unless those questions have good answers, it is better to do nothing, than block better solutions and discourage component compliance.

I may need to call interrupts where I'm already sending events. We could perhaps add "int block" or similar to allow some reads/writes (think dma) to happen "elsewhere", whereas the default mmio is always block 0. Then it's the responsibility of the arch to deal with everything DMA, including dma related interrupts. I don't want to care, and why should I have to? If it's the same code in 100 classes, it probably belongs elsewhere.

Another option is that OC or architecture makes provisions for such conversion being provided by a third party mod when a component does not provide it by itself. That limits your required buyers to those interested in your arch(s) to begin with. If such an adaptation can (optionally) be written as a lua script in the config directory, you'll have more potential buyers, but that'll take some extra effort. Even if you treat them as prototypes to later be implemented "for real" in a jvm language when they've been shown to work well.

Component makers could then either contribute them, keeping them separate from their mod, or we could redirect to them when they implement it themselves, or fall back on a converter anyone can supply.

If the results of that work are then usable by all the real-arch mods, all the better. [mod_name] could be a common dependency mod for all the 'real arch' addons, and contain not just the jvm code to make this happen, but also a library of adapters with consistent semantics. Even better consistency than you can expect if the component devs make their interfaces themselves.

If you intend to proceed with the proposal in the OP, I would recommend starting a new document, setting up an example of a simple component suitable for a plain but full mmio interface, how its use flows across the different functions you're asking them to add, and how to implement it. Then continue with when IRQs are useful and how to do it, when DMA is useful, and how to use it.

This should then make it crystal clear what you're asking of every component dev ever. Even those that do not care about the internal life of misc archs. If done well, it'll be the only guide a peridev ever needs. I suspect it will also reveal some unnecessary complexity you're moving all the way to the component, that doesn't have to be.

It feels like you had this realization wrt PnP. I'm making the argument that this extends far beyond just the PnP section. You may be too arch-focused, and in your effort to keep your garden clean and orderly (simple) you throw the shit over to your neighbours, the component devs. And then ask them for favours. Even if you're indirect through common users, you're basically saying "please implement this".

ds84182 commented 8 years ago

What about an arch that doesn't want to do MMIO? I'm all for the DMA approach because it uses host memory for that, so something like a FIFO for a graphics pipeline would be possible (which is what I do in OpenGX), but with MMIO I would have to go back to the original "write gather pipe" approach (and at that point what's the incentive over using the components API directly?)

MaHuJa commented 8 years ago

Let's examine my assumptions.

I was viewing all of this from the component POV. For any component that provides any api at all, possibly except those part of an arch mod, I expect it will provide the lua-compatible api. Every component I've seen to date fulfills that. Then I expect that every 'real arch' anybody is trying to emulate under OC supports mmio, and so this is a natural next step for a component. Heck, even redpower 2 control relied on it. With those two in place, you can start dealing with DMA, for use with archs that support that.

Anything particularly wrong so far?

When you say "an arch that doesn't want to do MMIO", are you talking about the software you want to run on that arch, or that the arch itself isn't capable of it? Your wording is somewhere in between, and this is a pretty big distinction.

Are you talking specifically about my barebones proposal? My point was that any change beyond that should not be taken lightly, and should come with solid reasons. Supporting DMA will naturally require more from it, as I made clear below. So the real DMA question is: How much of the dma machinery has to be in the component, rather than the arch? Or even a midpoint between arch and component?

The main point of my text bears repeating: The proposed api was unlikely to get off the ground, let alone fly. Because it puts too much of a burden on the component devs who will likely not even feel invested in making these changes in the first place. It's going to be hard enough to have them make hard apis in the first place. To make them implement all the * you see in OP? Nope. Not gonna happen.

If you're fine doing hard apis against only a few 'core' components, and use the lua api (clumsy as we all agree it is) for everything else, then OP just might work for you. But I'll still say you blocked better solutions.

iamgreaser commented 8 years ago

Not amazingly awake right now so I can't address much here.

Main motivation for the locking stuff is because of the way components work - currently you address them by UUID, and more than one computer can be using them. While it could be dropped from the spec, it is likely that the component will still have to provide a facility but this time it's now component-dependent.

The reason for not doing memread and memwrite as you propose is because how I propose it is closer to how real CPUs work... and also your proposal to use a byte array means you have to serialise then deserialise via Java memory anyway, which is slower than ANDing, considering that if you read 32 bits, you want to read the whole 32 bits.

Yes, hardbus is shamelessly 32-bit.

Yes, you do have to have some idea on how hardware actually works to implement it.

But if it's too hard, feel free to wait for someone to come along and propose an API for you and help you out with it.

S3 has a proposal which would be worth looking into, it's inspired by the ATM protocol and treats components as a network, but I don't think he's properly published it yet. He can be found on IRC.

ds84182 commented 8 years ago

Woo, I must have been asleep when I wrote that last message because I managed to misread everything. Sorry @MaHuJa.

Anyways, on a less stupid note... how would any of this actually factor into component communication? Where are the component addresses specified? How is lookup performed (a lot of small reads and writes over MMIO and DMA could trash performance)? What about callback costs?

MaHuJa commented 8 years ago

My ability to give proper feedback here is somewhat limited by my lack of knowledge on the exact mechanics of DMA. The delay before this response was because I was intending to rectify that, but other things have eaten my time. Parts of the response were written back then.

But I do believe my main point stands regardless, that requiring the api suggested on each component is simply not going to work.

How feasible is the proposal of a separate "hardbus adapters" mod, whose responsibility is to provide a (handcrafted) hardbus interface to softbus apis? At that point, you no longer rely on the ability and willingness of component devs, so you can actually afford complex "component" (rather, proxy) apis like the one proposed (here).
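To give a feel for what such an adapter could look like, here is a sketch of a proxy that presents the GPU's high-level "fill" call as the register-style interface from the opening post. Everything about it is an assumption: the `Softbus` interface stands in for the existing component call path, and the register offsets and `CMD_FILL` value are invented.

```java
// Sketch: a proxy that exposes GPU "fill" as MMIO registers and forwards to the softbus.
final class GpuFillAdapter {
    // Stand-in for the existing high-level component call path.
    interface Softbus { void invoke(String method, Object... args); }

    // Hypothetical register layout (byte offsets); not taken from any real spec.
    static final int REG_DX = 0x00, REG_DY = 0x04, REG_RW = 0x08,
                     REG_RH = 0x0C, REG_CHR = 0x10, REG_STROBE = 0x14;
    static final int CMD_FILL = 1;

    private final int[] regs = new int[6];
    private final Softbus gpu;

    GpuFillAdapter(Softbus gpu) { this.gpu = gpu; }

    void mmioWriteMask32(int addr, int mask, int data) {
        int i = addr >>> 2;
        if (i >= regs.length) return; // outside the register window: ignore
        regs[i] = (regs[i] & ~mask) | (data & mask);
        // Writing the strobe register triggers the high-level call.
        if (i == REG_STROBE >>> 2 && regs[i] == CMD_FILL) {
            regs[i] = 0; // the strobe is one-shot
            String ch = String.valueOf((char) regs[REG_CHR >>> 2]);
            gpu.invoke("fill", regs[REG_DX >>> 2], regs[REG_DY >>> 2],
                       regs[REG_RW >>> 2], regs[REG_RH >>> 2], ch);
        }
    }
}
```

The component itself stays untouched; only the adapter has to know both sides.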

how would any of this actually factor into component communication?

It doesn't. My intent is that this piggybacks on the usual component api to find the component to write to, leaving the component a minimum of stuff to add. Whether the usual component api reaches all the way into the emulated environment is completely up to the arch. But the binding between arch and component would happen by the usual methods the lua implementation uses.

How is lookup performed (a lot of small reads and writes over MMIO and DMA could trash performance)?

The arch will probably want to cache what components map to what mmio addresses, etc. The component doesn't care. That's my point in a nutshell. Beyond "here's what I'll do when you write to the 3rd byte in my mmio area." the component shouldn't have to care about how you find it, it'll just listen and do as you tell it.
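An architecture-side cache of that mapping could be as simple as a sorted map of address windows. The sketch below is hypothetical throughout (`MmioMap`, `Device`, the open-bus value of 0) and only illustrates that the component never needs to know where it was mapped.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: an architecture-side table mapping address windows to components.
final class MmioMap {
    interface Device { int mmioReadMask32(int addr, int mask); }

    private static final class Window {
        final long size;
        final Device dev;
        Window(long size, Device dev) { this.size = size; this.dev = dev; }
    }

    // Keyed by unsigned base address, so a lookup is a single floorEntry call.
    private final TreeMap<Long, Window> windows = new TreeMap<>();

    // The window size in bytes is 1 << sizeShift.
    void map(int base, int sizeShift, Device dev) {
        windows.put(base & 0xFFFFFFFFL, new Window(1L << sizeShift, dev));
    }

    // Forwards the read with a window-relative address, or returns 0 if unmapped.
    int read32(int addr) {
        long a = addr & 0xFFFFFFFFL;
        Map.Entry<Long, Window> e = windows.floorEntry(a);
        if (e == null || a - e.getKey() >= e.getValue().size) return 0;
        return e.getValue().dev.mmioReadMask32((int) (a - e.getKey()), 0xFFFFFFFF);
    }
}
```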

Another approach to explaining my point is that there are, say, 100 component types when you add in all the addons. There's one "oc core", and we're currently looking at 3 "real arch"s to my knowledge. If you can change the 100x work to a minimum, the total work required drops immensely. This is especially important when a lot of that 100x work will fall on others who don't have a big incentive to do it, and then maintain it, but you still need it done.

My reading of the spec presented is basically the opposite. It moves complexity to where you have to do it over and over and ~98 more times over.

It's not hard to understand where it comes from - the spec is written by arch author(s). Their (your) natural inclination is to get complexity out of where you have to deal with it. But in handing it off like this, you're setting the whole of it up for failure.

motivation for the locking stuff is because of the way components work - currently you address them by UUID, and more than one computer can be using them.

Address lookup: "address by uuid" brought up in relation to the lock... - are you locking it and using the lock object instead of caching the reference the uuid gets you?

As for anti-sharing, out of all the reasons I can see for a lock, the only one that makes sense (as specified) is as an alternative to making the read/write functions synchronized (or does java use "volatile" there too?) in case of two threads (different computers) trying to access it at the same time. But then I see this:

While an architecture is not required to claim the bus before use, it is recommended. Trivial queries over the MMIO bus should be OK.

How does the arch (which has to make this decision) know if something is a "trivial query"? It would need to consult the component. Should attempting an unlocked non-trivial op throw an exception (if you can reliably detect that)? Should unlocked calls first call mmio_write_is_trivial(addr,mask) (api creep alert!) to check whether the call itself is ok? Or perhaps mmio_write_trivial(addr,mask,value) instead, though the api creep still applies.

I have a design for a 24-byte mmio api for the internet card. (I should write it down somewhere.) There's exactly one "trivial" operation as far as running non-synchronized goes. (I realize the internet card is less likely to be shared in such a manner, but I don't think this is an exception.)

and also your proposal to use a byte array

I was not proposing to use it. I was saying "you need to justify not being this simple".

And you answered about half of it, related to smaller sizes.

In particular, I want to highlight dma io as calls into r/w functions rather than ... a design for those who especially care. (=not the majority of component devs.) Then the arch (or converter/proxy) can take care of all that's complicated about dma leaving the component maker to spend his time on what we all really care about.

But if it's too hard, feel free to wait for someone to come along and propose an API for you and help you out with it.

A good component dev will then consider the fact that he'll likely have to maintain it on his own. (A mediocre one will discover that when it breaks. Typically after he released a broken version.)

iamgreaser commented 8 years ago

How does the arch (which has to make this decision) know if something is a "trivial query"?

They read the docs, and then deduce that what they are doing will not cause any side-effects.

The nature of this proposal is that if you want to know how to do things, you need to read the component's hardbus documentation. If you don't like this, well, that's how actual hardware goes - without docs, all you have left is reverse-engineering pre-existing code.

An example of a trivial query would be reading status flags over MMIO, assuming said status flags don't automatically clear bits. DMA queries are "nontrivial", and thus you should lock the bus.

Bus locking could possibly be handled by OC. To be blunt, it probably should. It'd then reduce the locking part of the API in the component down to something like:

void hardbusDropLock();

which will tell the component that it needs to drop all DMA and IRQ channels. Architectures will still use the (at the time of writing) 4 methods to lock the components.
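As a rough Java sketch of what "locking handled by OC" could look like: the lock bookkeeping lives outside the component, and the component only implements the single callback. `LockableComponent`, `BusLock` and `CountingComponent` are hypothetical names, and the claim/release methods here are an assumption for illustration; they are not the 4 architecture-side methods mentioned above.

```java
/** Hypothetical component-side contract, reduced to the one callback named in this comment. */
interface LockableComponent {
    /** Tells the component to drop all DMA and IRQ channels. */
    void hardbusDropLock();
}

/** Hypothetical lock bookkeeping kept outside the component (e.g. by OC itself). */
final class BusLock {
    private final LockableComponent component;
    private Object owner; // the machine currently holding the lock, or null

    BusLock(LockableComponent component) {
        this.component = component;
    }

    /** Returns true if the machine now holds (or already held) the lock. */
    synchronized boolean tryClaim(Object machine) {
        if (owner != null && owner != machine) {
            return false;
        }
        owner = machine;
        return true;
    }

    /** Releases the lock if the machine holds it, then notifies the component. */
    synchronized void release(Object machine) {
        if (machine == null || owner != machine) {
            return;
        }
        owner = null;
        component.hardbusDropLock();
    }

    synchronized boolean isHeldBy(Object machine) {
        return machine != null && owner == machine;
    }
}

/** Toy component that only counts how often it was told to drop its channels. */
final class CountingComponent implements LockableComponent {
    int drops;

    public void hardbusDropLock() {
        drops++;
    }
}
```

An arch would do its DMA setup only while `isHeldBy` is true for its machine, and could still do a side-effect-free status read without claiming anything.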

As for DMA and IRQ in the instance where these facilities are not provided, an abstract class would most likely be provided with the appropriate dummy methods.
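Something along these lines, as a sketch only: `HardbusComponentBase`, its method names and the toy `LedComponent` are hypothetical, and the real abstract class (if one gets written) may well look different. The point is that a component without DMA or IRQ support only has to fill in the two MMIO methods.

```java
/** Hypothetical base class: MMIO is mandatory, DMA/IRQ default to "not provided". */
abstract class HardbusComponentBase {
    // Every hardbus component has to answer MMIO reads and writes.
    abstract byte mmioRead(int offset);
    abstract void mmioWrite(int offset, byte value);

    // Dummy defaults for components that provide no DMA or IRQ facilities.
    int dmaChannelCount() { return 0; }
    int irqLineCount() { return 0; }

    /** Nothing to drop when there are no DMA or IRQ channels. */
    void hardbusDropLock() { }
}

/** Toy component with a single one-byte register at offset 0. */
final class LedComponent extends HardbusComponentBase {
    private byte state;

    @Override
    byte mmioRead(int offset) {
        return offset == 0 ? state : 0;
    }

    @Override
    void mmioWrite(int offset, byte value) {
        if (offset == 0) {
            state = value;
        }
    }
}
```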

As for anti-sharing, out of all the reasons I can see for a lock, the only one that makes sense is as an alternative to making the read/write functions synchronized in case two threads (different computers) try to access it at the same time.

That is exactly what it's for.

The arch will probably want to cache what components map to what mmio addresses, etc. The component doesn't care. That's my point in a nutshell. Beyond "here's what I'll do when you write to the 3rd byte in my mmio area", the component shouldn't have to care about how you find it; it'll just listen and do as you tell it.

This is one thing where you'll probably agree that this spec does things right - it is up to the architecture to map the component into the address space.

I'll slightly reword this, as the actual CPU items don't matter:

Another approach to explaining my point is that there are, say, 100 component types when you add in all the addons. There's currently 3 "real arch"s to my knowledge.

My reading of the spec presented is basically the opposite. It moves complexity to where you have to do it over and over and ~98 more times over.

Here are the alternatives:

1. Map everything to a softbus API. This is what is currently in place. There will continue to be a softbus API, it's just that for at least some of the built-in OC components it would make sense to have a hardbus API as well (e.g. the GPU).
2. Make hardbus APIs for each component on each architecture, thus duplicating the work 300x over instead of 100x over, and having 3 different APIs for each component.
3. Change the main API for architectures. There is a proposal for this.

I completely refuse to do number 2 as it's absolutely disgusting. If that is not what you are proposing, then what you are proposing is what already happens, which is cumbersome enough to warrant the hardbus proposal.

gjgfuj commented 8 years ago

Would a byte array with a required callback before and after accessing it (with a parameter saying where in the array) not work?
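Roughly what I have in mind, as a Java sketch. `AccessHook`, `HookedByteArray` and `RecordingHook` are made-up names, and the `wasWrite` flag is an extra assumption on top of the "where in the array" parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callbacks the component supplies; offset says where in the array. */
interface AccessHook {
    void beforeAccess(int offset);
    void afterAccess(int offset, boolean wasWrite);
}

/** Plain byte array whose every access is bracketed by the component's callbacks. */
final class HookedByteArray {
    private final byte[] data;
    private final AccessHook hook;

    HookedByteArray(int size, AccessHook hook) {
        this.data = new byte[size];
        this.hook = hook;
    }

    byte read(int offset) {
        hook.beforeAccess(offset);
        try {
            return data[offset];
        } finally {
            hook.afterAccess(offset, false);
        }
    }

    void write(int offset, byte value) {
        hook.beforeAccess(offset);
        try {
            data[offset] = value;
        } finally {
            hook.afterAccess(offset, true);
        }
    }
}

/** Toy hook that records the order of callbacks, standing in for real component logic. */
final class RecordingHook implements AccessHook {
    final List<String> log = new ArrayList<>();

    public void beforeAccess(int offset) {
        log.add("before " + offset);
    }

    public void afterAccess(int offset, boolean wasWrite) {
        log.add((wasWrite ? "after-write " : "after-read ") + offset);
    }
}
```

A real component would refresh status bytes in `beforeAccess` and act on written command bytes in `afterAccess`; whether that is enough for the DMA side is exactly the open question.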

On Thu, 26 May 2016, 5:27 AM MaHuJa notifications@github.com wrote:

My ability to give proper feedback here is somewhat limited by my lack of knowledge on the exact mechanics of DMA. The delay before this response was because I was intending to rectify that, but other things have eaten my time. Parts of the response was written back then.

But I do believe my main point stands regardless, that requiring the api suggested on each component is simply not going to work.

How feasible is the proposal of a separate "hardbus adapters" mod, whose responsibility is to provide a (handcrafted) hardbus interface to softbus apis? At that point, you no longer rely on the ability and willingness of component devs, so you can actually afford complex "component" (rather, proxy) apis like the one proposed (here).

how would any of this actually factor into component communication?

It doesn't. My intent is that this piggybacks on the usual component api to find the component to write to, leaving the component a minimum of stuff to add. If the usual component api reaches all the way into the emulated environment is completely up to the arch. But the binding between arch and component would happen by the usual methods the lua implementation uses.


MaHuJa commented 8 years ago

Make hardbus APIs for each component on each architecture, thus duplicating the work 300x over instead of 100x over, and having 3 different APIs for each component.

I agree that this is bad. And that is why I'm puzzled as to why you apparently want to do it:

They read the docs, and then deduce that what they are doing will not cause any side-effects.

Unless all archs enforce the use of explicit cpu instructions to lock and unlock bus/component, rather than just reading/writing to the mapped memory, this is something the arch, not the user, has to cover. (Thus "they" becomes "the arch authors".)

If that is not what you are proposing, then what you are proposing is what already happens,

Nope. I started out not proposing anything, but pointing out a design flaw. Please check my assumption: The OP specification was meant to be implemented on the component side, meaning that to have hardbus APIs available for e.g. computronics components, it would need to be implemented in computronics. If my assumption was wrong, then I'm sorry to have raised a storm in a glass of water over the misunderstanding, and would like to know what the actual intent was.

The one real proposal I did make has not been addressed. Even indirectly.

Option 4: Make a 'hardbus' addon which provides hardbus interfaces, as proposed here, and translates (behind the scenes) into softbus api calls on the component itself. Then all the hardbus using archs can use this common interface.

The OP specification will no longer be too heavyweight if applied against option 4.
This brings the workload scaling back from the 300x to the minimal 100x.
All archs using this hardbus addon will reuse the same component APIs.
Most importantly, we can have hardbus access to any component, even where the component author has chosen not to implement it. (Somebody has to implement it, but now anyone* can.) This way it can become feasible to access everything by hardbus. Everything.

It replaces the 'disgusting' drawbacks of option 2, while keeping most of the advantages to its approach. It would be roughly equivalent to one arch having done 2, then making it available as a library for all archs. The most prominent drawback to option 4 is the added dependency on this 'hardbus' addon. This can be mitigated in a few ways (from being part of OC, to using the newest of many embedded versions present when missing), but before we can discuss that we need to consider option 4 in the first place.
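A minimal Java sketch of what one such translation could look like. The interfaces here (`SoftbusComponent`, `HardbusView`) and the toy component with its `getLevel`/`setLevel` methods are hypothetical stand-ins invented for this example; they are not the real OC component API, and a real adapter would call whatever softbus methods the wrapped component actually exposes.

```java
/** Hypothetical stand-in for "call a named method on a softbus component". */
interface SoftbusComponent {
    Object[] invoke(String method, Object... args);
}

/** Hypothetical hardbus-facing view that the addon would hand to an arch. */
interface HardbusView {
    byte mmioRead(int offset);
    void mmioWrite(int offset, byte value);
}

/** Adapter living in the addon: one register at offset 0, translated to softbus calls. */
final class LevelAdapter implements HardbusView {
    private final SoftbusComponent target;

    LevelAdapter(SoftbusComponent target) {
        this.target = target;
    }

    public byte mmioRead(int offset) {
        if (offset != 0) {
            return 0;
        }
        Object[] result = target.invoke("getLevel");
        return ((Number) result[0]).byteValue();
    }

    public void mmioWrite(int offset, byte value) {
        if (offset == 0) {
            target.invoke("setLevel", value & 0xFF);
        }
    }
}

/** Toy softbus component that knows nothing about hardbus. */
final class ToyLevelComponent implements SoftbusComponent {
    private int level;

    public Object[] invoke(String method, Object... args) {
        switch (method) {
            case "getLevel":
                return new Object[] { level };
            case "setLevel":
                level = ((Number) args[0]).intValue();
                return new Object[0];
            default:
                throw new IllegalArgumentException("no such method: " + method);
        }
    }
}
```

The component author writes only `ToyLevelComponent`; the adapter can be written once, by anyone, and reused by every hardbus arch.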

SaphireLattice commented 8 years ago

Uh, I might be reading everything wrong, but the option of throwing all the job at arch authors is just what exists right now. From what I've seen, all "real" arches use some kind of (somewhat universal I guess) interface that they implement by themselves. I am just pointing that out, not trying to make some conclusion, though.

BTW, could anyone make a review of all the options there are for now?

And the "make an add-on" idea doesn't sound good. That would require installing an additional mod if someone wants to use an arch... And distributing this add-on along with OC is basically just making it an API inside of OC, no?

On Thu, May 26, 2016, 20:03 MaHuJa notifications@github.com wrote:


iamgreaser commented 8 years ago

Is Option 4 suggesting that it be possible to let addons provide hardbus interfaces for other addons? That is, are you suggesting that a hardbus component class can be registered as a wrapper for a softbus component?
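In other words, something shaped like the following Java sketch, where any addon can register a wrapper factory keyed by the softbus component's type name. `SoftComponent`, `HardWrapper` and `WrapperRegistry` are hypothetical names for the sake of the question, not an existing OC API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical minimal view of a softbus component: just its type name. */
interface SoftComponent {
    String typeName();
}

/** Hypothetical hardbus wrapper; a real one would expose MMIO/DMA/IRQ methods. */
interface HardWrapper {
    String describe();
}

/** Registry that lets any addon provide a hardbus wrapper for any softbus component type. */
final class WrapperRegistry {
    private final Map<String, Function<SoftComponent, HardWrapper>> factories = new HashMap<>();

    void register(String typeName, Function<SoftComponent, HardWrapper> factory) {
        factories.put(typeName, factory);
    }

    /** Returns a wrapper if some addon registered one for this component's type. */
    Optional<HardWrapper> wrap(SoftComponent component) {
        Function<SoftComponent, HardWrapper> factory = factories.get(component.typeName());
        if (factory == null) {
            return Optional.empty();
        }
        return Optional.of(factory.apply(component));
    }
}
```

An arch would ask the registry first and fall back to the softbus path when `wrap` comes back empty.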

MaHuJa commented 8 years ago

Pretty much.

High on my wishlist for these projects is being able to use hardbus type interfaces for practically everything. If the hardbus interface has to reside in the original (softbus) component, and it requires implementing an api like the one discussed in the first post, that's not going to happen.

On the other hand, if this can be implemented elsewhere, then there's nothing that prevents 'all hardbus' from being at least theoretically possible.

dsolmann commented 4 years ago

It is gone :(