OpenAMP / open-amp

The main OpenAMP library implementing RPMSG, Virtio, and Remoteproc for RTOSes, etc.
https://www.openampproject.org/

Move RPMSG buffer address translation out of VirtIO layer #575

Open · glneo opened 3 months ago

glneo commented 3 months ago

The allocation and use of the buffers used for passing RPMSG contents are all handled by the RPMSG layer (rpmsg_virtio.c). The VirtIO layer should only need to deal with guest-physical addresses, as that is what the standard requires be stored inside the vring descriptors.

Move the address translation steps out of the VirtIO layer and up into the RPMSG layer. This moves that handling to where it belongs and allows for later improvements to both layers that could not be done with this unneeded interdependence.
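A minimal sketch of the intended split, assuming a shm_io region is already set up; `virtio_layer_add_buffer()` is a placeholder for illustration, not a real open-amp symbol, and this is not the actual diff:

```c
#include <stdint.h>
#include <metal/io.h>

/* Placeholder: stands in for whatever the virtio layer exposes to enqueue
 * a buffer by guest-physical address. */
void virtio_layer_add_buffer(metal_phys_addr_t pa, uint32_t len);

/* RPMSG layer: owns the buffers and the shm_io region, so it translates. */
static void rpmsg_layer_enqueue(struct metal_io_region *shm_io,
				void *buf, uint32_t len)
{
	metal_phys_addr_t pa = metal_io_virt_to_phys(shm_io, buf);

	/* The VirtIO layer only ever sees the guest-physical address. */
	virtio_layer_add_buffer(pa, len);
}
```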

arnopo commented 3 months ago

Could you provide more details on what you are trying to fix here? Your update would impose the PA<->VA conversion on each virtio driver/device.

I don't understand why the virtio layer cannot deal with VAs and make the conversion when storing into the vring descriptors. Looking at Linux, it seems that a scatterlist is used at the Virtio API, so a similar approach based on pages could work.

@danmilea, any opinion on this proposal?

glneo commented 3 months ago

Could you provide more details on what you are trying to fix here?

Sure, the general idea is cleaner separation of responsibility for each layer. More details on the "why" below.

Your update would impose the PA<->VA conversion on each virtio driver/device.

VirtIO never deals with VAs for the messages; the virtqueue descriptors store PAs, as described in the standard. So why should the VirtIO layer convert them to VAs, or even assume that they must be converted at all?

Yes, it means that the users of the VirtIO layer (which is just the RPMSG layer today) will have to do that conversion if they need to work with the contents of the buffer. But some users might not need that conversion at all; they may want the PA. For instance, a proxy might want to take that PA and put it right back into another virtqueue descriptor, with no translation wanted.
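For reference, the split-ring descriptor layout defined by the virtio specification carries only the buffer's guest-physical address; the struct below follows the spec's field names (the open-amp header's own naming may differ slightly):

```c
#include <stdint.h>

/* Split virtqueue descriptor as defined in the virtio specification:
 * 'addr' is a guest-physical address, never a virtual one. */
struct vring_desc_example {
	uint64_t addr;  /* guest-physical address of the buffer */
	uint32_t len;   /* length of the buffer in bytes */
	uint16_t flags; /* e.g. NEXT / WRITE flags */
	uint16_t next;  /* index of the chained descriptor, if any */
};
```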

I don't understand why the virtio layer cannot deal with VAs and make the conversion when storing into the vring descriptors.

So here is the actual problem I am solving with this: in the general case we cannot know how to do the PA<->VA conversion. Today the assumption made by OpenAMP's VirtIO layer is that all the message buffers will come from one single IO region (shm_io). This is a bad assumption. Only when the RPMSG layer on top has the HOST role do we usually allocate all the message buffers from one IO region that we know about up front (the shm_pool). When the RPMSG layer is the REMOTE, the other side does the allocations, and they can come from anywhere. Luckily this issue can be solved thanks to the HOST side putting all the buffer locations into the descriptor queues up front when building the tables. But I have to implement that solution in two parts: this is the first, moving the translation out of the VirtIO layer (done here). The second part is to dynamically build the IO regions from these pre-set-up descriptor queues when we are the REMOTE side.
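A rough sketch of what that second step could look like, assuming the HOST has already populated every descriptor: walk the ring once and derive the span a dynamically built region would need to cover. Everything here is hypothetical and not the proposed patch:

```c
#include <stdint.h>

struct desc_example { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };

/* Hypothetical REMOTE-side helper: scan the pre-populated descriptors to
 * find the physical span that a dynamically built IO region must cover. */
static void derive_buffer_span(const struct desc_example *descs, unsigned int count,
			       uint64_t *base, uint64_t *end)
{
	*base = UINT64_MAX;
	*end = 0;
	for (unsigned int i = 0; i < count; i++) {
		if (descs[i].addr < *base)
			*base = descs[i].addr;
		if (descs[i].addr + descs[i].len > *end)
			*end = descs[i].addr + descs[i].len;
	}
}
```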

Looking at Linux, it seems that a scatterlist is used at the Virtio API, so a similar approach based on pages could work.

Scatterlists do not deal with VAs either; they deal with DMA addresses (or device physical addresses), exactly like I do here. If we want to match more closely, then the VirtIO layer would deal with offsets (pages), which is okay too; anything but VAs works here.

arnopo commented 3 months ago

Could you provide more details on what you are trying to fix here?

Sure, the general idea is cleaner separation of responsibility for each layer. More details on the "why" below.

Your update would impose the PA<->VA conversion on each virtio driver/device.

VirtIO never deals with VAs for the messages; the virtqueue descriptors store PAs, as described in the standard. So why should the VirtIO layer convert them to VAs, or even assume that they must be converted at all?

Yes, it means that the users of the VirtIO layer (which is just the RPMSG layer today) will have to do that conversion if they need to work with the contents of the buffer. But some users might not need that conversion at all; they may want the PA. For instance, a proxy might want to take that PA and put it right back into another virtqueue descriptor, with no translation wanted.

Is a metal io_region with PA == VA not sufficient? For instance, Zephyr works with this config.
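A minimal sketch of the identity-mapped case referred to here, using the libmetal metal_io_init() API; the buffer name and size are made up for illustration:

```c
#include <stdint.h>
#include <metal/io.h>

#define SHM_SIZE 0x4000 /* made-up size for illustration */

static char shm_buf[SHM_SIZE];
static struct metal_io_region shm_io;
/* Identity mapping: the "physical" base is simply the virtual address. */
static metal_phys_addr_t shm_physmap;

static void init_identity_region(void)
{
	shm_physmap = (metal_phys_addr_t)(uintptr_t)shm_buf;
	/* page_shift of -1 marks the region as one flat, unpaged block. */
	metal_io_init(&shm_io, shm_buf, &shm_physmap, SHM_SIZE,
		      (unsigned int)(-1), 0, NULL);
}
```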

I don't understand why the virtio layer cannot deal with VAs and make the conversion when storing into the vring descriptors.

So here is the actual problem I am solving with this: in the general case we cannot know how to do the PA<->VA conversion. Today the assumption made by OpenAMP's VirtIO layer is that all the message buffers will come from one single IO region (shm_io). This is a bad assumption. Only when the RPMSG layer on top has the HOST role do we usually allocate all the message buffers from one IO region that we know about up front (the shm_pool). When the RPMSG layer is the REMOTE, the other side does the allocations, and they can come from anywhere. Luckily this issue can be solved thanks to the HOST side putting all the buffer locations into the descriptor queues up front when building the tables. But I have to implement that solution in two parts: this is the first, moving the translation out of the VirtIO layer (done here). The second part is to dynamically build the IO regions from these pre-set-up descriptor queues when we are the REMOTE side.

The io_region is just a structure that was created to support complex memory mappings and to ensure that you work within the expected memory region. So in theory there is no problem initializing it with the entire memory range. Do you really need to have several shm_io? It is not clear to me... If you dynamically build the IO regions, how do you check the validity of these regions? I suppose that you will need to check validity against the local memory regions. This is what metal_io is doing; the difference is that we have to declare the memory region first.

Looking at Linux, it seems that a scatterlist is used at the Virtio API, so a similar approach based on pages could work.

Scatterlists do not deal with VAs either; they deal with DMA addresses (or device physical addresses), exactly like I do here. If we want to match more closely, then the VirtIO layer would deal with offsets (pages), which is okay too; anything but VAs works here.

My point here was that it is not directly the physical address at the API level but something else, with a conversion (page_to_phys()) done in the virtio ring. Anyway, I think an important use case we have to deal with is the use of the open-amp lib in Linux user space: how to manage OpenAMP with the HOST role to communicate with the remote processor. Whatever the virtio type, the application will allocate buffers, so it will need the VA<->PA conversion. As we have the ambition to extend the list of Virtio services, we should find a solution that matches (as much as possible) all the virtio services, not just RPMSG. Having the conversion in the virtio_ring would avoid duplicating the conversion in every virtio service driver/device.

glneo commented 3 months ago

The io_region is just a structure that was created to support complex memory mappings and to ensure that you work within the expected memory region. So in theory there is no problem initializing it with the entire memory range. Do you really need to have several shm_io?

One metal_io_region that covers the whole memory space might be workable; the root issue of building that region is the same.

It is not clear to me... If you dynamically build the IO regions, how do you check the validity of these regions? I suppose that you will need to check validity against the local memory regions. This is what metal_io is doing; the difference is that we have to declare the memory region first.

That is what I am solving: today we have to have this magic shm_io region that can handle any and all buffer addresses that might be sent. Currently this means we have to know the addresses up front and hard-code them. No matter what we do to solve that, the problem should not be the VirtIO layer's problem, as by design it only deals with rings and descriptors. The standard even specifies this by stating that descriptors shall only store physical addresses when addresses are passed, making sure any needed address translations would not be a vring problem.
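To make the "hard-coded up front" point concrete, this is the usual shape of today's static setup; the addresses and sizes below are invented for illustration only:

```c
#include <metal/io.h>

/* Invented addresses: a fixed carve-out that must cover every buffer the
 * other side could ever place in a descriptor. */
#define SHM_PA   0x90000000UL
#define SHM_SIZE 0x100000UL

static struct metal_io_region shm_io;
static metal_phys_addr_t shm_physmap = SHM_PA;

static void init_static_shm(void *mapped_va)
{
	/* Buffers outside [SHM_PA, SHM_PA + SHM_SIZE) cannot be translated
	 * through this region. */
	metal_io_init(&shm_io, mapped_va, &shm_physmap, SHM_SIZE,
		      (unsigned int)(-1), 0, NULL);
}
```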

Anyway, I think an important use case we have to deal with is the use of the open-amp lib in Linux user space: how to manage OpenAMP with the HOST role to communicate with the remote processor. Whatever the virtio type, the application will allocate buffers, so it will need the VA<->PA conversion.

Not sure what this PR changes about that. The application passes a shm_io that can handle those translations into rpmsg_init_vdev() just like before. The only difference is that the rpmsg_virtio layer doesn't pass it down into the virtio layer; instead, the same call to metal_io_virt_to_phys() is done in the rpmsg_virtio layer. This PR is transparent to the application.
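In other words, the application-facing call keeps the documented open-amp shape; only where metal_io_virt_to_phys() runs internally changes. A typical HOST-side setup might look like the sketch below, where the vdev, shm_io, and pool variables are placeholders the platform code would provide, and the pool size is made up:

```c
#include <openamp/rpmsg_virtio.h>

/* Placeholders for objects the platform code would normally provide. */
extern struct virtio_device *vdev;
extern struct metal_io_region *shm_io;
extern void *shm_pool_base;

static struct rpmsg_virtio_device rvdev;
static struct rpmsg_virtio_shm_pool shpool;

static int start_rpmsg_host(void)
{
	rpmsg_virtio_init_shm_pool(&shpool, shm_pool_base, 0x100000);
	/* Same call before and after this PR: shm_io is still passed in;
	 * it is just consumed by the rpmsg_virtio layer instead of being
	 * forwarded down into the virtio layer. */
	return rpmsg_init_vdev(&rvdev, vdev, NULL, shm_io, &shpool);
}
```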

As we have the ambition to extend the list of Virtio services, we should find a solution that matches (as much as possible) all the virtio services, not just RPMSG. Having the conversion in the virtio_ring would avoid duplicating the conversion in every virtio service driver/device.

If the concern is avoiding code duplication, there are plenty of ways to solve that. One would be to provide helper functions (not really needed IMHO, though, since it is just one function call, metal_io_virt_to_phys()). Another option would be a layer on top of the virtio layer that converts the message addresses if your use case needs translation.
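For example, the "helper" option could be as small as a wrapper like this; purely illustrative, not an existing open-amp symbol:

```c
#include <metal/io.h>

/* Hypothetical convenience helper a virtio service driver could share,
 * instead of pushing the translation down into the virtio layer. */
static inline void *virtio_buf_to_va(struct metal_io_region *io,
				     metal_phys_addr_t pa)
{
	return metal_io_phys_to_virt(io, pa);
}
```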

The thing not to do is what we do today: assume that since the one current user (RPMSG) needs it, all users will always need translation, and make the virtio layer do that unconditionally. That actually adds code (notice this PR removes more lines than it adds :smiley:), all to avoid code duplication that might exist some day.

arnopo commented 2 months ago

The io_region is just a structure that was created to support complex memory mappings and to ensure that you work within the expected memory region. So in theory there is no problem initializing it with the entire memory range. Do you really need to have several shm_io?

One metal_io_region that covers the whole memory space might be workable; the root issue of building that region is the same.

It is not clear to me... If you dynamically build the IO regions, how do you check the validity of these regions? I suppose that you will need to check validity against the local memory regions. This is what metal_io is doing; the difference is that we have to declare the memory region first.

That is what I am solving: today we have to have this magic shm_io region that can handle any and all buffer addresses that might be sent. Currently this means we have to know the addresses up front and hard-code them. No matter what we do to solve that, the problem should not be the VirtIO layer's problem, as by design it only deals with rings and descriptors. The standard even specifies this by stating that descriptors shall only store physical addresses when addresses are passed, making sure any needed address translations would not be a vring problem.

In the vring descriptors we must have the physical address, but I don't see anything in the spec that imposes that we cannot have memory translation in the virtio transport driver/device. Could you point it out to me?

More than that, we can need address translation in an AMP system between the main processor (physical address) and the coprocessor (device address). The virtio driver's physical address is written in the descriptor, and an address translation can be needed on the device side. So the notion of physical address in such a case needs to be clarified. One concrete example is the RETRAM memory in stm32mp15: the coprocessor base address is 0x00000000 while the main processor base address is 0x38000000.
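That RETRAM case maps naturally onto a libmetal region whose physmap carries the main-processor view while the coprocessor's local view is used as the "virtual" base; a sketch of how coprocessor-side firmware could describe the window, with the size chosen here only for illustration:

```c
#include <metal/io.h>

#define RETRAM_DEV_BASE  0x00000000UL /* coprocessor (device) view */
#define RETRAM_MAIN_BASE 0x38000000UL /* main-processor view */
#define RETRAM_SIZE      0x10000UL    /* illustrative size */

static struct metal_io_region retram_io;
static metal_phys_addr_t retram_physmap = RETRAM_MAIN_BASE;

static void init_retram_region(void)
{
	/* Translations through this region convert between the coprocessor
	 * view and the main-processor view of the same memory. */
	metal_io_init(&retram_io, (void *)RETRAM_DEV_BASE, &retram_physmap,
		      RETRAM_SIZE, (unsigned int)(-1), 0, NULL);
}
```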

Anyway, I think an important use case we have to deal with is the use of the open-amp lib in Linux user space: how to manage OpenAMP with the HOST role to communicate with the remote processor. Whatever the virtio type, the application will allocate buffers, so it will need the VA<->PA conversion.

Not sure what this PR changes about that. The application passes a shm_io that can handle those translations into rpmsg_init_vdev() just like before. The only difference is that the rpmsg_virtio layer doesn't pass it down into the virtio layer; instead, the same call to metal_io_virt_to_phys() is done in the rpmsg_virtio layer. This PR is transparent to the application.

As we have the ambition to extend the list of Virtio services, we should find a solution that matches (as much as possible) all the virtio services, not just RPMSG. Having the conversion in the virtio_ring would avoid duplicating the conversion in every virtio service driver/device.

If the concern is avoiding code duplication, there are plenty of ways to solve that. One would be to provide helper functions (not really needed IMHO, though, since it is just one function call, metal_io_virt_to_phys()). Another option would be a layer on top of the virtio layer that converts the message addresses if your use case needs translation.

The thing not to do is what we do today: assume that since the one current user (RPMSG) needs it, all users will always need translation, and make the virtio layer do that unconditionally. That actually adds code (notice this PR removes more lines than it adds 😃), all to avoid code duplication that might exist some day.

Some work has already started around virtio services: https://github.com/OpenAMP/open-amp/tree/virtio-exp-v2023.10/lib/virtio_mmio

Every virtio service has to manipulate buffers, either internally with a copy or externally by adding/getting allocated buffers in the descriptors.

My concern is more at the application level. I want to be sure that for all virtio services, independently of the system, we have the same way of managing the buffers at the virtio service API. From my perspective the more generic/flexible option is the VA, and in that case we need virt-to-phys translations.

Anyway, this topic is not that simple. It would be nice to have more opinions to decide in which direction we move forward.

glneo commented 2 months ago

In the vring descriptors we must have the physical address, but I don't see anything in the spec that imposes that we cannot have memory translation in the virtio transport driver/device. Could you point it out to me?

How we implement our drivers is of course up to us. My point was that the spec was written to give us the freedom to implement the driver with layers that split responsibilities, which is what this PR suggests we do.

More than that, we can need address translation in an AMP system between the main processor (physical address) and the coprocessor (device address). The virtio driver's physical address is written in the descriptor, and an address translation can be needed on the device side. So the notion of physical address in such a case needs to be clarified. One concrete example is the RETRAM memory in stm32mp15: the coprocessor base address is 0x00000000 while the main processor base address is 0x38000000.

We have the same in our devices. For instance, the M4 cores in AM625 see their TCM at 0x0, but we program them at some higher address from the "main" processor's perspective. Even more complex, we have a RAT (Region-based Address Translator) and an IOMMU before DRAM, which means we might have 3(!) levels of translation before hitting the actual buffer in memory. So what we put in the descriptor table as the message's "physical address" is not well defined at all. Hence why I want to pull that messy translation logic out of the virtio driver.
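A rough way to picture that chain, with each hop modelled as its own lookup; every name and offset here is hypothetical, none of this is open-amp or TI code:

```c
#include <stdint.h>

typedef uint64_t addr_t;

/* Hypothetical model of the chain: made-up fixed offsets stand in for the
 * RAT, IOMMU, and local mapping stages. */
static addr_t rat_translate(addr_t in)   { return in + 0x40000000ULL; }
static addr_t iommu_translate(addr_t in) { return in + 0x10000000ULL; }
static void  *local_map(addr_t in)       { return (void *)(uintptr_t)in; }

/* The address found in a descriptor only becomes a usable local pointer
 * after every hop has been applied. */
static void *descriptor_addr_to_local(addr_t desc_addr)
{
	return local_map(iommu_translate(rat_translate(desc_addr)));
}
```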

Every virtio service has to manipulate buffers, either internally with a copy or externally by adding/getting allocated buffers in the descriptors.

My concern is more at the application level. I want to be sure that for all virtio services, independently of the system, we have the same way of managing the buffers at the virtio service API. From my perspective the more generic/flexible option is the VA, and in that case we need virt-to-phys translations.

I would also like to see an application-level API that is independent of the underlying system. And yes, most users may need VA<->PA translations, but hiding these translations in the virtio layer as done today only works for some systems. Applications still work with VAs; all that changes is where the translation happens after they pass the VA into the framework.

Right now at least, the change in this PR makes no difference to end applications. If someday an application does want to interact directly with the virtio layer, I would think it would also want the extra level of control over buffer allocations and translations provided by this PR.

glneo commented 2 months ago

Continuing to think on this, I wonder if the responsibility for all these buffer allocations should actually belong to the transport layer, not the message layer. I'll have to think on what that looks like; there might be an even cleaner solution there than this PR. (Either way, it doesn't belong in the virtio layer; I just need to figure out which direction it should go and how to do that.)

xiaoxiang781216 commented 2 months ago

Continuing to think on this, I wonder if the responsibility for all these buffer allocations should actually belong to the transport layer, not the message layer. I'll have to think on what that looks like; there might be an even cleaner solution there than this PR. (Either way, it doesn't belong in the virtio layer; I just need to figure out which direction it should go and how to do that.)

We think so, and @CV-Bowen provided a patch here: https://github.com/OpenAMP/open-amp/pull/541. A more flexible buffer allocation scheme is required if we want to support non-remoteproc-based transports on top of the OpenAMP virtio layer (e.g. virtio and vhost).

arnopo commented 2 months ago

Continuing to think on this, I wonder if the responsibility for all these buffer allocations should actually belong to the transport layer, not the message layer. I'll have to think on what that looks like; there might be an even cleaner solution there than this PR. (Either way, it doesn't belong in the virtio layer; I just need to figure out which direction it should go and how to do that.)

We think so, and @CV-Bowen provided a patch here: #541. A more flexible buffer allocation scheme is required if we want to support non-remoteproc-based transports on top of the OpenAMP virtio layer (e.g. virtio and vhost).

Fully aligned with that. We should have some alloc/free/register/unregister ops. Either the application allocates buffers, or it registers them if they are allocated by another entity. One constraint is to continue to support static allocation for some systems, but this can be solved by a pre-allocated memory area for allocations, as we have today with the shmem io.
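One hypothetical shape for such an ops table, sketched only to make the discussion concrete; nothing here matches #541 or any existing open-amp API:

```c
#include <stddef.h>
#include <metal/io.h>

/* Hypothetical buffer-management ops a transport could plug in: alloc/free
 * for locally owned buffers, reg/unreg for buffers allocated elsewhere. */
struct rpmsg_buf_ops_example {
	void *(*alloc)(void *priv, size_t size, metal_phys_addr_t *pa);
	void  (*free)(void *priv, void *va);
	int   (*reg)(void *priv, void *va, metal_phys_addr_t pa, size_t size);
	void  (*unreg)(void *priv, void *va);
	void  *priv;
};
```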

glneo commented 2 months ago

No matter which way we solve the shared message buffer issue, this PR is still valid and needed.

arnopo commented 1 month ago

No matter which way we solve the shared message buffer issue, this PR is still valid and needed.

At this stage, the need for this change to the existing code is not evident. With the discussions around the allocator, I would prefer to make sure that we are moving in the right direction.

Having your Step 2 code would help to provide a more comprehensive picture:

I have to implement that solution in two parts: this is the first, moving the translation out of the VirtIO layer (done here). The second part is to dynamically build the IO regions from these pre-set-up descriptor queues when we are the REMOTE side.

We should also add the topic to the agenda of the next virtio meeting to gather more opinions.

glneo commented 1 month ago

Having your Step 2 code would help to provide a more comprehensive picture

Fair enough, I can post an RFC-style PR with what I am thinking for the dynamic allocation step.

github-actions[bot] commented 4 days ago

This pull request has been marked as a stale pull request because it has been open (more than) 45 days with no activity.