rpmsg: add release cb and refcnt in endpoint to fix ept used-after-free

OpenAMP / open-amp

The main OpenAMP library implementing RPMSG, Virtio, and Remoteproc for RTOS etc

https://www.openampproject.org/

Other

706 stars 288 forks source link

rpmsg: add release cb and refcnt in endpoint to fix ept used-after-free #508

Closed yintao707 closed 10 months ago

yintao707 commented 12 months ago

issue description:

4nOaD0QVZj

In our case, therpmsg_virtio_rx_callback() is called in a rpmsg thread. Let's assume a situation, when my_cbhas already got the ept in rpmsg thread, user service thread just right called my_deinit to excute rpmsg_destroy_ept and released ept, at this time, the execution of ept ->cb has not been completed in rpmsg thread. so , there is a used after free about the ept.

Assuming we choose to fix this situation at the user level, we can continue to refer to the following examples:

void my_cb(...)
{
   atomic_fetch_add(&g_priv->refcnt, 1); // 2*
   ....
  if (atomic_fetch_sub(&g_priv->refcnt, 1) == 1) {
    rpmsg_destroy_ept(&g_priv->ept);
    free(g_priv);
  }
}

void my_init(void)
{
  g_priv = malloc(sizeof(*g_priv));
  atomic_init(&g_priv->refcnt, 1);
  g_priv->ept.cb = my_cb;
  rpmsg_create_ept(&g_priv->ept);
}

void my_deinit(void)
{
  if (atomic_fetch_sub(&g_priv->refcnt, 1) == 1) { // 1*
    rpmsg_destroy_ept(&g_priv->ept);
    free(g_priv);
  }
}

rpmsg_virtio_rx_callback release lock here: https://github.com/OpenAMP/open-amp/blob/cd8823876fb6920571abbeecc570a9da0cb546ba/lib/rpmsg/rpmsg_virtio.c#L561C1-L561C36
os decide to suspend rpmsg_virtio_rx_callback and switch to my_deinit
my_deinit finish execution and release g_priv(1*)
rpmsg_virtio_rx_callback resume and then happen use-after-free

The root cause is that OpenAMP call cb after release lock Therefore, reference counting in the application layer cannot solve this race condition. Therefore, to avoid race condition, we added refnt to the endpoint and call the release callback when ept callback finished

arnopo commented 11 months ago

Hello @yintao707

Your issue is not clear for me. Please, could you detail your issue ? what do you mean by "there is a used after free about the ept,"

Seems to me that you can call rpmsg_destroy_ept() at the end of your endpoint callback. The only constraint would be that you don't use the endpoint in your callback after the destroy.

yintao707 commented 11 months ago

hi @arnopo , The problems currently encountered are as follows: static void rpmsg_virtio_rx_callback(struct virtqueue *vq) { ... while (rp_hdr) { rp_hdr->reserved = idx;

    /* Get the channel node from the remote device channels list. */
    metal_mutex_acquire(&rdev->lock);
    ept = rpmsg_get_ept_from_addr(rdev, rp_hdr->dst);
    metal_mutex_release(&rdev->lock);

    if (ept) {
        if (ept->dest_addr == RPMSG_ADDR_ANY) {
            /*
             * First message received from the remote side,
             * update channel destination address
             */
            ept->dest_addr = rp_hdr->src;
        }
        status = ept->cb(ept, RPMSG_LOCATE_DATA(rp_hdr),
                 rp_hdr->len, rp_hdr->src, ept->priv);

        RPMSG_ASSERT(status >= 0,
                 "unexpected callback status\r\n");
    }
    ...
}

} When the ept->cb is being executed, if other threads of the service release ept->priv, then use-afer-free will occur if the callback continues to execute; if rpsmg_unregister_endpoint can synchronously wait for ept->cb to be executed before ending, then the above problem will not occur. however there is no restriction on this in the current code. So I added refcnt to avoid this issue.

static void rpmsg_unregister_endpoint(struct rpmsg_endpoint ept) { struct rpmsg_device rdev = ept->rdev;

metal_mutex_acquire(&rdev->lock);
if (ept->addr != RPMSG_ADDR_ANY)
    rpmsg_release_address(rdev->bitmap, RPMSG_ADDR_BMP_SIZE,
                  ept->addr);
metal_list_del(&ept->node);
ept->rdev = NULL;
metal_mutex_release(&rdev->lock);
rpmsg_decref(ept);

}

arnopo commented 11 months ago

When the ept->cb is being executed, if other threads of the service release ept->priv, then use-afer-free will occur if the callback continues to execute; if rpsmg_unregister_endpoint can synchronously wait for ept->cb to be executed before ending, then the above problem will not occur. however there is no restriction on this in the current code. So I added refcnt to avoid this issue.

Something doesn't seem correct to me (if I understood your use case correctly).

In your endpoint callback, you call rpmsg_destroy_ept(ept) to free the endpoint and then continue to use the ept structure. This is equivalent to using a pointer after freeing it.

Could you share your code with me so that I can better understand the issue you are trying to solve with this PR?"

yintao707 commented 11 months ago

When the ept->cb is being executed, if other threads of the service release ept->priv, then use-afer-free will occur if the callback continues to execute; if rpsmg_unregister_endpoint can synchronously wait for ept->cb to be executed before ending, then the above problem will not occur. however there is no restriction on this in the current code. So I added refcnt to avoid this issue.

Something doesn't seem correct to me (if I understood your use case correctly).

In your endpoint callback, you call rpmsg_destroy_ept(ept) to free the endpoint and then continue to use the ept structure. This is equivalent to using a pointer after freeing it.

Could you share your code with me so that I can better understand the issue you are trying to solve with this PR?"

hi, @arnopo , I have updated the description. Could you please take a look again

arnopo commented 11 months ago

Thank you for clarifying the race condition! Please correct me if I am wrong, but the issue I see here is that the 'user thread service' releases the endpoint by freeing the ept pointer ("release ept"). I don't think it's a good idea to manage the free of this pointer in the library, as it is allocated and freed in the application, and we cannot ensure all application use cases.

For instance, the rpmsg callback can be called under an interrupt context. In such a case, a work can be created in the callback to treat the message in normal context. In such a scenario, your PR would not prevent the ept pointer from being freed between the callback call and the work execution.

Is there any particular reason why you are not using counters or mutexes in your application to prevent race conditions?"

CV-Bowen commented 11 months ago

Thank you for clarifying the race condition! Please correct me if I am wrong, but the issue I see here is that the 'user thread service' releases the endpoint by freeing the ept pointer ("release ept"). I don't think it's a good idea to manage the free of this pointer in the library, as it is allocated and freed in the application, and we cannot ensure all application use cases.

For instance, the rpmsg callback can be called under an interrupt context. In such a case, a work can be created in the callback to treat the message in normal context. In such a scenario, your PR would not prevent the ept pointer from being freed between the callback call and the work execution.

Is there any particular reason why you are not using counters or mutexes in your application to prevent race conditions?"

@arnopo In our case, the rpmsg_virtio_rx_callback() is called in a thread. And even the rpmsg_virtio_rx_callback() is called in interrupt and creates a work to process the message in normal context, we can call rpmsg_incref() first before creating work and call rpmsg_decref() to release the endpoint after the work finish to avoid the endpoint has been released after the ept->cb().

Actually, Linux did the same thing to avoid the endpoint used-after-free issue, this is the commit: https://github.com/torvalds/linux/commit/5a081caa0414b9bbb82c17ffab9d6fe66edbb72f The different between OpenAMP and Linux is: Linux malloc the endpoint inside the virtio rpsmg bus, but in OpenAMP, the endpoint is maintained by the user, so I think we should add a callback ept->release_cb() to let user do this.

arnopo commented 11 months ago

Thank you for clarifying the race condition! Please correct me if I am wrong, but the issue I see here is that the 'user thread service' releases the endpoint by freeing the ept pointer ("release ept"). I don't think it's a good idea to manage the free of this pointer in the library, as it is allocated and freed in the application, and we cannot ensure all application use cases. For instance, the rpmsg callback can be called under an interrupt context. In such a case, a work can be created in the callback to treat the message in normal context. In such a scenario, your PR would not prevent the ept pointer from being freed between the callback call and the work execution. Is there any particular reason why you are not using counters or mutexes in your application to prevent race conditions?"

@arnopo In our case, the rpmsg_virtio_rx_callback() is called in a thread. And even the rpmsg_virtio_rx_callback() is called in interrupt and creates a work to process the message in normal context, we can call rpmsg_incref() first before creating work and call rpmsg_decref() to release the endpoint after the work finish to avoid the endpoint has been released after the ept->cb().

That's my point. Half of the solution is in the library and the other part is in the application. It seems to me that you are creating a open-amp interface to solve a problem in a complex way, which could be addressed with a mutex or acounter in the application in a safer way ( for instance you can face same issue if you free the ept->priv with another thread).

Actually, Linux did the same thing to avoid the endpoint used-after-free issue, this is the commit: torvalds/linux@5a081ca The different between OpenAMP and Linux is: Linux malloc the endpoint inside the virtio rpsmg bus, but in OpenAMP, the endpoint is maintained by the user, so I think we should add a callback ept->release_cb() to let user do this.

Yes, in Linux, this has been implemented because the endpoint allocation and free is also managed by the virtio rpmsg bus.. However, this is not the case for OpenAMP, as a requirement is to support static allocation.

The ept->release_cb() only makes sense with a counter mechanism. Therefore, I don't see a strong reason to complexify the rpmsg API, to partially answer to some memory allocations/free that should be managed by application .

@edmooring, @tnmysh : any opinion on this?

xiaoxiang781216 commented 11 months ago

The ept->release_cb() only makes sense with a counter mechanism. Therefore, I don't see a strong reason to complexify the rpmsg API, to partially answer to some memory allocations/free that should be managed by application .

@CV-Bowen has tried to add refcount inside application, but it can't fix this problem without any race condition, for example:

void my_cb(...)
{
   atomic_fetch_add(&g_priv->refcnt, 1); // 2*
   ....
  if (atomic_fetch_sub(&g_priv->refcnt, 1) == 1) {
    rpmsg_destroy_ept(&g_priv->ept);
    free(g_priv);
  }
}

void my_init(void)
{
  g_priv = malloc(sizeof(*g_priv));
  atomic_init(&g_priv->refcnt, 1);
  g_priv->ept.cb = my_cb;
  rpmsg_create_ept(&g_priv->ept);
}

void my_deinit(void)
{
  if (atomic_fetch_sub(&g_priv->refcnt, 1) == 1) { // 1*
    rpmsg_destroy_ept(&g_priv->ept);
    free(g_priv);
  }
}

Let's assume this sequence:

rpmsg_virtio_rx_callback release lock here: https://github.com/OpenAMP/open-amp/pull/508/files#diff-569d5f59c7dc7ac2fee7252d74df8af1ba6e366a30fae7e1b97de627bc0ce472L563
os decide to suspend rpmsg_virtio_rx_callback and switch to my_deinit
my_deinit finish execution and release g_priv(1*)
rpmsg_virtio_rx_callback resume and then happen use-after-free

You can see the reference count inside my_cb can just reduce the race condition(safe after 2), but the gap between item 1 and 2 can just be handled by OpenAMP self.

The root cause is that OpenAMP call cb after release lock, the fix could be:

Add the reference count
Set a busy flag before release lock and wait in rpmsg_destroy_ept if this flag is set.

arnopo commented 11 months ago

Hi @xiaoxiang781216

Thanks! much more simple to understand the use case with code and associated explanations.

The ept->release_cb() only makes sense with a counter mechanism. Therefore, I don't see a strong reason to complexify the rpmsg API, to partially answer to some memory allocations/free that should be managed by application .

@CV-Bowen has tried to add refcount inside application, but it can't fix this problem without any race condition, for example:
void my_cb(...)
{
Here I suppose that you can not test g_priv ponter, else would be simple to exit with a mutex protection
   atomic_fetch_add(&g_priv->refcnt, 1); // 2*
   ....
  if (atomic_fetch_sub(&g_priv->refcnt, 1) == 1) {
    rpmsg_destroy_ept(&g_priv->ept);
    free(g_priv);
  }
}

void my_init(void)
{
  g_priv = malloc(sizeof(*g_priv));
  atomic_init(&g_priv->refcnt, 1);
  g_priv->ept.cb = my_cb;
  rpmsg_create_ept(&g_priv->ept);
}

void my_deinit(void)
{
  if (atomic_fetch_sub(&g_priv->refcnt, 1) == 1) { // 1*
    rpmsg_destroy_ept(&g_priv->ept);
    free(g_priv);
  }
}
Let's assume this sequence:

rpmsg_virtio_rx_callback release lock here: https://github.com/OpenAMP/open-amp/pull/508/files#diff-569d5f59c7dc7ac2fee7252d74df8af1ba6e366a30fae7e1b97de627bc0ce472L563

os decide to suspend rpmsg_virtio_rx_callback and switch to my_deinit

my_deinit finish execution and release g_priv(1*)

rpmsg_virtio_rx_callback resume and then happen use-after-free

You can see the reference count inside my_cb can just reduce the race condition(safe after 2), but the gap between item 1 and 2 can just be handled by OpenAMP self.

Ok I have understood your issue now. Thanks for pointing it out.

As you mention current patch does no protect between the release of the mutex and the counter increment : https://github.com/OpenAMP/open-amp/pull/508/files#diff-569d5f59c7dc7ac2fee7252d74df8af1ba6e366a30fae7e1b97de627bc0ce472R562-L570

Same issue for the unbind: https://github.com/OpenAMP/open-amp/pull/508/files#diff-569d5f59c7dc7ac2fee7252d74df8af1ba6e366a30fae7e1b97de627bc0ce472R642

The root cause is that OpenAMP call cb after release lock, the fix could be:

Add the reference count

Set a busy flag before release lock and wait in rpmsg_destroy_ept if this flag is set.

One alternative was to have a rpmsg_sync_destroy_ept( cb). But regarding the rpmsg_deinit_vdev function, the release callback you propose seems more flexible .

I will review more closely the patch at the beginning of the week.

@yintao707:

Can you send a version that fix the protection hole, between the metal_mutex_release and the callbacks calls?
I would prefer something else that atomic_* functions in open-amp lib. We regularly face problems with compilers that not implements the atomic library.

xiaoxiang781216 commented 11 months ago

@CV-Bowen has tried to add refcount inside application, but it can't fix this problem without any race condition, for example:
void my_cb(...)
{
Here I suppose that you can not test g_priv ponter, else would be simple to exit with a mutex protection

Yes, if it saves in a global variable and protected by a mutex in this simple demo, but the real case is that the endpoints are created and destroyed with dynamical number instances. Even this simple demo applies the change you suggest, the race condition still exists since OpenAMP access g_priv->ept's fields too before entering my_cb.

yintao707 commented 11 months ago

Can you send a version that fix the protection hole, between the metal_mutex_release and the callbacks calls?

I would prefer something else that atomic_* functions in open-amp lib. We regularly face problems with compilers that not implements the atomic library.

hi @arnopo , Updated, please help review again.thanks

yintao707 commented 11 months ago

Minor comments., else LGTM Please, also update the commit message to better explain the race condition issue, and fix typo errors.

hi， @arnopo ，Thank you very much for your patient review， I have updated the PR based on your modification suggestions. Could you please help me review again， thanks

xiaoxiang781216 commented 11 months ago

LGTM

yintao707 commented 11 months ago

hi, @arnopo , I updated this PR because I found that if the reference count is only handled in the openamp layer, there may be the following issues. Assuming this is my rpmsg_service thread, under normal circumstances, I would increment the reference count in my_init(). If no ept_cb is currently executing, I would release ept->priv in my_deinit().

rpmsg_service thread

void my_init()
{
    ...
    rpmsg_create_ept(ept);
    //ept->ref = 1;
    ...
}

void my_deinit()
{
    ...
    //Assuming ept_cb is not currently executing
    rpmsg_destroy_ept(ept);
        call release_cb(ept);
             ept->refcnt-1=0;
             free(ept->priv);
}

But if there is another thread that actively calls rpmsg_destroy_ept through my_stop_remote, and the previous reference count is 1, after rpmsg_destroy_ept is executed, release_cb will release ept->priv. If the rpmsg_service thread continues to access ept->priv, a problem may occur.

other thread

void my_stop_remote()
{
    /* Traversing all registered services.*/
    metal_list_for_each(g_rp_service, node)
    {
        rpmsg_xxx_device_destroy();
            call rpmsg_destroy_ept(ept);
                 call release_cb free ept->priv;
    }
}

Therefore, I have changed the original release_cb to incref_cb and decref_cb, allowing the service to control the usage of reference counts. This can mitigate the aforementioned problem.

Please @xiaoxiang781216 @arnopo help review this modification，thanks

arnopo commented 11 months ago

hi, @arnopo , I updated this PR because I found that if the reference count is only handled in the openamp layer, there may be the following issues. Assuming this is my rpmsg_service thread, under normal circumstances, I would increment the reference count in my_init(). If no ept_cb is currently executing, I would release ept->priv in my_deinit().
rpmsg_service thread

void my_init()
{
    ...
    rpmsg_create_ept(ept);
    //ept->ref = 1;
    ...
}

void my_deinit()
{
    ...
    //Assuming ept_cb is not currently executing
    rpmsg_destroy_ept(ept);
        call release_cb(ept);
             ept->refcnt-1=0;
             free(ept->priv);
}
But if there is another thread that actively calls rpmsg_destroy_ept through my_stop_remote, and the previous reference count is 1, after rpmsg_destroy_ept is executed, release_cb will release ept->priv. If the rpmsg_service thread continues to access ept->priv, a problem may occur.

No cristal clear to me: "If the rpmsg_service thread continues to access ept->priv, a problem may occur." When the release callback is called this means that no one is using the ept, right? If you set ept->priv to NULL in release callback does it prevent the issue?

other thread

void my_stop_remote()
{
    /* Traversing all registered services.*/
    metal_list_for_each(g_rp_service, node)
    {
        rpmsg_xxx_device_destroy();
            call rpmsg_destroy_ept(ept);
                 call release_cb free ept->priv;
    }
}
Therefore, I have changed the original release_cb to incref_cb and decref_cb, allowing the service to control the usage of reference counts. This can mitigate the aforementioned problem.

Please @xiaoxiang781216 @arnopo help review this modification，thanks

Your last version looks more like a hack to me. If the release_cb does not solve the issue, we probably have to consider another approach

yintao707 commented 11 months ago

hi, @arnopo , I updated this PR because I found that if the reference count is only handled in the openamp layer, there may be the following issues. Assuming this is my rpmsg_service thread, under normal circumstances, I would increment the reference count in my_init(). If no ept_cb is currently executing, I would release ept->priv in my_deinit().
rpmsg_service thread

void my_init()
{
    ...
    rpmsg_create_ept(ept);
    //ept->ref = 1;
    ...
}

void my_deinit()
{
    ...
    //Assuming ept_cb is not currently executing
    rpmsg_destroy_ept(ept);
        call release_cb(ept);
             ept->refcnt-1=0;
             free(ept->priv);
}
But if there is another thread that actively calls rpmsg_destroy_ept through my_stop_remote, and the previous reference count is 1, after rpmsg_destroy_ept is executed, release_cb will release ept->priv. If the rpmsg_service thread continues to access ept->priv, a problem may occur.
No cristal clear to me: "If the rpmsg_service thread continues to access ept->priv, a problem may occur." When the release callback is called this means that no one is using the ept, right? If you set ept->priv to NULL in release callback does it prevent the issue?
other thread

void my_stop_remote()
{
    /* Traversing all registered services.*/
    metal_list_for_each(g_rp_service, node)
    {
        rpmsg_xxx_device_destroy();
            call rpmsg_destroy_ept(ept);
                 call release_cb free ept->priv;
    }
}
Therefore, I have changed the original release_cb to incref_cb and decref_cb, allowing the service to control the usage of reference counts. This can mitigate the aforementioned problem. Please @xiaoxiang781216 @arnopo help review this modification，thanks
Your last version looks more like a hack to me. If the release_cb does not solve the issue, we probably have to consider another approach

hi, @arnopo , I can solve the problem mentioned above at the application layer, so using release_cb and refnt may be a better choice. I has updated this pr. thanks

yintao707 commented 10 months ago

2 remaining comments not yet addressed + 2 new one

Done, thanks！

GUIDINGLI commented 10 months ago

LGTM