Precise Definitions of DFI Argument Types

PengZheng commented 5 months ago

This should be addressed by https://github.com/apache/celix/commit/e34eb00b6ea2212504662dbc7bbff206bb33931e.

For am=handle, it is clear that a handle should be untyped pointer, with descriptor 'P'. For am=pre, the Doxygen documentation says that the type should be pre-allocated by the function caller. Let me call these types "pre-allocatable". The following plain old C struct should qualify as pre-allocatable:

struct preallocatable {
    int a;
    int b;
}

Is text pre-allocatable? I guess not, since the caller can not predict how long a string will be.

struct not_sure {
    int a;
    int b;
    struct preallocatable* c;
}

Is not_sure pre-allocatable? On the service provider side, the callee could allocate a struct preallocatable and fill the c field. Then not_sure can be serialized successfully. However, on the service consumer side, it can caused a use-after-free crash. In jsonRpc_handleReply, we deal with pre-allocatable argument like following:

if (meta == DYN_FUNCTION_ARGUMENT_META__PRE_ALLOCATED_OUTPUT) {
                void* tmp = NULL;
                void** out = (void **) args[i];
                size_t size = 0;

                argType = dynType_typedPointer_getTypedType(argType);
                status = jsonSerializer_deserializeJson(argType, result, &tmp);
                if (tmp != NULL) {
                    size = dynType_size(argType);
                    memcpy(*out, tmp, size); // address of freed struct preallocatable is copied
                }

                dynType_free(argType, tmp); // preallocatable is freed
            }

If we do have a precise definition of pre-allocatability, such errors can be catched in dynFunction_parse:

        if (strcmp(meta, "handle") == 0) {
            if (dynType_descriptorType(real) != 'P') {
                celix_err_pushf("Error 'handle' is only allowed for untyped pointer not '%c'", dynType_descriptorType(real));
                return PARSE_ERROR;
            }
            arg->argumentMeta = DYN_FUNCTION_ARGUMENT_META__HANDLE;
        } else if (strcmp(meta, "pre") == 0) {
            if (!dynType_isPreallocatable(real)) {
                celix_err_pushf("Error 'am=pre' is only allowed for preallocatable type");
                return PARSE_ERROR;
            }
            arg->argumentMeta = DYN_FUNCTION_ARGUMENT_META__PRE_ALLOCATED_OUTPUT;
        }

Similar considerations also apply to other types. For example, for a standard argument to work, it should at least be serializable. For exmaple, int is serializable, int * is serailizable, while int** is not, because we can not tell from a json null whether int* == NULL or int** == NULL.

Due to the lack of precise definitions of these argument types, I am stuck in json_rpc.c. Though it is already 100% line covered, it is still very fragile IMHO. @pnoltes

Originally posted by @PengZheng in https://github.com/apache/celix/issues/699#issuecomment-1905582200

PengZheng commented 5 months ago

Is text pre-allocatable? I guess not, since the caller can not predict how long a string will be.

The current implementation of jsonRpc_handleReply is confusing in this regard.

            if (meta == DYN_FUNCTION_ARGUMENT_META__PRE_ALLOCATED_OUTPUT) {
                void *tmp = NULL;
                void **out = (void **) args[i];
                size_t size = 0;

                if (dynType_descriptorType(argType) == 't') {
                    status = jsonSerializer_deserializeJson(argType, result, &tmp);
                    if (tmp != NULL) {
                        size = strnlen(((char *) *(char**) tmp), 1024 * 1024);
                        memcpy(*out, *(void**) tmp, size);
                    }
                } else {
                    dynType_typedPointer_getTypedType(argType, &argType);
                    status = jsonSerializer_deserializeJson(argType, result, &tmp);
                    if (tmp != NULL) {
                        size = dynType_size(argType);
                        memcpy(*out, tmp, size);
                    }
                }

How could the caller of the remote method to know the resulting strnlen so that he/she can allocate buffer large enough?

I feel that the precise definitions of these types are needed at the moment to have a solid RSA implementation. @pnoltes

PengZheng commented 5 months ago

I try to give the following definitions:

Standard argument should be serializable. All types except types involving untyped pointer or double pointer (pointer to pointer) are serializable. For example, complex types consisting of non-pointer fields are serializable while complex type containing a untyped pointer field is not serializable; [I is serializable while [P and [**D are not serializable.

am=out parameter should be pointer to text or double pointer to serializable types.

am=pre parameter should be pointer to trivially copyable type. It's safe to apply memcpy rivially copyable type without the usual danger of shallow copy. However, this definition is only meant to make the following code work:

            if (meta == DYN_FUNCTION_ARGUMENT_META__PRE_ALLOCATED_OUTPUT) {
                void *tmp = NULL;
                void **out = (void **) args[i];
                size_t size = 0;

            dynType_typedPointer_getTypedType(argType, &argType);
            status = jsonSerializer_deserializeJson(argType, result, &tmp);
            if (tmp != NULL) {
                size = dynType_size(argType);
                memcpy(*out, tmp, size);
            }
                dynType_free(argType, tmp);

By this definition, [D can not be used as am=pre, because dynType_free(argType, tmp) will free the embedded buffer, which has been copied to *out (thus lead to use-after-free). I think the main usage of SequenceType is to be used as pre-allocated output paramter and let the callee to fill in actual content. Thus this definition will make SequenceType useless.

Any ideas to make am=pre work? @pnoltes

The above reflects my understanding of the current code base, which may deviate from the original intention. Please shed light on this, any help will be appreciated.

PS: Maybe SequenceType does not qualify as am=pre, because it is essentially a inout parameter (currently unsupported), whose capacity must be serialized and sent to the provider end.

pnoltes commented 5 months ago

Sorry for the late reaction.

Any ideas to make am=pre work? @pnoltes

The above reflects my understanding of the current code base, which may deviate from the original intention. Please shed light on this, any help will be appreciated.

My original idea for the am=pre was to support primitive types (double, bool, ec) and indeed trivially trivially copyable type. I agree that this should be made more explicit and the parse of dfi can test if the am=pre argument fits the requirements. If needed this can be extended in the future.

So IMO no am=pre support for pointer in structs, double pointers, strings or sequences.

pnoltes commented 5 months ago

Is like the dynType_isPreallocatable approach :)

apache / celix

Precise Definitions of DFI Argument Types #723