aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

[HuggingFace] `torch_neuronx.trace` exports `torch.int64` tensors as `NRT_DTYPE_INT32` with `dim * 2` #785

Open OlivierDehaene opened 11 months ago

OlivierDehaene commented 11 months ago

Hello

There is a bug in `torch_neuronx.trace`: `torch.int64` tensors are exported as `NRT_DTYPE_INT32` with the last dimension doubled (`dim * 2`) in the NEFF file (for example, a `[2x512]` `torch.int64` tensor is exported as a `[2x1024]` `NRT_DTYPE_INT32` tensor).

Here is an example:

test.py:

import torch
import torch_neuronx

def fn(a):
    return a * 2

examples_inputs = torch.randint(0, 32, (1, 32), dtype=torch.int64)

trace = torch_neuronx.trace(fn, examples_inputs, compiler_workdir="./")

test.c

#include <stdbool.h>
#include <nrt/nrt.h>
#include <nrt/nrt_experimental.h>

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

#define P_ERR(...) fprintf(stderr, __VA_ARGS__)

#define CHECK_RESULT(res, expected, ...)    \
    if (res != expected) {                  \
        fprintf(stderr, __VA_ARGS__);       \
        exit(-1);                           \
    }

// Function to mmap a file in the application's memory space,
// it will return a pointer to the mmapped memory and the size
// of the mmapped data will be written to *size
void *mmap_file(const char *filepath, size_t *size) {
    struct stat sb;
    int fd = open(filepath, O_RDONLY);
    if (fd < 0 || fstat(fd, &sb) != 0) {
        fprintf(stderr, "Unable to open %s: %s\n", filepath, strerror(errno));
        return MAP_FAILED;
    }
    *size = sb.st_size;
    return mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
}

void iterate_tensors(nrt_tensor_info_array_t *info_array) {
    int tensor_idx;
    nrt_tensor_info_t *tensor_info = NULL;

    for (tensor_idx = 0; tensor_idx < info_array->tensor_count; tensor_idx++) {
        tensor_info = &info_array->tensor_array[tensor_idx];
        printf("NRT DType %u\n", tensor_info->dtype);

        int shape_idx;
        uint32_t s;
        printf("Shape: ");
        for (shape_idx = 0; shape_idx < tensor_info->ndim; shape_idx++) {
            s = tensor_info->shape[shape_idx];
            printf("%u\t", s);
        }
        printf("\n");
    }
}

int main(int argc, char *argv[]) {
    NRT_STATUS result;
    void *neff_data = NULL;
    size_t neff_size = 0;

    nrt_model_t *model = NULL;
    nrt_tensor_info_array_t *tensor_info_array = NULL;

    // Guard against a missing NEFF path argument before dereferencing argv[1]
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <path to NEFF file>\n", argv[0]);
        exit(-1);
    }

    // Try mmapping the NEFF file first, so we can fail fast if not found or
    // mmap fails
    neff_data = mmap_file(argv[1], &neff_size);
    if (neff_data == MAP_FAILED) {
        fprintf(stderr, "Unable to map file %s\n", argv[1]);
        exit(-1);
    }

    // Before calling any nrt API, nrt_init must be called
    // Since this is not running as part of a framework, the correct parameter for 'framework' is
    // NRT_FRAMEWORK_TYPE_NO_FW and the others can be empty strings
    result = nrt_init(NRT_FRAMEWORK_TYPE_NO_FW, "", "");
    CHECK_RESULT(result, NRT_SUCCESS, "NRTLIB could not be initialized, error: %d\n", (int)result);

    // Loading the NEFF
    printf("Loading NEFF\n");
    result = nrt_load(neff_data, neff_size, -1, -1, &model);
    CHECK_RESULT(result, NRT_SUCCESS, "Unable to load NEFF\n");

    // In order to allocate tensors, first we need to call nrt_get_model_tensor_info which
    // will give us the model tensors' names and sizes in tensor_info_array
    printf("Getting IO tensor information\n");
    result = nrt_get_model_tensor_info(model, &tensor_info_array);
    CHECK_RESULT(result, NRT_SUCCESS, "Unable to get model tensor information\n");

    iterate_tensors(tensor_info_array);

    // Unloading the model
    result = nrt_unload(model);
    if (result != NRT_SUCCESS) {
        P_ERR("Unable to unload NEFF\n");
    }

    printf("Deallocating model tensor info\n");
    // We are done with the tensor_info_array, we can dispose of it
    nrt_free_model_tensor_info(tensor_info_array);

    // Clean-up the runtime
    printf("Cleaning up the runtime\n");
    nrt_close();

    printf("DONE\n");
}
python test.py

gcc test.c -o test_neff -lnrt  -I/opt/aws/neuron/include -L/opt/aws/neuron/lib
./test_neff graph.neff

# Loading NEFF
# Getting IO tensor information
# NRT DType 8
# Shape: 1  64
# NRT DType 8
# Shape: 1  64
# Deallocating model tensor info
# Cleaning up the runtime
# DONE

The DType should be 10 (`NRT_DTYPE_INT64`) and the Shape should be 1x32.
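
As a side note, here is a minimal sketch (plain PyTorch, nothing Neuron-specific, so this only illustrates the arithmetic) showing that a [1x32] int64 tensor and a [1x64] int32 tensor occupy the same 256 bytes, which is presumably why reporting int32 with the last dimension doubled keeps the buffer size unchanged:

import torch

# Illustration only: the same 256-byte buffer can be described either as a
# [1x32] int64 tensor or as a [1x64] int32 tensor.
a = torch.randint(0, 32, (1, 32), dtype=torch.int64)
b = a.view(torch.int32)                           # reinterpret the same bytes
print(a.shape, a.nelement() * a.element_size())   # torch.Size([1, 32]) 256
print(b.shape, b.nelement() * b.element_size())   # torch.Size([1, 64]) 256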

aws-rhsoln commented 11 months ago

We do typecast int64 tensors to int32. This happens in the compiler, which treats int64 datatypes as int32 by truncating the high-order bits (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/compiler/neuronx-cc/index.html?highlight=int64#known-issues). This is because the hardware does not support int64 datatypes natively.
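
A plain-PyTorch sketch of what that truncation means for values (this is just an illustration of the downcast, not Neuron-specific behavior): only the low 32 bits are kept, so anything outside the int32 range is silently corrupted.

import torch

# Truncating the high-order bits of an int64: values that fit in 32 bits are
# preserved, larger ones wrap around.
x = torch.tensor([1, 2**31 - 1, 2**31, 2**40 + 5], dtype=torch.int64)
print(x.to(torch.int32))  # 1 and 2147483647 survive; the last two values do not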

OlivierDehaene commented 11 months ago

I see. Maybe a warning during compilation could be added?

As an SDK user, this still feels like a bug.

Having dtype int32 + [1x64] is error prone and does not give the NEFF user any indication that the tensor is not a "true" int32 but an int64 camouflaged as an int32.

The compiler saying that it cast to int32 while not really doing so (because the overall size is the same) is very weird.
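
In case it helps anyone consuming the NEFF directly: a hypothetical host-side workaround, assuming the buffer really does still hold the original little-endian int64 payload (which is what the unchanged size suggests, but I have not confirmed), is to view the reported int32 buffer as int64, which also folds the doubled last dimension back to its original size.

import numpy as np

# Hypothetical workaround for a raw NRT consumer: reinterpret the int32 bytes
# as int64 if you know the tensor was int64 in the original model.
raw = np.zeros((1, 64), dtype=np.int32)   # shape/dtype as reported by the NEFF
restored = raw.view(np.int64)             # shape (1, 32)
print(restored.shape, restored.dtype)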