apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0
11.6k stars 3.44k forks source link

[RFC] MISRA-C/C++ Compliant Runtime #3159

Closed liangfu closed 4 years ago

liangfu commented 5 years ago

For a long time, the development of TVM has been focused on optimizing deep learning inference for almost any processing systems, and it has been used widely in many applications fields. However, when users of TVM were trying to use theirs pretrained & quanitzed model in critical systems (e.g. aircraft control and autonomous driving), the running source code is expected to be extremely reliable and compact. This RFC intends to bring and discuss the possibility to port an essential part of the runtime code base to MISRA-C compliant form, so that the pretrained & quantized model can be potentially used in safety critical systems.

Features Proposed to be MISRA-C/C++ Compliant

apps/bundle_deploy/bundle.cc have concluded the essential source code files:

#include "src/runtime/c_runtime_api.cc"      
#include "src/runtime/cpu_device_api.cc"     
#include "src/runtime/workspace_pool.cc"     
#include "src/runtime/module_util.cc"        
#include "src/runtime/module.cc"             
#include "src/runtime/registry.cc"           
#include "src/runtime/file_util.cc"          
#include "src/runtime/threading_backend.cc"  
#include "src/runtime/thread_pool.cc"        
#include "src/runtime/ndarray.cc"            
#include "src/runtime/system_lib_module.cc"  
#include "src/runtime/graph/graph_runtime.cc"

Proposed Changes

Notable Coding Rules in MISRA-C

Control flow

14.1 (req) There shall be no unreachable code.
14.5 (req) The continue statement shall not be used.
14.7 (req) A function shall have a single point of exit at the end of the function.
14.10 (req) All if ... else if constructs shall be terminated with an else clause.

Pointer type conversions

11.1 (req) Conversions shall not be performed between a pointer to a function and any type
other than an integral type.
11.2 (req) Conversions shall not be performed between a pointer to object and any type
other than an integral type, another pointer to object type or a pointer to void.

Functions

16.1 (req) Functions shall not be defined with a variable number of arguments.

Standard libraries

20.4 (req) Dynamic heap memory allocation shall not be used.

Integration Tests

Discussion Topics

Reference

[1] MISRA-C & MISRA C++ [2] Guidelines for the use of the C++14 language in critical and safety-related systems [3] MISRA-C related issues in Zephyr RTOS [4] An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software

ajtulloch commented 5 years ago

@liangfu this looks quite interesting. Does MISRA compliance affect the compiler or the generated code as well (i.e. how do we certify that the output of e.g. LLVM is correct, doesn't segfault, etc?).

WRT to the minimal runtime I put together that bundle_deploy.cc list quite quickly, I think with a bit of pruning (e.g. if we enforce single threaded use from MISRA, we can get rid of threading_backend/thread_pool, etc). IIRC @nhynes and co were potentially interested in a minimal Rust runtime which might be useful as well here?

nhynes commented 5 years ago

minimal Rust runtime which might be useful as well here?

For sure. The Rust runtime has all of the features required for running syslib modules, but it makes use of alloc and some tasteful unsafety, which are disallowed by MISRA. If we really wanted to have an ultra-safe runtime, we could do a bit better there. Rust is probably closer to complete safety than rewriting the C++ runtime, though.

Of course, as @Ravenwater has oft mentioned, adopting Rust for existing projects is something of a nonstarter. Perhaps if we could enumerate which platforms (and users) would want a Safe runtime, we could determine whether C++ is truly necessary or just the first thing that comes to mind.

liangfu commented 5 years ago

Does MISRA compliance affect the compiler or the generated code as well?

In my observation, the compliance does affect the compiler and the code generator. However, as a first step towards making TVM available for the critical and safety related systems, I think it's important to consider the runtime design first.

Perhaps if we could enumerate which platforms (and users) would want a Safe runtime

It's difficult to enumerate all the platforms and users who want such runtime. However, it would be a good start to consider how could a developer use TVM to perform the perception task in a fully automated driving vehicle. There are a number of platforms that are already certified by ISO 26262 (e.g. QNX ), and for the open source world, I think it's good to start with Zephyr RTOS.

On the other hand, in order to reduce the effort in rewriting another minimal runtime, I think we should try to eliminate the usage of malloc, or at least conclude required memory allocation to input, intermediate and output buffers only.

tqchen commented 5 years ago

Memory allocation is fine, as long as we have a pre-allocated heap that provide a sufficient amount of meory

liangfu commented 5 years ago

A software implemented with malloc would only reach functional safety level of QM. It would be far from to be graded as ASIL-D. (See ISO 26262 Table 8 Topic 2 in Meeting ISO 26262 Guidelines with the Synopsys Software Integrity Portfolio )

Since neural networks are compiled and optimized in TVM, it is also possible to determine the size of variables in the runtime. This way, the generated config header can be used to eliminate the usage of malloc in the runtime implement.

nhynes commented 5 years ago

It's definitely possible to fixup the TVM runtime so that the buffers are included as part of the static library and to inline the pointers at compile time. If the graph is also constructed at compile time (something I've been wanting to do in the rust runtime anyway) then there's not much outside of (Workspace|Thread)Pool that needs allocation.

tqchen commented 5 years ago

Yes, what I mean is to pre-calculate the space necessary and mimic device-alloc with the fix size heap

nhynes commented 5 years ago

mimic device-alloc with the fix size heap

From what I can tell, writing a user malloc is against the spirit of MISRA. Part of the benefit is knowing exactly what will happen in the memory manager, but I'm not sure whether it's reasonable to ask people to bring their own libc. We actually don't need dynamic memory management at all, though.

tqchen commented 5 years ago

I don't mean to bring our own libc, but just to implement the DeviceAPI::Alloc with a fixed pre-allocated heap.

nhynes commented 5 years ago

Ah, that makes total sense: a non-trivial implementation of DeviceAPI. Gee, what a nice abstraction! I still kind of want the static graph, though :)

liangfu commented 5 years ago

If I understand correctly, using a fixed pre-allocated heap would lead us to convert a fragment of the storage pool (which is a fixed-sized static memory) into the specific data type for computation. This violates another rule that restrict (or avoid) using data type conversion. In this case, I think building static graph is preferred.

JammyZhou commented 4 years ago

@liangfu Good to know you are using Zephyr RTOS for the development of this RFC. Can you share more details about your effort to integrate TVM with Zephyr? I'm also interested in it.

liangfu commented 4 years ago

@JammyZhou The runtime part is actually OS independent. I refer to Zephyr RTOS as an example of OS that provides MISRA-C compliance. As Zephyr RTOS is designed to be architecture independent, I worked on building the runtime with Zephyr RTOS on top of qemu_x86, so that the complete system would be considered to be MISRA-C compliant, and it can be easily ported to other platforms with high reliability.

JammyZhou commented 4 years ago

@liangfu Thanks for your reply. I would assume some implementation is required in the runtime to call APIs provided by Zephyr (e.g, memory allocation, etc). Is there some base I can use as a start? I have several Arm boards with Zephyr support on hand, so I can try the TVM support on these real hardware.

tqchen commented 4 years ago

3934