emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.85k stars 3.31k forks source link

Memory Access Out Of Bounds only without sanitizing #19268

Open DoLevi opened 1 year ago

DoLevi commented 1 year ago

Version of emscripten/emsdk: 3.1.32

Failing command line in full: No build-/link-time failure

Full link command and output with -v appended: Build & link without sanitize (log here)

Expand build command ```bash emcc -O3 -g -v \ -I /libde265 -I /libde265/libde265 \ -s MODULARIZE=1 \ -s EXPORT_ES6=1 \ -s EXPORT_NAME="libde265Coder" \ -o "/dist/$BUILD_NAME.out.js" \ -s WASM=1 \ -s EXPORTED_FUNCTIONS=_malloc,_free \ -s EXPORTED_RUNTIME_METHODS=addFunction,removeFunction \ -s ALLOW_TABLE_GROWTH=1 \ -s INITIAL_MEMORY=134217728 \ -s USE_PTHREADS=1 \ -s SHARED_MEMORY=1 \ -s ASSERTIONS=1 \ -pthread \ --cache "/cache" \ /src/libde265_coder.cc /libde265/libde265/*.cc /libde265/libde265/encoder/*.cc /libde265/libde265/encoder/algo/*.cc ```

Build & link with sanitize (log here)

Expand build command ```bash emcc -O3 -g -v -fsanitize=address \ -I /libde265 -I /libde265/libde265 \ -s MODULARIZE=1 \ -s EXPORT_ES6=1 \ -s EXPORT_NAME="libde265Coder" \ -o "/dist/$BUILD_NAME.out.js" \ -s WASM=1 \ -s EXPORTED_FUNCTIONS=_malloc,_free \ -s EXPORTED_RUNTIME_METHODS=addFunction,removeFunction \ -s ALLOW_TABLE_GROWTH=1 \ -s INITIAL_MEMORY=134217728 \ -s USE_PTHREADS=1 \ -s SHARED_MEMORY=1 \ -s ASSERTIONS=1 \ -pthread \ --cache "/cache" \ /src/libde265_coder.cc /libde265/libde265/*.cc /libde265/libde265/encoder/*.cc /libde265/libde265/encoder/algo/*.cc ```

The library I try to port to WebAssembly hits a first call to new slice_segment_header (Class 2) which succeeds and later a second call to it which causes this Error to be thrown:

RuntimeError: memory access out of bounds
    at tmalloc_large (dlmalloc.c:4534)
    at dlmalloc (dlmalloc.c:4720)
    at operator new(unsigned operator new(unsigned long) (new.cpp:67)
    at // ...

However when I run with -fsanitize=address the error does not appear and the code runs through every call to new slice_segment_header successfully.

Also, i tried changing the INITIAL_MEMORY (to 16MB, 64MB, 128MB) as well as ALLOW_MEMORY_GROWTH=1 but the error persists for any of those settings.

Class 1, referenced by class 2 ```C class ref_pic_set { public: // Lists of pictures that have to be kept in the decoded picture buffer for future // reference and that may optionally be used for prediction in the current frame. // Lists contain the relative POC positions. int16_t DeltaPocS0[MAX_NUM_REF_PICS]; // sorted in decreasing order (e.g. -1, -2, -4, -7, ...) int16_t DeltaPocS1[MAX_NUM_REF_PICS]; // sorted in ascending order (e.g. 1, 2, 4, 7) // flag for each reference whether this is actually used for prediction in the current frame uint8_t UsedByCurrPicS0[MAX_NUM_REF_PICS]; uint8_t UsedByCurrPicS1[MAX_NUM_REF_PICS]; uint8_t NumNegativePics; // number of past reference pictures uint8_t NumPositivePics; // number of future reference pictures // --- derived values --- void compute_derived_values(); uint8_t NumDeltaPocs; // total number of reference pictures (past + future) uint8_t NumPocTotalCurr_shortterm_only; /* Total number of reference pictures that may actually be used for prediction in the current frame. */ void reset(); }; ```
Class 2, referencing class 1 ```C class slice_segment_header { public: slice_segment_header() { reset(); } de265_error read(bitreader* br, decoder_context*, bool* continueDecoding); de265_error write(error_queue*, CABAC_encoder&, const seq_parameter_set* sps, const pic_parameter_set* pps, uint8_t nal_unit_type); void dump_slice_segment_header(const decoder_context*, int fd) const; void set_defaults(); void reset(); int slice_index; // index through all slices in a picture (internal only) std::shared_ptr pps; char first_slice_segment_in_pic_flag; char no_output_of_prior_pics_flag; int slice_pic_parameter_set_id; char dependent_slice_segment_flag; int slice_segment_address; int slice_type; char pic_output_flag; char colour_plane_id; int slice_pic_order_cnt_lsb; char short_term_ref_pic_set_sps_flag; ref_pic_set slice_ref_pic_set; int short_term_ref_pic_set_idx; int num_long_term_sps; int num_long_term_pics; uint8_t lt_idx_sps[MAX_NUM_REF_PICS]; int poc_lsb_lt[MAX_NUM_REF_PICS]; char used_by_curr_pic_lt_flag[MAX_NUM_REF_PICS]; char delta_poc_msb_present_flag[MAX_NUM_REF_PICS]; int delta_poc_msb_cycle_lt[MAX_NUM_REF_PICS]; char slice_temporal_mvp_enabled_flag; char slice_sao_luma_flag; char slice_sao_chroma_flag; char num_ref_idx_active_override_flag; int num_ref_idx_l0_active; // [1;16] int num_ref_idx_l1_active; // [1;16] char ref_pic_list_modification_flag_l0; char ref_pic_list_modification_flag_l1; uint8_t list_entry_l0[16]; uint8_t list_entry_l1[16]; char mvd_l1_zero_flag; char cabac_init_flag; char collocated_from_l0_flag; int collocated_ref_idx; // --- pred_weight_table --- uint8_t luma_log2_weight_denom; // [0;7] uint8_t ChromaLog2WeightDenom; // [0;7] // first index is L0/L1 uint8_t luma_weight_flag[2][16]; // bool uint8_t chroma_weight_flag[2][16]; // bool int16_t LumaWeight[2][16]; int8_t luma_offset[2][16]; int16_t ChromaWeight[2][16][2]; int8_t ChromaOffset[2][16][2]; int five_minus_max_num_merge_cand; int slice_qp_delta; int slice_cb_qp_offset; int slice_cr_qp_offset; char cu_chroma_qp_offset_enabled_flag; char deblocking_filter_override_flag; char slice_deblocking_filter_disabled_flag; int slice_beta_offset; // = pps->beta_offset if undefined int slice_tc_offset; // = pps->tc_offset if undefined char slice_loop_filter_across_slices_enabled_flag; int num_entry_point_offsets; int offset_len; std::vector entry_point_offset; int slice_segment_header_extension_length; // --- derived data --- int SliceQPY; int initType; void compute_derived_values(const pic_parameter_set* pps); // --- data for external modules --- int SliceAddrRS; // slice_segment_address of last independent slice int MaxNumMergeCand; // directly derived from 'five_minus_max_num_merge_cand' int CurrRpsIdx; ref_pic_set CurrRps; // the active reference-picture set int NumPocTotalCurr; // number of entries: num_ref_idx_l0_active / num_ref_idx_l1_active int RefPicList[2][MAX_NUM_REF_PICS]; // contains buffer IDs (D:indices into DPB/E:frame number) int RefPicList_POC[2][MAX_NUM_REF_PICS]; int RefPicList_PicState[2][MAX_NUM_REF_PICS]; /* We have to save the PicState because the decoding of an image may be delayed and the PicState can change in the mean-time (e.g. from ShortTerm to LongTerm). PicState is used in motion.cc */ char LongTermRefPic[2][MAX_NUM_REF_PICS]; /* Flag whether the picture at this ref-pic-list is a long-term picture. */ // context storage for dependent slices (stores CABAC model at end of slice segment) context_model_table ctx_model_storage; bool ctx_model_storage_defined; // whether there is valid data in ctx_model_storage std::vector RemoveReferencesList; // images that can be removed from the DPB before decoding this slice }; ```
sbc100 commented 1 year ago

Seems like it must be some kind of memory corruption of the malloc metadata.

Can you trying building without USE_PTHREADS and SHARED_MEMORY?

Does it happen at all opt levels (-O0, -O1, etc..)?

DoLevi commented 1 year ago

The program execution fails on the following n-th calls on my machine:

build flags -O0 -O1 -O2 -O3
with USE_PTHREADS & SHARED_MEMORY 2nd call 2nd call 2nd call 2nd call
without USE_PTHREADS & SHARED_MEMORY 2nd call 3rd call 3rd call 3rd call
DoLevi commented 1 year ago

And if I build with pthreads and the sanitizer I run into:

RuntimeError: memory access out of bounds
    at __pthread_getspecific
    at __asan::AsanTSDGet()
    at __asan::GetCurrentThread()
    at __sanitizer::BufferedStackTrace::UnwindImpl(unsigned long, unsigned long, void*, bool, unsigned int) 
    at malloc
    at operator new(unsigned long)
    at operator new[](unsigned long)
    at // ...

seemingly appearing if the WebAssembly application demands more than a certain amount of memory (in spite of ALLOW_MEMORY_GROWTH=1).

This occurs for the build command:

emcc -O3 -g -fsanitize=undefined -fsanitize=address -s ASSERTIONS=1 \
  -I /libde265 -I /libde265/libde265 \
  -s MODULARIZE=1 \
  -s EXPORT_ES6=1 \
  -s EXPORT_NAME="libde265Coder" \
  -o "/dist/$BUILD_NAME.out.js" \
  -s WASM=1 \
  -s EXPORTED_FUNCTIONS=_malloc,_free \
  -s EXPORTED_RUNTIME_METHODS=addFunction,removeFunction,GROWABLE_HEAP_U8 \
  -s ALLOW_TABLE_GROWTH=1 \
  -s ALLOW_MEMORY_GROWTH=1 \
  -s USE_PTHREADS=1 \
  -s SHARED_MEMORY=1 \
  -s PTHREAD_POOL_SIZE=navigator.hardwareConcurrency \
  -pthread \
  --cache "/cache" \
  /src/libde265_coder.cc /libde265/libde265/*.cc /libde265/libde265/encoder/*.cc /libde265/libde265/encoder/algo/*.cc
DoLevi commented 1 year ago

A core developer of the library I try to port helped me out so the issue seems solved to me.

It seems there was an issue with said library which they plan to fix because potentially a lot of data is pushed to the stack.

By increasing the stack size (-sSTACK_SIZE=10MB) the program runs well and as expected in all regards.

I am still wondering though why this is issue is not appearing when using sanitizing (as I'm not an experienced C-developer at all :sweat_smile:).

kripken commented 1 year ago

It's possible that memory layout is a little different when sanitizing, and somehow the problem is avoided by pure luck. But a stack overflow is something I'd expect sanitizers (at least asan) to find, so it is odd...

sbc100 commented 1 year ago

Stack overload should also be detected by -sASSERTIONS=2 and to a lesser extent -sASSERTIONS=2