dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.06k stars 4.69k forks source link

[mono] Deadlock on mono_loader_lock in mono_class_create_from_typedef #93686

Open uweigand opened 11 months ago

uweigand commented 11 months ago

We've seen a deadlock in msbuild when performing a source-build of rc2 natively on s390x (using the Mono runtime). In a nutshell, the deadlock occurs due to a lock-order inversion between the Mono JIT's mono_loader_lock and the managed Microsoft.Build.BackEnd.Logging.LoggingService._lockObject.

Specifically, we have two involved threads, the main thread and a thread pool thread.

The main thread runs

The thread pool thread runs

It seems to me the root cause of the deadlock is that mono_class_create_from_typedef holds mono_loader_lock across function calls that may trigger invoking managed code (the assembly load hooks) which might do anything - this doesn't seem a good idea.

CC - @directhex @lambdageek @vargaz @akoeplinger FYI - @giritrivedi @alhad-deshpande @janani66 @omajid @tmds

lambdageek commented 11 months ago

It seems to me the root cause of the deadlock is that mono_class_create_from_typedef holds mono_loader_lock across function calls that may trigger invoking managed code (the assembly load hooks) which might do anything - this doesn't seem a good idea.

Good catch!

lambdageek commented 11 months ago

Related to / duplicate of https://github.com/dotnet/runtime/issues/51864

This is hard

lambdageek commented 11 months ago

@uweigand this will be quite tricky to fix and I think even if we put something together, it is highly unlikely that we will backport the fix in upstream net8.0 as it's bound to be risky (any time we make big changes to class-init.c we uncover issues with the change).

I would encourage, to whatever extent possible, to fix this in user code by rearranging the calling code to trigger assembly loading early. (I realize in this case I'm proposing that you change MSBuild... sorry.)

Vishwanatha-HD commented 7 months ago

I did reproduce the hang issue on dotnet 8 on different Redhat beaker machines with different RHEL distros.

I was able to reproduce for around 10 or 12 hang situations and in each of the case, when I attached gdb and looked at the backtrace information, I see the same backtrace as attached in the file..

I did work with Ulrich and local team in IBM, and Ulrich helped us to figure out that its the exact same issue as this one.

I have attached the logfile with all the details of all the process and threads backtrace.. dotnet_8_hang_issue_gdb_all_proc_info.txt

uweigand commented 6 months ago

@Vishwanatha-HD has now found an instance of this deadlock in the .NET 6 source-build. Unfortunately, while one of the involved locks is also the mono_loader_lock, the other involved lock is a different one; not a Microsoft.Build.BackEnd.Logging.LoggingService._lockObject, but rather an _lock object inside the implementation of ArrayPool<byte>.Shared in the runtime.

Again, we have a deadlock between two of msbuild's threads, this time a dedicated event logging thread and some thread pool thread.

The event logging thread

The thread pool thread

Vishwanatha-HD commented 5 months ago

Hi All, I have attached a txt file with the complete backtrace information of all the processes and threads involved in reproducing the .NET6 source build hang issue. dotnet-6-source-build-hang-process-thread-bt-info.txt

lambdageek commented 5 months ago

@uweigand overall how frequent are these deadlocks?

I tried to prototype one idea for a fix (essentially avoid loader callbacks into managed code while holding the global loader lock by pre-loading any assemblies mentioned by a a MonoClass before taking the lock), and it was very invasive, was likely to result in overall performance degradation and also never quite converged on a correct solution (I could still see the loader lock held when loading assemblies - and I ran out of time on the experiment to fix them all).

I'm going to think about some alternate approaches during .NET 9, but I'm not certain there is any reasonable approach here.


mono_jit_compile_method specifically might be somewhat fixable:

        // FIXME Locking here is somewhat historical due to mono_register_jit_icall_wrapper taking loader lock.
        // atomic_compare_exchange should suffice.
        mono_loader_lock ();
        mono_jit_lock ();
        if (!callinfo->wrapper) {
            callinfo->wrapper = p;
        }
        mono_jit_unlock ();
        mono_loader_unlock ();

This will likely just shift around the specific circumstances of the deadlock, however. It wont' fix the fundamental problem.

uweigand commented 5 months ago

I've now done a bit of code-flow analysis to try and figure out the places where Mono invokes a C# routine while holding the loader lock. The following list contains functions that take the loader lock, and lists all calls to subroutines while holding the lock that potentially invoke a C# routine on some path. This may not be 100% accurate (there may be false positives as in many cases, the invocation of the C# routine is conditional, and it might happen that at this call site the condition is never true; and there may be false negatives in case I missed part of the CFG due to indirect calls or macro magic), but it does show that this problem might be more widespread that I originally thought.

I've done the same analysis on the code after applying the "ready levels" patch https://github.com/dotnet/runtime/pull/95927, but unfortunately this doesn't look significantly different - even when accounting for the fact that some of the recursive class loader calls under the lock no longer load the parent classes, most of these functions still have other code paths where C# is invoked.

mono/metadata/class-init.c:mono_class_setup_fields
--> mono_class_layout_fields
mono/metadata/class-init.c:mono_class_create_from_typedef[_at_level]
--> mono_metadata_load_generic_params  (LEVEL: no longer under loader lock, but now under image lock)
--> mono_metadata_lookup_generic_class  (LEVEL: no longer under loader lock, but now under image lock)
--> mono_class_get_checked
--> mono_class_inflate_generic_class_checked
--> mono_class_setup_mono_type
--> fix_gclass_incomplete_instantiation
--> mono_class_create_from_typedef
--> mono_metadata_interfaces_from_typedef_full
--> mono_class_find_enum_basetype
--> mono_class_from_mono_type_internal
--> class_has_isbyreflike_attribute
--> class_has_inlinearray_attribute
mono/metadata/class-init.c:mono_class_create_generic_inst[_at_level]
--> check_valid_generic_inst_arguments  (LEVEL: additionally under lock)
--> mono_type_get_full_name  (LEVEL: additionally under lock)
--> mono_class_get_nullable_param_internal
--> mono_generic_class_setup_parent
--> class_loaded
mono/metadata/class-init.c:mono_class_create_ptr_at_level:
--> class_loaded  (LEVEL: additionally under lock)
mono/metadata/class-init.c:mono_class_layout_fields
--> mono_class_full_name
mono/metadata/class-init.c:mono_class_setup_interface_id
--> mono_class_setup_interface_id_nolock --> mono_get_unique_iid

mono/metadata/class-setup-vtable.c:mono_class_setup_interface_offsets_internal
--> mono_class_setup_interface_id_nolock
--> mono_class_get_implemented_interfaces
--> mono_type_get_full_name
--> mono_class_setup_count_virtual_methods

mono/metadata/object.c:mono_vtable_build_imt_slot
--> build_imt_slot
mono/metadata/object.c:mono_method_add_generic_virtual_invocation
--> mini_get_imt_trampoline (via callbacks.get_imt_trampoline)
--> mini_get_vtable_trampoline (via callbacks.get_vtable_trampoline)
--> mono_arch_build_imt_trampoline (via mono_get_runtime_callbacks ()->build_imt_trampoline)
mono/metadata/object.c:allocate_collectible_static_fields
--> mono_class_get_fields_internal
--> allocate_loader_alloc_slot
--> mono_class_from_mono_type_internal
--> mono_object_new_pinned
mono/metadata/object.c:mono_class_create_runtime_vtable
--> mono_class_init_internal
--> mono_class_setup_vtable
--> mono_class_check_vtable_constraints
--> mono_class_has_finalizer
--> mono_class_data_size
--> alloc_vtable
--> mono_mem_manager_get_loader_alloc
--> mono_class_compute_gc_descriptor
--> allocate_collectible_static_fields
--> compute_class_bitmap
--> mono_class_get_fields_internal
--> field_is_special_static
--> mono_class_from_mono_type_internal
--> mono_type_size
--> mono_field_get_data
--> mono_class_value_size
--> mono_type_get_object_handle

mono/metadata/sre.c:reflection_setup_class_hierarchy
--> mono_class_from_mono_type_internal
--> mono_class_setup_mono_type
mono/metadata/sre.c:reflection_setup_internal_class_internal
--> reflection_init_generic_class
--> reflection_setup_internal_class
--> mono_reflection_type_handle_mono_type
--> mono_class_setup_vtable_general
--> mono_class_setup_mono_type
--> mono_class_from_mono_type_internal
mono/metadata/sre.c:reflection_methodbuilder_to_mono_method
--> mono_reflection_type_get_handle
--> mono_metadata_type_dup
--> mono_get_shared_generic_inst
--> mono_marshal_spec_from_builder
mono/metadata/sre.c:ctorbuilder_to_mono_method
--> mono_reflection_methodbuilder_from_ctor_builder
--> ctor_builder_to_signature_raw
mono/metadata/sre.c:methodbuilder_to_mono_method
--> method_builder_to_signature
mono/metadata/sre.c:ves_icall_TypeBuilder_create_runtime_class
--> mono_class_setup_mono_type
--> ensure_runtime_vtable
--> reflection_setup_internal_class
--> mono_class_from_mono_type_internal
--> typebuilder_setup_fields
--> remove_instantiations_of_and_ensure_contents
mono/metadata/sre.c:reflection_create_dynamic_method
--> mono_gc_reference_queue_add_internal

mono/metadata/reflection.c:mono_type_get_object_checked /*FIXME mono_class_init_internal and mono_class_vtable acquire it*/
--> mono_type_normalize
--> mono_type_get_full_name
--> mono_weak_hash_table_insert
--> mono_mem_manager_get_loader_alloc
--> mono_object_new_pinned
--> mono_type_get_object_checked (!!! recursive call under lock !!!)
--> mono_mem_manager_init_reflection_hashes  (recursive!)
--> mono_class_from_mono_type_internal  (recursive!)
mono/metadata/reflection.c:mono_reflection_bind_generic_parameters
--> mono_reflection_type_handle_mono_type
--> mono_class_from_mono_type_internal

mono/metadata/metadata.c:mono_metadata_get_canonical_generic_inst // Hashtable key equal func can take loader lock
--> mono_metadata_generic_inst_equal
--> mono_metadata_type_dup
mono/metadata/metadata.c:mono_metadata_get_canonical_aggregate_modifiers // Hashtable key equal func can take loader lock
--> aggregate_modifiers_equal
--> mono_metadata_type_dup
mono/metadata/metadata.c:mono_metadata_lookup_generic_class // Hashtable key equal func can take loader lock
--> mono_generic_class_equal (via mono_conc_hashtable_insert)

mono/mini/mini.c:mono_jit_compile_method_inner
--> mono_method_is_generic_sharable
--> mono_method_desc_full_match
--> mono_method_get_full_name
--> mono_emit_jit_map

mono/mini/mini-generic-sharing.c:mono_class_get_runtime_generic_context_template
--> mono_class_get_runtime_generic_context_template  (!!! recursive call under lock !!!)
--> class_get_rgctx_template_oti

mono/mini/mini-llvm.c:mono_llvm_emit_method
--> mono_aot_get_mangled_method_name
--> mono_aot_get_method_name
--> mono_method_full_name
--> emit_method_inner
--> get_callee

mono/mini/aot-runtime.c:init_amodule_got
--> decode_patches
--> mini_llvm_init_method
--> mono_resolve_patch_target
mono/mini/aot-runtime.c:load_container_amodule
--> load_aot_module

mono/component/hot_reload.c:hot_reload_get_or_add_ginst_update_info
--> recompute_ginst_update_info

mono/component/debugger-engine.c:mono_de_add_pending_breakpoints
--> mono_jit_search_all_backends_for_jit_info
--> insert_breakpoint
mono/component/debugger-engine.c:mono_de_set_breakpoint
--> set_bp_in_method
--> mono_jit_search_all_backends_for_jit_info
--> mono_debugger_log_add_bp
mono/component/debugger-engine.c:mono_de_clear_breakpoint
--> mono_debugger_log_remove_bp
mono/component/debugger-engine.c:mono_de_clear_breakpoints_for_domain
--> remove_breakpoint
mono/component/debugger-engine.c:mono_de_process_breakpoint
--> mono_find_prev_seq_point_for_native_offset
--> no_seq_points_found
--> mono_debugger_log_bp_hit
--> mono_de_frame_async_id
--> mono_de_ss_update
--> mono_de_ss_start
mono/component/debugger-engine.c:mono_de_ss_start
--> mono_find_prev_seq_point_for_native_offset
--> mono_de_start_single_stepping

mono/component/debugger-agent.c:suspend_vm
--> mono_de_start_single_stepping
--> notify_thread
mono/component/debugger-agent.c:process_frame
--> calc_il_offset
--> mono_method_full_name
--> mono_marshal_method_from_wrapper
mono/component/debugger-agent.c:send_types_for_domain
--> emit_type_load
mono/component/debugger-agent.c:event_requests_cleanup
--> mono_de_clear_breakpoint
mono/component/debugger-agent.c:clear_event_request
--> mono_de_clear_breakpoint
--> mono_de_cancel_ss
mono/component/debugger-agent.c:clear_event_requests_for_assembly
--> clear_event_request
--> ss_clear_for_assembly
mono/component/debugger-agent.c:clear_types_for_assembly
--> type_comes_from_assembly
mono/component/debugger-agent.c:dispose_vm
--> clear_event_request
mono/component/debugger-agent.c:vm_commands
--> clear_event_request
--> get_types_for_source_file
--> get_types
mono/component/debugger-agent.c:event_commands
--> emit_appdomain_load
--> send_assemblies_for_domain
--> emit_thread_start
--> send_types_for_domain
--> clear_event_request
--> mono_de_clear_breakpoint

As an aside, it seems unclear what exactly the loader lock is protecting - many of these uses of the lock should probably use some other serialization mechanism (if any is needed at all) ...