IronLanguages / ironpython3

Implementation of Python 3.x for .NET Framework that is built on top of the Dynamic Language Runtime.
Apache License 2.0
2.51k stars 290 forks source link

Get deadlocked occasionally... #1596

Closed scott-xu closed 2 years ago

scott-xu commented 2 years ago

Description

A worker thread is calling PythonContext.CreateBuiltinModule which locks PythonContext. Then it will call FunctionCode.LazyCompileFirstTarget where it tries to acquire _CodeCreateAndUpdateDelegateLock

PythonContext.cs

internal PythonModule GetBuiltinModule(string name) {
    lock (this) {
        PythonModule mod = CreateBuiltinModule(name);
        if (mod != null) {
            PublishModule(name, mod);
            return mod;
        }

        return null;
    }
}

FunctionCode.cs

internal void LazyCompileFirstTarget(PythonFunction function) {
    lock (_CodeCreateAndUpdateDelegateLock) {
        UpdateDelegate(function.Context.LanguageContext, true);
    }
}

A GC thread is executing the WeakRef's finalizer which calls FunctionCode.LazyCompileFirstTarget and locks _CodeCreateAndUpdateDelegateLock, then it will call PythonContext.DeleteIndex where it tries to acquire PythonContext.

PythonContext.cs

internal PythonDeleteIndexBinder/*!*/ DeleteIndex(int argCount) {
    if (_deleteIndexBinders == null) {
        Interlocked.CompareExchange(ref _deleteIndexBinders, new PythonDeleteIndexBinder[argCount + 1], null);
    }

    lock (this) {
        if (_deleteIndexBinders.Length <= argCount) {
            Array.Resize(ref _deleteIndexBinders, argCount + 1);
        }

        if (_deleteIndexBinders[argCount] == null) {
            _deleteIndexBinders[argCount] = new PythonDeleteIndexBinder(this, argCount);
        }

        return _deleteIndexBinders[argCount];
    }
}

Worker Thread

image . . . image

GC Thread

image

slozier commented 2 years ago

Thanks for the report. I don't see a reason why we couldn't use a different lock in PythonContext.DeleteIndex (as well as in GetIndex and SetIndex). Do you have a code sample code that could be used to reproduce the issue so I can set up a unit test?

scott-xu commented 2 years ago

Here's the sample code.

import xml.etree.ElementTree as ET
s = '<Data>' + ('1'*8192) + '</Data>'
elem = ET.fromstring(s)
result_len = len(ET.tostring(elem))
string_len = len(s)
result = str(result_len == string_len)
slozier commented 2 years ago

Thanks, was able to reproduce the deadlock with code like this:

var code = @"
import xml.etree.ElementTree as ET
s = '<Data>' + ('1'*8192) + '</Data>'
elem = ET.fromstring(s)
ET.tostring(elem)";

// retry to increase chance of deadlock
for (var i = 0; i < 100; i++) {
    Console.WriteLine(i);
    ScriptEngine eng = Python.CreateEngine();
    eng.SetSearchPaths(new List<string> { pathToLib });
    await Task.Run(() => eng.Execute(code));
}
scott-xu commented 2 years ago

Thanks for the quick fix! Will the fix be included in 3.4 final release? It has been half a year since 3.4 beta1. When do you think 3.4 final would be ready?

slozier commented 2 years ago

Thanks for the quick fix! Will the fix be included in 3.4 final release? It has been half a year since 3.4 beta1. When do you think 3.4 final would be ready?

Yes it will be in the final release which should be ready soon (yes, I know I've been saying that for a while). I just need to find some time to finish up https://github.com/IronLanguages/ironpython3/pull/1598, package DLR and IronPython and then test the packages.

scott-xu commented 2 years ago

Perfect! Thanks for the great work!