Yikai-Liao / symusic

A cross platform note level midi decoding library with lightening speed, based on minimidi.
https://yikai-liao.github.io/symusic/
MIT License
108 stars 8 forks source link

Bug when multithreading with macOS since 0.3.4 #23

Closed Natooz closed 5 months ago

Natooz commented 5 months ago

Hi there πŸ‘‹,

I spotted a bug causing Python to crash (Fatal Python error: Segmentation fault) when running multithreading operations (pytest-xdist) with the latest version of symusic on macOS (intel).

You can find logs of the issue here https://github.com/Natooz/MidiTok/pull/142 I don't know what can be causing the issue exactly, I didn't check what changes have been made to the code. It seems to only happen on "What a Fool Believes.mid". I was able to reproduce locally, even though only one worker out of 8 crashed.

Yikai-Liao commented 5 months ago

Sorry, but I don't find the log there. It seems all the tests have been passed.

Natooz commented 5 months ago

Are the logs of the failed actions inaccessible ? https://github.com/Natooz/MidiTok/actions/runs/7700019008/job/20983922442 EDIT: my bad, I thought they stayed accessible. Here are the the raw logs

Otherwise you can reproduce it easily on an intel Mac:

# Make sure to have loaded a virtual python environment

git clone https://github.com/Natooz/MidiTok && cd MidiTok
pip install ".[tests]"
pytest -n logical -v tests/test_tokenize.py::test_multitrack_midi_to_tokens_to_midi
Yikai-Liao commented 5 months ago

Well, I did find something wrong with multiprocessing. The __setstate__ function was wrong. I have fixed it, but it is not related to this issue (Nothing changed after fixing it).

I didn't change the parsing midi code from 0.3.3 to 0.3.4. But checkout to v0.3.3 just works. And I have tried installing symusic from source, which would give the same error.

And this bug only appears on Mac. Quite a strange bug.

Natooz commented 5 months ago

I first though of an error with pytest itself or pytest-xdist as mentioned in other issues https://github.com/pytest-dev/pytest/issues/3216 but even with pytest v7.4.4 the issue occurs

Yikai-Liao commented 5 months ago

I will test the changes from 0.3.3 to 0.3.4 one by one. Fortunately, there are only a few changes

Yikai-Liao commented 5 months ago

@Natooz Could you offer me a minimal reproduce example without pytest? The error infomation in pytest is kind of ambiguous.

Natooz commented 5 months ago

Do you have an idea of how this could be achieved? The issue is that it only occurs with xdist, even with one worker.

Here are a few tests I just ran (no code modification):

pytest -n 8 -v tests/test_tokenize.py::test_multitrack_midi_to_tokens_to_midi

-n 1 --> 3 fails -n 2 --> 6 fails -n 8 --> 3 fail -n 20 --> 2 fails

I tested while the test method only loads the MIDI, no error occur, so this doesn't comes from here.

It fails (1 worker out of 8) when the test method is simply:

def _test_tokenize(
    midi_path: str | Path,
    tok_params_set: tuple[str, dict[str, Any]],
) -> None:
    r"""
    Tokenize a MIDI file, decode it back and make sure it is identical to the ogi.

    The decoded MIDI should be identical to the original one after downsampling, and
    potentially notes deduplication.

    :param midi_path: path to the MIDI file to test.
    :param tok_params_set: tokenizer and its parameters to run.
    """
    # Reads the MIDI and add pedal messages to make sure there are some
    try:
        midi = Score(Path(midi_path))
    except MIDI_LOADING_EXCEPTION as e:
        pytest.skip(f"Error when loading {midi_path.name}: {e}")

    # Creates the tokenizer
    tokenization, params = tok_params_set
    tokenizer: miditok.MIDITokenizer = getattr(miditok, tokenization)(
        tokenizer_config=miditok.TokenizerConfig(**params)
    )
    str(tokenizer)  # shouldn't fail
    tokenizer.preprocess_midi(midi)

So the issue occurs when preprocessing the MIDI, that is converting its attributes (notes, pitch bends...) to SOA and SOA back to attributes which are then assigned to the Scoreobject.

**EDIT: the issue occurs without preprocessing the MIDI, but the tokenizer has to be loaded for it to happen.

I'm investigating further

Natooz commented 5 months ago

With the test method above, the tests pass on my machine with certain numbers of workers: 2 workers the tests pass, 4 workers sometimes one fails sometimes all pass, 3 workers (3 fails), 5 workers sometimes one or two fail...

No issue with only one worker. Could this come from a concurrent operation performed by two workers?

Natooz commented 5 months ago

Minimal failing test method:

def _test_tokenize(
    midi_path: str | Path,
    tok_params_set: tuple[str, dict[str, Any]],
) -> None:
    r"""
    Tokenize a MIDI file, decode it back and make sure it is identical to the ogi.

    The decoded MIDI should be identical to the original one after downsampling, and
    potentially notes deduplication.

    :param midi_path: path to the MIDI file to test.
    :param tok_params_set: tokenizer and its parameters to run.
    """
    # Reads the MIDI and add pedal messages to make sure there are some
    try:
        midi = Score(Path(midi_path))
    except MIDI_LOADING_EXCEPTION as e:
        pytest.skip(f"Error when loading {midi_path.name}: {e}")

    # Creates the tokenizer
    tokenization, params = tok_params_set
    tokenizer: miditok.MIDITokenizer = getattr(miditok, tokenization)(
        tokenizer_config=miditok.TokenizerConfig(**params)
    )
Yikai-Liao commented 5 months ago

What does the worker means here?

If they just load and process midi files separately, I think they won't operate the same part of memory. Symusic doesn't use any global variables and static variables. Further, if they do operate the same part of memory, the bug would also appears on windows and linux.

And if the worker means a subprocess, which is used more common in python, there won't be any problem about "thread safe"πŸ€”

Also, I don't change the code of SoA from 0.3.3 to 0.3.4.

Yikai-Liao commented 5 months ago

Do you have an idea of how this could be achieved? The issue is that it only occurs with xdist, even with one worker

Do you mean that this bug can't be reproduced in python's built-in multiprocessing and thread library?

I don't know the implementation of xdist.

Natooz commented 5 months ago

I'm not sure of how xdist handles multiprocessing Indeed this is a strange bug as it occurs only on macOS.

Yikai-Liao commented 5 months ago

After several tests, I did find what caused the problem. In minimidi, I tried to copy 4 bytes at one time instead of one by one, for a slight performance improvement. Well, maybe this caused some problems about accessing invalid memory.

Now this problem have been fixed. You could also try it locally to make sure there are no other bugs left. @Natooz

Natooz commented 5 months ago

No more errors! πŸ™Œ Thank you for help on this!