stempeg 2.0 - Githubissues

faroit commented 4 years ago

This addresses #27 and implements a new ffmpeg backend. I chose ffmpeg-python for reading and writing. Audio is now piped directly to ffmpeg's stdin instead of writing temporary files with pysoundfile and converting them in a separate process call.

Part of the code was copied from spleeter's audio backend. First benchmarks of the input piping indicate that this method is twice as fast as my previous tmpfile-based method.

Saving stems still requires writing temporary files, since the complex filter cannot be carried out using ffmpeg-python. This enabled a new API. The idea was not to come up with presets and checks that cover every use case, but to let users handle this themselves. This means more errors for users, but it's way easier to maintain. E.g. if a user wants to write multistream audio as .wav files, an error will be thrown, since this container does not support multiple streams. The user would instead have to use streams_as_multichannel.
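The piping idea can be sketched with the standard library alone. This is only an illustration, not the code in this PR (which goes through ffmpeg-python): the helper names `build_pipe_command` and `encode_from_memory` are hypothetical, and the sketch assumes an `ffmpeg` binary on the PATH and interleaved 32-bit float PCM as input.

```python
import subprocess


def build_pipe_command(output_path, sample_rate, channels):
    """Build an ffmpeg command line that reads raw float32 PCM from stdin."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-f", "f32le",            # input is raw 32-bit little-endian float PCM
        "-ar", str(sample_rate),  # sample rate of the piped input
        "-ac", str(channels),     # channel count of the piped input
        "-i", "pipe:0",           # read the input from stdin
        output_path,              # the extension selects the container
    ]


def encode_from_memory(pcm_bytes, output_path, sample_rate, channels=2):
    """Pipe in-memory PCM bytes to ffmpeg, so no temporary input file is needed."""
    cmd = build_pipe_command(output_path, sample_rate, channels)
    subprocess.run(cmd, input=pcm_bytes, check=True)
```

With a NumPy array `audio` of shape (samples, channels), a call could look like `encode_from_memory(audio.astype("<f4").tobytes(), "out.m4a", 44100)`.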

This PR furthermore introduces a significant number of new features:

Audio Loading

Loading audio now uses the same API as spleeter's audio loading backend.
A target sample rate can be specified to resample audio on the fly and return the resampled audio.
An option stems_from_multichannel was added to load stems that are aggregated into multichannel audio (concatenation of pairs of stereo channels); see Audio Writing for more info.
Substream titles can be read from the Info object.
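The multichannel aggregation mentioned above is a pure reshaping of the audio tensor. The following NumPy sketch shows both directions under the shapes given in this description; the function names are hypothetical and this is not necessarily how stempeg implements it.

```python
import numpy as np


def streams_to_multichannel(stems):
    """Convert (streams, samples, channels) to (samples, streams * channels)."""
    n_streams, n_samples, n_channels = stems.shape
    # Put samples first, then lay the per-stream channel pairs side by side.
    return stems.transpose(1, 0, 2).reshape(n_samples, n_streams * n_channels)


def multichannel_to_streams(audio, nb_channels=2):
    """Convert (samples, streams * nb_channels) back to (streams, samples, nb_channels)."""
    n_samples, total_channels = audio.shape
    if total_channels % nb_channels != 0:
        raise ValueError("channel count is not a multiple of nb_channels")
    n_streams = total_channels // nb_channels
    return audio.reshape(n_samples, n_streams, nb_channels).transpose(1, 0, 2)
```

Five stereo stems therefore become one 10-channel signal, where channels 2 and 3 hold the second stem.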

Audio Writing

Stems can now be saved as substreams, aggregated into channels, or saved as multiple files.
Titles for each substream can now be embedded into metadata.
In addition to write_stems (a preset for compatibility with NI stems), there is also write_streams (supports writing as multichannel or multiple files). And in case stempeg is used for just stereo files, write_audio can be used (again, this is API-compatible with spleeter).

The procedure for writing stream files can be quite complex, as it varies depending on the specified output container format. Basically there are two cases:

1.) container supports multiple streams (mp4/m4a, opus, mka)
2.) container does not support multiple streams (wav, mp3, flac)

For 1.) we provide three options:

1a.) streams will be saved as substreams when streams_as_multichannel=False (default)
1b.) streams will be aggregated into channels and saved as a multichannel file. Here the audio tensor of shape=(streams, samples, 2) will be converted to a single-stream multichannel audio of shape=(samples, streams*2). This option is activated using streams_as_multichannel=True
1c.) streams will be saved as multiple files when streams_as_files is active

For 2.), when the container does not support multiple streams, there are two options:

2a.) streams_as_multichannel has to be set to True (see 1b), otherwise an error will be raised. Note that this only works for wav and flac.
2b.) streams_as_files, so that multiple files will be created

The file ending of the path determines the container (but not the codec!).
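The case distinction above can be written down as a small decision helper. This is a sketch of the rules as stated here, not stempeg's code: `choose_writing_mode` is a hypothetical name, and giving streams_as_files precedence over streams_as_multichannel is an assumption.

```python
from pathlib import Path

# Container groups as listed in the description above.
MULTISTREAM = {".mp4", ".m4a", ".opus", ".mka"}
NO_MULTICHANNEL = {".mp3"}  # single-stream, and multichannel is not an option


def choose_writing_mode(path, streams_as_multichannel=False, streams_as_files=False):
    """Return 'files', 'channels' or 'streams' for the given output path."""
    ext = Path(path).suffix.lower()  # the file ending determines the container
    if streams_as_files:
        # Assumption: writing one file per stream works for any container.
        return "files"
    if streams_as_multichannel:
        if ext in NO_MULTICHANNEL:
            raise ValueError(f"{ext} cannot hold aggregated channels; use streams_as_files")
        return "channels"
    if ext in MULTISTREAM:
        return "streams"
    raise ValueError(f"{ext} does not support multiple streams")
```

For example, `choose_writing_mode("test.wav")` raises an error, while passing `streams_as_multichannel=True` selects the channel aggregation.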

Example / Use Cases

"""Opens a stem file and saves (re-encodes) back to a stem file
"""
import argparse
import stempeg

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'input',
    )
    args = parser.parse_args()

    # load stems
    stems, rate = stempeg.read_stems(args.input)

    # load stems,
    # resample to 96000 Hz,
    # use multiprocessing
    stems, rate = stempeg.read_stems(
        args.input,
        sample_rate=96000,
        multiprocess=True
    )

    # --> stems now has `shape=(stems, samples, channels)`

    # save stems from tensor as multi-stream mp4
    stempeg.write_stems(
        "test.stem.m4a",
        stems,
        sample_rate=96000
    )

    # save stems as dict for convenience
    stems = {
        "mix": stems[0],
        "drums": stems[1],
        "bass": stems[2],
        "other": stems[3],
        "vocals": stems[4],
    }
    # the dict keys are automatically used as stem names

    # write stems from the dict
    stempeg.write_stems(
        "test.stem.m4a",
        data=stems,
        sample_rate=96000
    )

    # `write_stems` is a preset for the following settings
    # here the output signal is resampled to 44100 Hz and AAC codec is used
    stempeg.write_stems(
        "test.stem.m4a",
        stems,
        sample_rate=96000,
        writer=stempeg.StreamsWriter(
            codec="aac",
            output_sample_rate=44100,
            bitrate="256000",
            stem_names=['mix', 'drums', 'bass', 'other', 'vocals']
        )
    )

    # Native Instruments compatible stems
    stempeg.write_stems(
        "test_traktor.stem.m4a",
        stems,
        sample_rate=96000,
        writer=stempeg.NIStemsWriter(
            stems_metadata=[
                {"color": "#009E73", "name": "Drums"},
                {"color": "#D55E00", "name": "Bass"},
                {"color": "#CC79A7", "name": "Other"},
                {"color": "#56B4E9", "name": "Vocals"}
            ]
        )
    )

    # let's write as multistream opus (supports only 48000 Hz)
    stempeg.write_stems(
        "test.stem.opus",
        stems,
        sample_rate=96000,
        writer=stempeg.StreamsWriter(
            output_sample_rate=48000,
            codec="opus"
        )
    )

    # writing to wav requires converting streams to multichannel
    stempeg.write_stems(
        "test.wav",
        stems,
        sample_rate=96000,
        writer=stempeg.ChannelsWriter(
            output_sample_rate=48000
        )
    )

    # stempeg also supports loading merged-multichannel streams
    stems, rate = stempeg.read_stems(
        "test.wav",
        reader=stempeg.ChannelsReader(nb_channels=2)
    )

    # mp3 does not support multiple channels,
    # therefore we have to use `stempeg.FilesWriter`
    # outputs are named ["output/0.mp3", "output/1.mp3"]
    # for named files, provide a dict or use `stem_names`
    # also apply multiprocessing
    stempeg.write_stems(
        ("output", ".mp3"),
        stems,
        sample_rate=rate,
        writer=stempeg.FilesWriter(
            multiprocess=True,
            output_sample_rate=48000,
            stem_names=["mix", "drums", "bass", "other", "vocals"]
        )
    )

faroit commented 4 years ago

@romi1502 @mmoussallam I really like the simple ffmpeg adapter you implemented for spleeter. I took some code from spleeter and moved it into stempeg. I extended the function to also support reading and writing multistream/stem files. The basic (stereo) read/write is still API-compatible with the ffmpeg adapter you have in spleeter. Therefore I would love your feedback on the following:

Are you okay with copying these parts? I can add credits in the docstring if you like.
Would you be interested in replacing your code and using stempeg directly? stempeg is already a dependency for spleeter, so it wouldn't change much for users. Spleeter users would then benefit from being able to save into stem format.
If yes to the previous questions, it would make sense (and would be great) if you could review this PR.

mmoussallam commented 4 years ago

Hi @faroit, hope you're fine and safe.

Thanks for the suggestion, it would definitely make sense to allow writing stems as output in spleeter. Give us a few days to look into it and come back to you.

Best

faroit commented 4 years ago

@mmoussallam 👍 sounds good.

Just a few more notes:

stempeg was actually not a requirement for spleeter; I thought you had musdb there. So yes, please evaluate whether this would justify adding another dependency.
I quickly hacked the spleeter ffmpeg audio adapter to use stempeg for loading and writing instead. See https://github.com/deezer/spleeter/pull/357. As you can see, writing stems is quite simple, as you can directly pass the estimate dictionary.
I noticed that the spleeter AudioAdapter does not differentiate between audio containers/extensions and the actual codec. That is an issue, as e.g. mp4/m4a is a container but not a codec; the codec is aac. FFmpeg does select a default codec when a container/extension is selected, but I would suggest extending spleeter's code to allow explicit control over this. I can open an issue for this if you agree.
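To make the container/codec distinction concrete, here is a sketch that keeps the two apart when assembling ffmpeg output arguments. The helper `output_args` and the default table are illustrative assumptions; they are neither spleeter's nor stempeg's API, and the codec ffmpeg actually picks for a container depends on its build.

```python
from pathlib import Path

# Illustrative container -> codec defaults (assumption, not queried from ffmpeg).
DEFAULT_CODEC = {
    ".m4a": "aac",
    ".mp4": "aac",
    ".wav": "pcm_s16le",
    ".flac": "flac",
    ".opus": "libopus",
    ".mp3": "libmp3lame",
}


def output_args(path, codec=None, bitrate=None):
    """Return ffmpeg output arguments with the codec chosen independently of the container."""
    ext = Path(path).suffix.lower()          # container comes from the extension
    chosen = codec or DEFAULT_CODEC.get(ext)  # an explicit codec overrides the default
    args = []
    if chosen is not None:
        args += ["-c:a", chosen]              # audio codec
    if bitrate is not None:
        args += ["-b:a", str(bitrate)]        # audio bitrate
    return args + [path]
```

With this split, `output_args("x.m4a", codec="alac")` keeps the m4a container but swaps the codec, which is not expressible when the extension alone drives both.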

mmoussallam commented 4 years ago

Hi @faroit

Between ICASSP and the upcoming ISMIR and RecSys deadlines we're a bit overwhelmed right now, but I see no blocker on our side to switching to stempeg. Indeed, the distinction between codec and container will be nice to add.

Thanks a lot for the PR. I propose that we wait for this to be merged, and then we'll take care of completing it on the spleeter side.

Best

faroit commented 4 years ago

@mmoussallam Sounds good. I will get this ready here and would love your comments on the API when it's ready (before merging).

pseeth commented 4 years ago

Just wanted to comment, I pip installed this version of stempeg for use in a script which decodes and resamples the MUSDB stems in nussl, and it is blazing fast compared to the current version of stempeg. I hope this makes it into stempeg proper soon! Great work!

faroit commented 4 years ago

@pseeth I actually forgot to add proper regression benchmarks to test the loading speed in musdb. I have now compared this branch with master, and stem loading is actually slower than with the old method :-/ Can you test again in your setup with these two branches?
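A comparison like this can be made repeatable with a small timing harness. The sketch below is generic; `benchmark` is a hypothetical helper, and the median is used only to dampen outliers from disk caching.

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Call fn() several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Run once per installed version, e.g. `benchmark(lambda: stempeg.read_stems(path))`, and compare the two numbers.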

faroit commented 4 years ago

@pseeth can you test this quickly for me? I really can't see the speedup anymore... just re-run your code with the PyPI version and with this PR and compare.

faroit commented 4 years ago

@mmoussallam this is stalled by #31, but exporting stems that can be used directly with NI Traktor sounds totally worth the wait ;-)

Would be happy if you (or someone at Deezer) could provide some help...

mmoussallam commented 4 years ago

Hi @faroit, this seems tricky. A friend of mine (Mickaël Legoff, whom you may already know) used to work at NI and could probably give us a hand here. Let me ask him.

We've had to deal with the conda-forge recipe ourselves and it did not go so smoothly :/ I don't know much about the mp4box stack, but aren't you worried it will add quite a heavy dependency?

faroit commented 4 years ago

@mmoussallam yeah sure I know Mickaël, he is the reason why I proposed to release MUSDB18 in stems format ;-)

Concerning mp4box, it seems quite easy to build (compared to ffmpeg) so maybe it's not that hard to do...

faroit commented 3 years ago

@mmoussallam I had some time to continue working on this during ISMIR. This should finally be ready for another review. Again, this is designed so that it's useful to integrate into spleeter (for writing).

I significantly updated the API to make it easier to understand, using different writer backends. E.g. there are now FilesWriter, StreamsWriter, ChannelsWriter and NIStemsWriter objects that can be passed to the write_stems function, each having its own parameters.

See the basic example file for an overview of all the writing features: https://github.com/faroit/stempeg/blob/8ed4655b25e3ea20f60ea9de5f3da4528e005288/examples/readwrite.py

Also, I added multiprocessing to the FilesWriter API to speed up things. Multiprocessing is disabled by default as I found that sometimes opening too many files on macOS results in some problems.
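The opt-in multiprocessing pattern can be sketched as follows. This is not the FilesWriter implementation: `write_wav` and `write_stem_files` are hypothetical helpers, the input is assumed to be interleaved 16-bit PCM bytes per stem, and the standard library `wave` module stands in for ffmpeg, so only .wav output is produced.

```python
import os
import wave
from concurrent.futures import ProcessPoolExecutor


def write_wav(path, pcm16_bytes, sample_rate, channels=2):
    """Write interleaved 16-bit PCM bytes to a WAV file and return its path."""
    with wave.open(path, "wb") as f:
        f.setnchannels(channels)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(pcm16_bytes)
    return path


def write_stem_files(target, stems, sample_rate, multiprocess=False):
    """Write one file per stem; target is a (folder, extension) tuple."""
    folder, ext = target
    os.makedirs(folder, exist_ok=True)
    jobs = [
        (os.path.join(folder, name + ext), pcm, sample_rate)
        for name, pcm in stems.items()
    ]
    if not multiprocess:
        # Default: write sequentially, which keeps few files open at once.
        return [write_wav(*job) for job in jobs]
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(write_wav, *job) for job in jobs]
        return [future.result() for future in futures]
```

When `multiprocess=True` is used from a script, the call should sit under an `if __name__ == '__main__':` guard, since some platforms start worker processes by re-importing the main module.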

Furthermore, I managed to get full Native Instruments stems compatibility (issue reported in #32), using MP4Box for the metadata.

Since a conda package for GPAC is currently missing and very few users would actually need this functionality, I propose to just mention in the readme what needs to be installed.

Here is the API for the NI stems writer:

    # Native Instruments compatible stems
    stempeg.write_stems(
        "test_traktor.stem.m4a",
        stems,
        sample_rate=96000,
        writer=stempeg.NIStemsWriter(
            stems_metadata=[
                {"color": "#009E73", "name": "Drums"},
                {"color": "#D55E00", "name": "Bass"},
                {"color": "#CC79A7", "name": "Other"},
                {"color": "#56B4E9", "name": "Vocals"}
            ]
        )
    )

Let me know what you think. It would be great if this can be merged in the next week since I want to use it in a new version of open-unmix.

Cheers

faroit commented 3 years ago

@pseeth I added multiprocessing to the stems reading. In my tests, this improves speed by another 20% :-)

faroit commented 3 years ago

@axeldelafosse can you check out the new version for stems compatibility?

faroit commented 3 years ago

@aliutkus something to add here?

axeldelafosse commented 3 years ago

Hey @faroit! Good job! Unfortunately Traktor cannot read the stem metadata:

faroit commented 3 years ago

> Hey @faroit! Good job! Unfortunately Traktor cannot read the stem metadata:

This should be fixed now. Test using

import stempeg
stems, rate = stempeg.read_stems(stempeg.example_stem_path())
stempeg.write_stems(
    "test_traktor.stem.m4a",
    stems,
    sample_rate=rate,
    writer=stempeg.NIStemsWriter()
)

axeldelafosse commented 3 years ago

Cool! However I get this error:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    writer=stempeg.NIStemsWriter()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/stempeg/write.py", line 726, in write_stems
    sample_rate=sample_rate
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/stempeg/write.py", line 530, in __call__
    stem_names=['Mix'] + [d['name'] for d in self.stems_metadata]
TypeError: 'NoneType' object is not iterable

Am I missing something? I'm testing with the latest changes and re-installed stempeg via pip install .

faroit commented 3 years ago

> Am I missing something? I'm testing with the latest changes and re-installed stempeg via pip install .

@axeldelafosse looks like your repo doesn't contain the default_metadata.json... it's part of the MANIFEST so it should work... 🤷

Here is a simple Colab notebook to test that: https://colab.research.google.com/drive/1cuTrBnjuBWANiW_fnseT1pfhPzlcGigX?usp=sharing Let me know if you find the issue.
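The traceback above fails where the stream titles are built as ['Mix'] plus the names in stems_metadata, because stems_metadata is None. A defensive version of that step could look like the sketch below. It is not the fix applied in stempeg: `resolve_stem_names` is a hypothetical helper and the fallback list simply reuses the values from the NI example earlier in this thread, not the shipped default_metadata.json.

```python
import re

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")

# Hypothetical fallback, copied from the example in this thread.
FALLBACK_METADATA = [
    {"color": "#009E73", "name": "Drums"},
    {"color": "#D55E00", "name": "Bass"},
    {"color": "#CC79A7", "name": "Other"},
    {"color": "#56B4E9", "name": "Vocals"},
]


def resolve_stem_names(stems_metadata=None):
    """Return stream titles, falling back to defaults when no metadata is given."""
    metadata = FALLBACK_METADATA if stems_metadata is None else stems_metadata
    for entry in metadata:
        if not HEX_COLOR.match(entry["color"]):
            raise ValueError(f"invalid color for stem {entry['name']!r}: {entry['color']!r}")
    # The first stream is the mixture, followed by the individual stems.
    return ["Mix"] + [entry["name"] for entry in metadata]
```

Failing early with a clear message (or a fallback) makes a missing metadata file easier to diagnose than a TypeError deep inside the writer.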

faroit commented 3 years ago

@mmoussallam @pseeth this is ready for a review.

axeldelafosse commented 3 years ago

@faroit I have the default_metadata.json so I don't know what the issue is... Anyway. Thanks for the colab. It works! Great job.

faroit commented 3 years ago

> @faroit I have the default_metadata.json so I don't know what's the issue... Anyway. Thanks for the colab. It works! Great job.

@axeldelafosse Can you try with a clean environment? Also can you run the unit tests?

faroit commented 3 years ago

Ping @mmoussallam

mmoussallam commented 3 years ago

Hi @faroit

Thanks for this and great work. I'll hopefully find some time next week to look at it carefully.

faroit commented 3 years ago

@mmoussallam great. This will also be used for the next version of open-unmix, so it would be great to have this unblocked soon ;-)

faroit commented 3 years ago

@mmoussallam thanks for the checks, these are corrected now. Did you check out https://github.com/deezer/spleeter/pull/357 to see if the new stempeg API could be useful in spleeter? If there are minor things to be changed later, that's fine as long as the API looks good to you. Let me know if this can be merged then.

mmoussallam commented 3 years ago

Hi @faroit

Sorry it took me some time to finish reviewing this. It all seems good to me. Congrats on the rework, I think the API looks really great now!

I'm planning on doing some tests on the spleeter integration later this week.

faroit commented 3 years ago

> I'm planning on doing some tests on the spleeter integration later this week.

@mmoussallam sounds great. Let me know if there is anything left to do. Now let's create some REAL stems! ;-)

faroit / stempeg

stempeg 2.0 #28
