ethman / slakh-utils

Utilities for interfacing with Slakh2100
MIT License

Is relabelling of Slakh dataset required when combining with MUSDB18 dataset? #14

Closed jaslineljy closed 3 years ago

jaslineljy commented 3 years ago

Sorry, but I am new to this topic. Can I ask: if I want to use the MUSDB18 and Slakh datasets together for my model, does that mean I have to manually relabel all the tracks in Slakh? I ask because MUSDB18 labels its tracks as follows: 0 - The mixture, 1 - The drums, 2 - The bass, 3 - The rest of the accompaniment, 4 - The vocals.

However, since Slakh has many instruments, its tracks are labelled differently, as shown in this list: https://github.com/ethman/slakh-utils/blob/master/midi_inst_values/general_midi_inst_0based.txt

Since Slakh has a large number of tracks, is there an efficient way to go about relabeling them? Or is there already a fully combined set available (I couldn't find one)? If there isn't, would it be possible to provide an instruction guide on how to relabel the tracks in Slakh to be like MUSDB18? Do I have to download the Stem Creator Tool used for MUSDB18 by Native Instruments (https://www.stems-music.com/stem-creator-tool/) and rearrange the tracks into the 4 different categories?


ethman commented 3 years ago

Hello! Thanks for reaching out! First let me talk about the general case for combining different instrument tracks in Slakh (called submixing), then I'll talk about how this fits together with MUSDB18.

Submixing

If you want to combine certain instrument types (i.e., submixing), check out the instructions here: https://github.com/ethman/slakh-utils/tree/master/submixes. The basic idea is that you define which instrument types get mixed together to make each stem.

For example, let's say there's a song with three isolated tracks: 1) a Rhodes electric piano, 2) an acoustic piano, and 3) an (acoustic) upright bass. There are a few ways to combine these tracks: we can mix them like "pianos vs. bass," where the acoustic & electric piano are submixed together and the bass isn't submixed with anything, or we can mix them like "acoustic vs. electric," where the acoustic piano and upright bass are submixed together and the electric piano isn't submixed with anything. How these tracks get combined into submixes depends on your use case. This is a decision that you make (in this case, the decision was either to do "pianos vs. bass" submixing or to do "acoustic vs. electric" submixing).

How do you figure out which audio files correspond to which instrument types? Well, they're all labeled in the metadata (detailed here). This metadata tells you exactly what type of instrument each file is. So: no, you do not need to manually label anything.

This is what the submixing code linked above does: it allows you to define which tracks get submixed together. Read the directions on that page to understand how to set up the code to make submixes.
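Just to make that concrete: here's a rough sketch (not one of the shipped example files) of what a "pianos vs. bass" recipe could look like, written out with PyYAML. The recipe keys follow the annotated example further down in this thread; the piano program numbers are my assumption based on the 0-based General MIDI list linked above.

    # A rough sketch, assuming the recipe format shown later in this thread.
    # Program numbers are 0-based General MIDI: 0-5 are (acoustic + electric)
    # pianos, 32-39 are basses -- double-check against the linked instrument list.
    import yaml

    pianos_vs_bass = {
        "Mixing key": "program_num",
        "Recipes": {
            "Pianos": [0, 1, 2, 3, 4, 5],              # acoustic & electric pianos together
            "Bass": [32, 33, 34, 35, 36, 37, 38, 39],  # all basses together
        },
    }

    with open("pianos_vs_bass.yaml", "w") as f:
        yaml.safe_dump(pianos_vs_bass, f, default_flow_style=False)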

Combining Slakh2100 & MUSDB18

Okay, so now that we understand how to create submixes in Slakh, how do we make that compatible with MUSDB18? The bad news is that there's not a perfect match between how the instrument tracks are defined in MUSDB18 and how they're defined in Slakh. For instance, because the Slakh data is all synthesized with VSTs, there are no vocal tracks, unlike MUSDB18, in which vocals is one of the four instrument tracks.

However, 3 of the 4 instrument tracks in MUSDB18 overlap with the types of instruments available in Slakh: the three non-vocals tracks, i.e., Bass, Drums, and Other. So it's possible to make Bass, Drums, and Other submixes that match MUSDB18's definitions of those submixes.

In order to do that, you can just define submixes and run the submixing script as described in the linked docs above. You'll define a Bass instrument and a Drums instrument. Every other type of instrument (aka MUSDB18's Other instrument) will get created automatically in a file called residuals.wav. This is precisely what we did when comparing MUSDB18 and Slakh2100 in this paper (see table 2 for the clearest example).


I hope this helped! Let me know if you have any more questions; I'm happy to help!

jaslineljy commented 3 years ago

Hi, sorry for the late reply. I tried to follow the instructions under the submixes link you gave. However, it seems like I am still stuck. These are the steps I have done:

1) I created a new folder named "submixes_relabel" that consists of 3 files: band.yaml, drums.yaml, and bass.yaml. These 3 files are copied from the examples, except that I removed piano and guitar from the original band.yaml.
2) I installed requirements.txt, converted the FLAC files to WAV format, and labelled the folder slakh2100_wav.
3) For the submixing, I followed the code format and typed the following:

python submixes.py -submix-definition-file "D:\slakh-utils-master\submixes\submixes_relabel" -i "D:\datasets\slakh2100_wav"

However, I got the following error:

    Traceback (most recent call last):
      File "submixes.py", line 118, in <module>
        parser.add_argument('-src-dir', '-s', type=str, required=False,
      File "C:\Users\jasli\anaconda3\lib\argparse.py", line 1386, in add_argument
        return self._add_action(action)
      File "C:\Users\jasli\anaconda3\lib\argparse.py", line 1749, in _add_action
        self._optionals._add_action(action)
      File "C:\Users\jasli\anaconda3\lib\argparse.py", line 1590, in _add_action
        action = super(_ArgumentGroup, self)._add_action(action)
      File "C:\Users\jasli\anaconda3\lib\argparse.py", line 1400, in _add_action
        self._check_conflict(action)
      File "C:\Users\jasli\anaconda3\lib\argparse.py", line 1539, in _check_conflict
        conflict_handler(action, confl_optionals)
      File "C:\Users\jasli\anaconda3\lib\argparse.py", line 1548, in _handle_conflict_error
        raise ArgumentError(action, message % conflict_string)
    argparse.ArgumentError: argument -src-dir/-s: conflicting option string: -s

Thus, I would like to know if I made any errors anywhere. There are also some questions I have about the submixing:

1) Do I still need the band.yaml file if I have already defined a Bass instrument and a Drums instrument in bass.yaml and drums.yaml respectively?
2) How do I know that the drums, bass, and others are labelled as 1, 2, and 3 respectively (like in the MUSDB18 dataset)?
3) When using both datasets together, do I need to follow MUSDB18 and have a mixture labelled as 0 for the Slakh files? If yes, how do I approach doing it?
4) In the submixing instructions, you mention the following:

python submixes.py [-h] -submix-definition-file SUBMIX_DEFINITION_FILE [-input-dir INPUT_DIR] [-src-dir SRC_DIR] [-num-threads NUM_THREADS]

  -submix-definition-file SUBMIX_DEFINITION_FILE

What I typed: -submix-definition-file "D:\slakh-utils-master\submixes\submixes_relabel". I put in the directory of the folder where the 3 yaml files exist. Am I doing it right, or must it point to an individual file (bass.yaml or drums.yaml), or to a combination of both as a band.yaml file?

  -i INPUT_DIR

What I typed: -i "D:\datasets\slakh2100_wav". I put in the directory of the folder that contains 3 sub-folders: test, validation, and train. When I proceed with the submixing, do I have to do the sub-folders individually (i.e., D:\datasets\slakh2100_wav\test), or can I do all at once (i.e., D:\datasets\slakh2100_wav)? I am confirming this because when I did the conversion from FLAC to WAV, I had to do it individually, so I am not sure if it is the same for submixing.

   -src-dir SRC_DIR

For this, I initially did not type anything, as it seems to be for creating a submix for an individual track. Since I want to relabel all the Slakh audio files, I thought that providing the input dir was sufficient. However, I did try typing -src-dir "D:\datasets\slakh2100_wav\submix" to see what results I was supposed to get, but all I got was the error stated above.

   -num-threads NUM_THREADS

I did not type this either, as I have no idea what "Number of threads to spawn to do the submixing" means. If I want to relabel all the files to use them together with MUSDB18, how many threads do I need to spawn?

I apologize if there are some basic points that I could not understand, as I am quite new to this and doing it for a school project. It would be great if it were possible to make a short instructional video to go along with the text instructions. Thank you very much for your time.

ethman commented 3 years ago

Hey, it looks like the error you've encountered was fixed in some pull requests (see here or here). So pull one of those two and the script should work.
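(For anyone who hits this and wonders what the error means: argparse raises it when two arguments register the same option string. A tiny illustration of the same failure follows; the first flag here is just a stand-in, and the real fix is whatever the linked PRs changed.)

    # Minimal sketch of the conflict argparse is complaining about (illustrative
    # only; not the actual contents of submixes.py).
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-submix-definition-file', '-s', type=str)   # '-s' is now taken
    parser.add_argument('-src-dir', '-s', type=str, required=False)  # raises:
    # argparse.ArgumentError: argument -src-dir/-s: conflicting option string: -s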

jaslineljy commented 3 years ago

Hi, I ran the script you mentioned and managed to get a folder that has all the bass instruments under bass.wav, all the drums under drums.wav, and the rest under residuals.wav.

However, I did get the following warnings while running the code:

    submixes.py:34: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      self.submix_data = yaml.load(open(submix_file, 'r'))
    submixes.py:69: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      src_metadata = yaml.load(open(os.path.join(srcs_dir, 'metadata.yaml'), 'r'))

Now that I have a folder with drums and bass, how do I know that drums.wav, bass.wav, and residuals.wav are labeled as 1, 2, and 3 respectively (like in MUSDB18)? Do I need to label the mix files as 0? If yes, how do I do so?

MUSDB18 labeling:
0 - The mixture
1 - The drums
2 - The bass
3 - The rest of the accompaniment
4 - The vocals

ethman commented 3 years ago

Hello!

Don't worry about those warnings. I can update the code to get rid of them soon, but they're not a problem for you at all.
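(For reference, the change the warning itself suggests is just to use an explicit loader; roughly:)

    # Roughly the fix the warning suggests (shown on a standalone example,
    # not the exact lines that will change in submixes.py):
    import yaml

    with open("band.yaml", "r") as f:       # band.yaml is just your recipe file
        submix_data = yaml.safe_load(f)     # or: yaml.load(f, Loader=yaml.SafeLoader)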

There is a mix.wav audio file included in all of the Track folders when you download Slakh2100. So the mixtures are ready to go when you download it. See the documentation here for details about what's included when you download Slakh2100. The mixes included with Slakh2100 have all of the instrument tracks summed together to make the mix (i.e., all of the files in Track00001/stems/SXX.wav are added together to make Track00001/mix.wav).

In your case, the mixes are going to be mix.wav = bass.wav + drums.wav + residuals.wav. That's what the submixes script does: it aggregates all of the audio files in Track00001/stems/ into bass submixes (bass.wav) and drum submixes (drums.wav) and everything else that isn't bass or drums (residuals.wav).

So if you want mixes with just bass and drums, just add together bass.wav and drums.wav. But because you have omitted residuals.wav this mixture won't equal the mixture that came with the downloaded version of Slakh2100.
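If you ever do want to build that bass-plus-drums mixture yourself, it's just a sample-wise sum; here's a minimal sketch (assuming the soundfile package, with placeholder paths for wherever the submix files ended up):

    # Minimal sketch: sum two submixes into a new mixture. Paths are placeholders.
    import soundfile as sf

    bass, sr = sf.read("Track00001/bass.wav")
    drums, _ = sf.read("Track00001/drums.wav")

    bass_plus_drums = bass + drums                    # sample-wise sum of the two stems
    sf.write("Track00001/bass_drums_mix.wav", bass_plus_drums, sr)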

I think this is what you're asking. But if you're asking about how to get the audio into the stempeg format, then that is outside of the scope of this project. For more information on how to do that see this library and its documentation.

ethman commented 3 years ago

Taking a step back:

What is your ultimate goal?

Are you trying to train a model on both MUSDB18 and Slakh2100 data? If so, then you probably don't want to put your data back into the stempeg format. The stempeg format is used to store and transport all of the audio stems so they stay together and have the same metadata, but you can't actually train on the stempeg data without decompressing it to arrays. (See how the read_stems() function here actually returns a 3D array of shape (stems x samples x channels)? That's the decompressed, raw audio from the stem file.)

If this is your goal, then convert all of the MUSDB18 files to arrays using that library and simultaneously read the Slakh2100 audio tracks as you normally would. You'll have to keep tabs on the audio from the different datasets to make sure you get the mixing and track labels sorted correctly for your use case.
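As a very rough sketch of what that looks like in practice (stempeg and soundfile here are my suggestions, not requirements; file names are placeholders):

    # Rough sketch: load one MUSDB18 track and one Slakh submix as plain arrays.
    import stempeg
    import soundfile as sf

    # MUSDB18: decode the stem file into an array of shape (stems, samples, channels);
    # the stem order is 0=mixture, 1=drums, 2=bass, 3=other, 4=vocals.
    musdb_stems, musdb_rate = stempeg.read_stems("some_musdb_track.stem.mp4")
    musdb_drums = musdb_stems[1]

    # Slakh: after submixing, the stems are already plain .wav files on disk.
    slakh_drums, slakh_rate = sf.read("Track00001/drums.wav")

    # From here both are just audio arrays; keeping the instrument labels
    # consistent between the two datasets is up to you.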

jaslineljy commented 3 years ago

Hi, thank you for taking the time to explain this to me, but I think I may not have expressed my questions properly, or I may not have understood your replies entirely.

My main goal is to use the MUSDB18 and Slakh2100 data together to train my model. However, I was confused about how to start, as the two datasets have different instruments and labels assigned. Thus, I opened this issue so that I can learn how to use both datasets together like you did in the paper.

From your advice, I should make use of submixing to achieve my goal. While submixing did provide me with a new bass.wav, drums.wav, and residuals.wav, it occurred to me that I do not know what new labels are assigned to them. For example, given that the following is my yaml file, after submixing, how do I know what new labels are assigned to drums and bass? I did not state that the new label of drums is 1, nor did I state that the new label of bass is 2.

Mixing key: "program_num"
Recipes:
  Drums:
    - 128
  Bass:
    - 32
    - 33
    - 34
    - 35
    - 36
    - 37
    - 38
    - 39

From your latest reply, you mention that I have to convert all of the MUSDB18 files to arrays and simultaneously read the Slakh2100 audio tracks, while making sure that the mixing and track labels are sorted correctly.

1) But I do not know whether the new bass.wav, drums.wav, and residuals.wav are newly labeled, so how do I make sure that both datasets are sorted correctly?
2) Since the MUSDB18 dataset combines the stems and puts them together using the Native Instruments tool, do I also need to do this process for the Slakh2100 audio tracks if I am using both datasets together? If not, how am I supposed to train the model with the formats being different?

If it seems to you that I am off track from my goal, would it be possible for you to explain how you managed to use both datasets together and train a model on them, like you did in the paper? All I have managed to do so far is the submixing described above, as well as converting FLAC to WAV.

ethman commented 3 years ago

Hi again!

Labeling of Instrument Tracks

From your advice, I should make use of submixing to achieve my goal. While submixing did provide me with a new bass.wav, drums.wav, and residuals.wav, it occurred to me that I do not know what new labels are assigned to them. For example, given that the following is my yaml file, after submixing, how do I know what new labels are assigned to drums and bass? I did not state that the new label of drums is 1, nor did I state that the new label of bass is 2.

Those numbers come from the official MIDI specification, which is reproduced here. Let's look at the example you posted (I've reordered it to explain it better). I'll annotate the file excerpt below to explain what's happening when it gets passed into submixes.py:

Mixing key: "program_num"     # This tells submixes to look at MIDI instrument program numbers.

Recipes:                      # This defines which instruments get mixed together as one submix.

  Bass:                       # This is the first submix, called "Bass".
    - 32                      # These numbers are all a list of MIDI instrument program numbers.
    - 33                      # These are interpreted as MIDI instrument program numbers because 
    - 34                      #     "Mixing key" in line 1 is set to "program_num".
    - 35                      # Any instrument track in Slakh that has a "program_num" in this list
    - 36                      #      will be summed together to a submix file called "bass.wav"
    - 37
    - 38
    - 39
  Drums:                      # This is the second submix, called "Drums".
    - 128                     # This is another list of MIDI program_nums for "drums.wav".

So if you ran the submixes.py code with the above recipe (which it sounds like you did), you would have three submixes in each Slakh track (one bass.wav, one drums.wav, and one residuals.wav).

Matching to MUSDB18

How does this match up with MUSDB18? Well let's look at MUSDB18's instrument labels:

0 - The mixture
1 - The drums
2 - The bass
3 - The rest of the accompaniment
4 - The vocals

So it looks like the Slakh submix bass.wav matches MUSDB's label 2 (bass), the Slakh submix drums.wav matches MUSDB's label 1 (drums), and the Slakh submix residuals.wav matches MUSDB's label 3 (the rest of the accompaniment). (Notice there is no singing in Slakh, so there is no matching data for MUSDB's label 4, the vocals.)
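If it helps to see that correspondence written down in one place, it's just a small lookup (a sketch; the label numbers are MUSDB18's, as listed above):

    # The mapping described above; there is no Slakh counterpart for label 4 (vocals).
    slakh_submix_to_musdb_label = {
        "drums.wav": 1,      # 1 - The drums
        "bass.wav": 2,       # 2 - The bass
        "residuals.wav": 3,  # 3 - The rest of the accompaniment
    }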

So this is how the instrument labels match up. But before you dive into training, let's discuss a few things...

Training a network: .wav vs. .stempeg

As you mentioned, Slakh and MUSDB ship in different data formats. So how are we supposed to train a model? Well, we simply have to convert one of the datasets to the other's format. Which dataset should we convert? Well, if your ultimate goal is to train a model on both datasets, you want the data to be uncompressed (i.e., all your files should be .wav files, not .stempeg files).

The Native Instruments tool that you linked turns a set of .wav files into one .stempeg file; this is the opposite of what you want. You want to convert MUSDB18's stempeg files to a set of wav files.

(I'll note that Slakh also comes in a compressed format, as .flac files. You probably already converted Slakh to .wav files. Converting MUSDB18 from .stempeg to .wav is an analogous process: compressed --> decompressed.)
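A minimal sketch of that MUSDB18 conversion, reusing read_stems() from above plus soundfile for writing (the file name is a placeholder; batching over the whole dataset is left to you):

    # Minimal sketch: decode one MUSDB18 stem file into per-stem .wav files.
    import stempeg
    import soundfile as sf

    stems, rate = stempeg.read_stems("some_musdb_track.stem.mp4")
    names = ["mixture", "drums", "bass", "other", "vocals"]   # MUSDB18's stem order
    for audio, name in zip(stems, names):
        sf.write(f"some_musdb_track_{name}.wav", audio, rate)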

Aside: On compressed audio signals

Why do we want our audio as wav files (uncompressed) rather than stempeg files (compressed)? stempeg files are compressed to save space in storage and transit (they're pretty much .mp4 files). But in order to actually access the data, you need to decompress it (sometimes called "decoding"). For listening to a song just once through Spotify or YouTube, this decompression step is quick enough to happen faster than real time, so we can enjoy a song without even noticing the decompression happening. But when we're training a neural network, the network has to "hear" the audio many, many times faster than real time; it needs to "listen" to ~30-45 seconds of audio every second (note: this is a back-of-the-envelope calculation. Don't quote me on it 😅). That's just too much data to wait on decompression for every second of audio. (Not only that, but the data needs to be moved from the CPU to the GPU, which can be another huge bottleneck in the training process.) So this means that it's prudent to decompress your data to .wav files before you start training.

Why .wav files? Well, as I've been alluding to, .wav files are uncompressed, which means that we can load them as numpy arrays (or pytorch/tensorflow Tensors) very easily. They're pretty much numpy arrays stored to disk (not exactly, but look into hdf5 or zarr for actually saving tensors to disk, OR just use nussl, which does this all for you).

The Bottom Line

Convert all of your audio files to .wav files and save them to disk ahead of training. You will need access to a lot of storage (recall that Slakh is ~500GB of data by itself). This will make training much faster.

Training with nussl and a tutorial

Okay, so how can you train a network using these datasets? Well, if you have any questions about any steps in the training process, I invite you to check out a tutorial my colleagues and I gave at ISMIR this year. All of the material is available here. It includes info on how to get your data set up, as well as practical tips and tricks for ensuring your project is successful.

The tutorial uses nussl, which already has a MUSDB18 dataset hook here (in depth details on how to use MUSDB18 are in the tutorial). There's also a Slakh2100 dataset hook in development here. It's not production ready yet, but feel free to look at it to see how you can use it for your project.
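Very roughly, using the MUSDB18 hook looks something like the sketch below (the exact constructor arguments and item format are from memory, so treat them as assumptions and check the tutorial/docs):

    # Rough sketch of nussl's MUSDB18 dataset hook; the argument names and item keys
    # are assumptions here -- the tutorial and nussl docs are the real reference.
    import nussl

    musdb = nussl.datasets.MUSDB18(folder="path/to/musdb18")
    item = musdb[0]               # nussl datasets return dict-like items
    mix = item["mix"]             # AudioSignal for the mixture
    sources = item["sources"]     # dict of per-instrument AudioSignals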

How do you combine them? Well, that depends on how you're training, and that's up to you. The way we did it in the paper was to train a model on one instrument in one dataset and test the same instrument on the other dataset. But there are tons of ways to do it, so that's a decision you'll have to make as part of your experimental design process. The building blocks are all here, ready to be used.

Best of luck!

ethman commented 3 years ago

Closing due to lack of activity. Feel free to reopen if you have more questions.