Js-Mim / aes_wimp

Support material and source code for the system described in: "New Sonorities for Jazz Recordings: Separation and Mixing using Deep Neural Networks".
GNU General Public License v3.0

solo stereo and custom trained models? #1

Closed ghost closed 8 years ago

ghost commented 8 years ago

Hi,

I was wondering why the solo output is mono when everything else is stereo. Is there a way to modify this so the solo file is output in stereo? I like to work with "additive components", and I can't sum them back up to the original mixture if one is stereo and the other is mono.

Also, I noticed you use pickle for the trained models, and I only recently found out about pickle. My question here is: is there a way to create custom trained models for this algorithm using pickle? In other words, how exactly did you create these two pickled files? What source material is in them, and what steps did you use to create them? I like to experiment and modify the way algorithms work to create new ones, and this information would help me create a new audio source separation algorithm.

Let me know and thanks for your time and work! Can't wait to see more!

Best regards, Dan

Js-Mim commented 8 years ago

Hey there!

The solo instrument is single channel because the initial target function (the goal, so to say) existed only in that form. The assumption we are making here is that the solo is present in both channels; thus we can also estimate the background music in its multi-channel format. More details will be given soon in the corresponding manuscript.
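As a side note, summing the components back to a stereo mixture only requires duplicating the mono solo into both channels before adding it to the stereo background. A minimal numpy sketch (array shapes are illustrative, not taken from the repo):

```python
import numpy as np

# Illustrative signals: a mono solo (n_samples,) and a stereo
# background (n_samples, 2), e.g. one second at 44.1 kHz.
solo = np.random.randn(44100)
background = np.random.randn(44100, 2)

# Duplicate the mono solo into both channels, then sum.
solo_stereo = np.stack([solo, solo], axis=1)  # shape (44100, 2)
mixture = solo_stereo + background            # stereo reconstruction
```

Under the assumption above (the solo is identical in both channels), this sum reproduces the stereo mixture exactly.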

Custom models can be derived from any deep learning library. In this case "keras" was used.

All the best, S.

ghost commented 8 years ago

Hi,

thank you for explaining the solo output, I understand now. However, maybe the algorithm can be improved to tackle the reverb information that the solo instrument sometimes leaves behind in the background music component.

You mentioned "keras". I am new to this and also new to using trained models, so could you tell me, just in general terms, what the difference is between the files pannet_mag and solo_suppression_mag? What exactly does each contain? For example, audio samples in mono or stereo? 16-bit? And how was keras used to save the assumed dataset in pickle format (.p)?

If I can know that information, I would be very happy, because this allows me to then try something new and interesting with this algorithm. Thanks for your time!

Best, Dan

Js-Mim commented 8 years ago

Hey,

some reverberation tails are left because of the initial training process (more information to come soon in the manuscript). They can be slightly suppressed by varying the alpha value in the masking process (see the corresponding code chunk and reference).

Both pickled files are lists. Each element of a list contains either a matrix or a vector, representing the weights, the biases, and the gates (where applied to the network model). Each list represents one deep neural network architecture, thus two of them in total. The first architecture is responsible for estimating the solo instrument (the first deep model maps from spectrogram to spectrogram), while the second one uses the estimated spectrogram to "generate" a vector which contains at most two non-zero components. Each non-zero component refers (when decoded) to either a panning location or a gain value. These values are then used to modify the resynthesized waveform of the solo instrument, which is finally mixed with the estimated accompaniment.
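To make the file structure concrete: a stand-in for one of those pickled lists can be built and round-tripped like this (the layer sizes and ordering here are invented for illustration, not read from the actual .p files):

```python
import pickle
import numpy as np

# Hypothetical stand-in for one pickled model file: a flat list where
# 2-D arrays are weight matrices and 1-D arrays are bias vectors.
fake_model = [
    np.zeros((513, 256)), np.zeros(256),  # layer 1: W, b
    np.zeros((256, 513)), np.zeros(513),  # layer 2: W, b
]

blob = pickle.dumps(fake_model)  # what pickle.dump would write to a .p file
params = pickle.loads(blob)      # what the processing script reads back

# Inspecting the shapes tells weights (2-D) apart from biases (1-D).
shapes = [p.shape for p in params]
```

Loading the real files the same way and printing each element's shape is a quick way to see the two architectures' layer sizes.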

The above estimations/mappings are stacked matrix multiplications (the matrices from the lists), vector additions (the biases from the lists), a non-linearity (a predefined activation function), and, for the first model, transformations using "gating" matrices (see highway networks).
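A single gated layer of that kind can be sketched in a few lines of numpy. This follows the standard highway-network formulation H(x)·T(x) + x·(1 − T(x)); the function and parameter names are illustrative, not the ones used in the repo:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: H(x) * T(x) + x * (1 - T(x)).

    W_h/b_h parameterize the transform H, W_t/b_t the gate T
    (cf. Srivastava et al., "Highway Networks").
    """
    h = np.maximum(0.0, x @ W_h + b_h)  # candidate activation (ReLU here)
    t = sigmoid(x @ W_t + b_t)          # transform gate, values in (0, 1)
    return h * t + x * (1.0 - t)        # gated mix of transform and carry

# Toy forward pass: with identity weights and zero biases the layer
# passes a non-negative input straight through.
x = np.ones((1, 4))
W = np.eye(4)
b = np.zeros(4)
y = highway_layer(x, W, b, W, b)
```

Stacking several such layers, with the matrices and vectors taken in order from the pickled lists, reproduces the kind of forward pass described above.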

Summing up: a library was used to derive/optimize such matrices/vectors for estimating specific signals. After the training procedure the results were "pickled" and stored. For the actual processing, the above data is "unpickled" and used in lines 142-169 and 190-202 of the sonorities.py file. Those lines correspond to the equations that define these deep models.
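The save/restore step itself is plain pickle. A minimal sketch, assuming the trained parameters are available as a list of numpy arrays (with keras this could come from something like `model.get_weights()`; a stand-in list and an illustrative filename are used here):

```python
import pickle
import numpy as np

# Stand-in for the learned parameters collected after training.
weights = [np.random.randn(513, 256), np.random.randn(256)]

# Store the parameter list, as was done for the shipped .p files.
with open("model_mag.p", "wb") as f:   # illustrative filename
    pickle.dump(weights, f)

# Later, at processing time, the same list is restored unchanged.
with open("model_mag.p", "rb") as f:
    restored = pickle.load(f)
```

Replacing the stand-in list with parameters trained on your own material would give you custom .p files in the same format.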

Hope that this is helpful. Cheers, S.

ghost commented 8 years ago

Hi,

thank you for your answer, I understand now. I will need to do some personal research and spend time really understanding this, as I'm new to deep neural network techniques. Hopefully I can find some tutorials or examples to help me out. Thanks a lot for your answer, it really helped me out! :)

Best, Dan