M-Lin-DM / Deep-Audio-Embedding

Visualizing the structure of audio in 3D using deep convolutional autoencoders

Struggling to get it running locally #1

Open danryland opened 1 year ago

danryland commented 1 year ago

Hey Michael,

Firstly, love this concept.

I really want to run it locally to play around with it and potentially integrate it into an open source project I want to try and build.

I've forked the repo and started documenting it and updating the file references so it works with the example .wav file in the root (I'm on macOS).

I'm completely stuck at Train_model_2sec-window.py: every time I run it, it fails with an error I can't figure out how to fix or work around (see https://github.com/danryland/Deep-Audio-Embedding/tree/feature/get-it-working#issuesblockers).

I'm working over here: https://github.com/danryland/Deep-Audio-Embedding/tree/feature/get-it-working

You might have moved on from this project, but I'd love some help getting it running if you have any time at all.

Thanks for your time

M-Lin-DM commented 1 year ago

Hi Dan,

I'm glad you're finding it useful! I looked at the error, but I'll need to familiarize myself with the project again. I'll take a look and get back to you shortly.

I looked briefly at your Insidr page. That looks very cool. I'm actually also working on an audio visualization pipeline with a business partner (using a different approach from this project). Can I ask what you're trying to do with this method?

Thanks, Michael

(Sent by email in reply to Dan Ryland's message of Wed, Sep 20, 2023 at 7:18 PM.)

M-Lin-DM commented 1 year ago

Hi Dan,

So it looks like the error is caused by the variables x_i and x_i_hat having different shapes. The error message shows their shapes as (8, 22, 94) and (8, 88, 94) respectively (e.g. 'tf.Tensor(shape=(8, 22, 94), dtype=float32)' for x_i). The shape of x_i should be (8, 88, 94), not (8, 22, 94). This seems to trace back to image_width being saved incorrectly as 22 in the previous step you ran. My guess is that, since image_width was saved as 22, you accidentally ran Save-dataset_half-sec-window.py instead of Save-dataset_2sec-window.py! (A 0.5 sec window corresponds to a width of 22 = 88/4.)
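The arithmetic behind that guess can be turned into a quick pre-flight check before training. A minimal sketch, assuming (from the numbers above) that a 2 sec window gives an image width of 88, i.e. 44 frames per second. expected_width and check_saved_width are hypothetical helpers, not functions from the repo, and loading the saved image_width is left out because it depends on how the project stores it.

```python
# Hypothetical pre-flight check for Train_model_2sec-window.py (not part of the repo).
# Assumption: a 2 s window gives an image width of 88 frames (from the thread),
# so one second corresponds to 44 frames.
FRAMES_PER_SECOND = 88 / 2.0


def expected_width(window_sec: float, frames_per_sec: float = FRAMES_PER_SECOND) -> int:
    """Number of time frames (image_width) expected for a given window length."""
    return round(window_sec * frames_per_sec)


def check_saved_width(saved_width: int, window_sec: float = 2.0) -> None:
    """Raise before training if the saved dataset was built for a different window."""
    want = expected_width(window_sec)
    if saved_width != want:
        # Work backwards from the saved width to the window length it implies.
        inferred_sec = saved_width / FRAMES_PER_SECOND
        raise ValueError(
            f"image_width={saved_width}, but a {window_sec:g} s window needs {want}; "
            f"the dataset looks like it was saved with a {inferred_sec:g} s window"
        )


if __name__ == "__main__":
    check_saved_width(88)  # matches the 2 s window, so no error is raised
    try:
        check_saved_width(22)
    except ValueError as err:
        print(err)
```

Running a check like this at the top of the training script would surface the wrong-dataset mistake as a readable message instead of a tensor shape error deep inside the loss computation.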

I have to apologize, because this project got far more complicated than it needed to be to get the desired result. Namely, you can get rid of the whole LSTM branch of the model and just use a pure autoencoder trained with only an image reconstruction loss, and you'll still get a good, if not better, embedding.
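With the LSTM branch removed, the only training signal left is the reconstruction loss between the input window x_i and the decoder output x_i_hat. The sketch below is a framework-free version of that loss: mean squared error over a batch stored as nested Python lists. Choosing MSE is an assumption, since the thread only says "image reconstruction loss". In TensorFlow the same quantity would typically be written as tf.reduce_mean(tf.square(x_i - x_i_hat)). The explicit shape check reports a mismatch such as (8, 22, 94) vs (8, 88, 94) by name.

```python
from typing import Iterator, Tuple


def shape_of(x) -> Tuple[int, ...]:
    """Shape of a nested list, e.g. (batch, height, width); assumes it is rectangular."""
    dims = []
    while isinstance(x, (list, tuple)):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def flatten(x) -> Iterator[float]:
    """Yield every scalar in a nested list in row-major order."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from flatten(item)
    else:
        yield float(x)


def reconstruction_loss(x, x_hat) -> float:
    """Mean squared error between an input batch and its reconstruction."""
    if shape_of(x) != shape_of(x_hat):
        raise ValueError(f"shape mismatch: x {shape_of(x)} vs x_hat {shape_of(x_hat)}")
    a, b = list(flatten(x)), list(flatten(x_hat))
    if not a:
        raise ValueError("empty batch")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    x = [[[0.0, 1.0], [2.0, 3.0]]]        # batch of 1, a 2x2 "spectrogram"
    x_hat = [[[0.0, 1.0], [2.0, 5.0]]]    # one value off by 2
    print(reconstruction_loss(x, x_hat))  # squared error 4 averaged over 4 values
```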

Let me know if that fixes the issue! Michael

danryland commented 12 months ago

Thanks so much for taking the time to reply, Michael. I wanted to create a vector-database-inspired project built around music tracks. I was researching music/audio embeddings to see if anyone had done something like this and came across your YouTube demo; it was super inspiring! Being able to export these charts/shapes as 3D objects could make for a super cool VR music experience, especially if the lines were drawn in sync with the track (Beat Saber-style). Ideally, for my little side project, I'd pass in ~20 music tracks from the same artist, store their embeddings, and use them for 1. recommendations based on sections of songs and 2. epic visuals. Thanks for the feedback on the error; I'll run what you suggested and get back to you.
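The "recommendations based on sections of songs" idea maps onto a nearest-neighbor search over per-section embeddings. The sketch below assumes each track has already been converted by the trained encoder into a list of 3D embedding points, one per window. SectionIndex and its methods are hypothetical names. The brute-force Euclidean scan stands in for the approximate index a real vector database would use.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class SectionIndex:
    """Brute-force nearest-neighbor index over per-section embeddings (hypothetical)."""

    def __init__(self) -> None:
        # Each entry: (track_id, section_index, embedding point)
        self._entries: List[Tuple[str, int, Point]] = []

    def add_track(self, track_id: str, section_embeddings: Sequence[Sequence[float]]) -> None:
        """Store one embedding per song section (e.g. one per 2 s window)."""
        for i, emb in enumerate(section_embeddings):
            self._entries.append((track_id, i, tuple(float(v) for v in emb)))

    def query(self, embedding: Sequence[float], k: int = 5,
              exclude_track: Optional[str] = None) -> List[Tuple[str, int, float]]:
        """Return the k closest sections as (track_id, section_index, distance)."""
        q = tuple(float(v) for v in embedding)
        candidates = (
            (math.dist(q, emb), track_id, i)  # Euclidean distance (Python 3.8+)
            for track_id, i, emb in self._entries
            if track_id != exclude_track
        )
        best = heapq.nsmallest(k, candidates)  # smallest distances first
        return [(track_id, i, d) for d, track_id, i in best]


if __name__ == "__main__":
    index = SectionIndex()
    index.add_track("track_a", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
    index.add_track("track_b", [(0.9, 0.0, 0.0), (5.0, 5.0, 5.0)])
    # Sections from other tracks that sound most like section 1 of track_a.
    print(index.query((1.0, 0.0, 0.0), k=1, exclude_track="track_a"))
```

Excluding the query's own track prevents the results from being dominated by neighboring windows of the same song. Euclidean distance is used here because the embedding is a point cloud in 3D, but cosine similarity would be a reasonable alternative to try.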