Open zahrakhanjani128 opened 6 months ago
Hey, thanks for taking interest!
That sounds like a really fun experiment and I would love to see the results!
Do you have a way to generate a dataset of positive and negative samples? This would require either a more specific augmentation pipeline, or you could leverage the temporal structure of the audio files. For example, you could divide each audio file into segments, create a spectrogram for each segment, and treat spectrograms from the same file as positives. Spectrograms from other audio files would be used as negatives.
Let me know if this makes sense or if you have another approach in mind. Once we have good positive and negative samples, we can easily apply ReLIC to your problem!
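As a rough sketch of the segment-based sampling idea above (not code from this repository): the snippet below splits each waveform into fixed-length segments, computes a magnitude spectrogram per segment with plain NumPy, and draws an (anchor, positive, negative) triplet. The function names, the FFT settings, and the random noise standing in for loaded audio are all assumptions made for illustration.

```python
import numpy as np

def spectrogram(segment, n_fft=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Shape: (n_fft // 2 + 1 frequency bins, n_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def segment_spectrograms(waveform, segment_len):
    """Split a waveform into non-overlapping segments, one spectrogram each."""
    n_segments = len(waveform) // segment_len
    return [
        spectrogram(waveform[i * segment_len : (i + 1) * segment_len])
        for i in range(n_segments)
    ]

def sample_triplet(files, rng):
    """files: one list of spectrograms per audio file (at least 2 files,
    at least 2 segments per file). Returns (anchor, positive, negative)."""
    file_idx = int(rng.integers(len(files)))
    # Offset by 1..len-1 so the negative always comes from a different file.
    other_idx = (file_idx + int(rng.integers(1, len(files)))) % len(files)
    a, p = rng.choice(len(files[file_idx]), size=2, replace=False)
    n = int(rng.integers(len(files[other_idx])))
    return files[file_idx][a], files[file_idx][p], files[other_idx][n]

# Hypothetical usage: random noise stands in for real audio clips.
rng = np.random.default_rng(0)
waveforms = [rng.standard_normal(16000) for _ in range(3)]
files = [segment_spectrograms(w, segment_len=4000) for w in waveforms]
anchor, positive, negative = sample_triplet(files, rng)
```

With real data you would replace the random waveforms with your loaded audio, and probably use a log-mel spectrogram from an audio library instead of this bare FFT.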
Thank you so much for your response and the great idea. I really appreciate it! Some of my audio files are AI-generated (fake) and some are genuine recordings (real). What if I use the fake ones as positives and the real ones as negatives? Would that work? Then, instead of extracting a spectrogram for each audio segment, we could extract one for the entire audio clip. The downstream task is detecting fake samples.
Filip, I look forward to your thoughts on this!
Hi, sorry for the late reply!
You could still try using ReLIC, but I think your problem is a better setup for a binary classification task, since you have a way to generate positive and negative samples.
I think you could achieve great results by training a CNN or fine-tuning a spectrogram transformer model, depending on how much data you have.
Wish you all the best with your project!
Hi Filip, thank you so much for your guidance. I have tried some CNN-based models too. I am a student working on a fake audio detection project, and I am a little confused: does ReLIC not work for binary classification problems? Can it only be used for multi-class problems? I remember the paper has an example of cat and dog classification. Many thanks in advance for any further guidance and clarification.
Hi, I have a dataset of spectrogram images extracted from audio data, and I would love to apply ReLIC to it to see whether it helps with my downstream task. Could you please guide me on how to apply ReLIC to my own dataset? Thanks a lot in advance!