ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.

OrcaCNN: Detecting and classifying killer whales from acoustic data #17

Closed yosoyjay closed 4 years ago

yosoyjay commented 5 years ago

ESIP Member Organization

Alaska Ocean Observing System (AOOS) and Axiom Data Science

Mentors

Jesse Lopez, Dan Olsen

Project Idea

Build, compare, and analyze neural network models built to detect Killer Whales from passive acoustic data.

Information for students

Just the general information for students. See the ESIP Student Guide.

Abstract

Killer whales, or orcas, inhabit the world's oceans, but details about local populations are difficult to discern due to a lack of data. The increasing deployment of hydrophones, or underwater microphones, has provided researchers with raw data to study killer whale populations. Unfortunately, the tools currently available to automatically identify killer whales still require substantial manual verification and do not provide any detail about the particular pod, or killer whale group, that was detected. This project aims first to aid killer whale researchers by developing a custom Convolutional Neural Network classifier that will automatically identify killer whales in a passive acoustic dataset from Alaska. The second, more ambitious goal is to extend the model to detect the presence of killer whales and identify the particular pod.

Technical stuff

Python: TensorFlow, PyData (numpy, scipy, sklearn, matplotlib, et al.)

Helpful Experience, but not required!

Python, machine learning, acoustics, data analysis and visualization

First steps

  1. Read https://ai.googleblog.com/2018/10/acoustic-detection-of-humpback-whales.html

  2. Read https://github.com/jaimeps/whale-sound-classification

  3. Explore the data that will be used for this project and work through tutorials for training CNNs on audio data (see the sketch after this list)

  4. Project repo
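
A minimal sketch of the kind of spectrogram-plus-CNN pipeline step 3 points toward; library choices, shapes, and file handling here are illustrative assumptions, not project requirements:

import numpy as np
import librosa
import tensorflow as tf

def wav_to_logmel(path, sr=16000, n_mels=64, duration=4.0):
    # Load a clip, pad/trim to a fixed duration, and return a log-mel spectrogram
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = np.pad(y, (0, max(0, int(sr * duration) - len(y))))
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)[..., np.newaxis]  # shape (n_mels, frames, 1)

def build_model(input_shape):
    # Tiny binary CNN: call vs. no call
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

# Example usage (paths and labels are placeholders):
# X = np.stack([wav_to_logmel(p) for p in positive_paths + negative_paths])
# y = np.array([1] * len(positive_paths) + [0] * len(negative_paths))
# model = build_model(X.shape[1:])
# model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X, y, epochs=20, validation_split=0.2)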

Ruturaj123 commented 5 years ago

Hi @yosoyjay, I would like to work on this project. I have previous experience working with CNNs and would like to contribute to this. After going through the First steps mentioned above, what is the next step?

sinAshish commented 5 years ago

Hi @yosoyjay I am interested in this project. I have previously worked on audio classification. Please guide me further!

ZER-0-NE commented 5 years ago

Hi @yosoyjay. I am Abhishek Singh, a 3rd year undergraduate at NIT Durgapur. I am excited to apply to GSoC this year with this project idea. My background: in my sophomore year, I published a research paper, Detection of Rare Genetic Diseases using Facial 2D Images with Transfer Learning, where I used a CNN (VGGFace with a ResNet50 architecture) and beat the previous SOTA, achieving 98% accuracy and a 0.86 F1 score using Keras with a TensorFlow backend. I used transfer learning here, removing the top layers and adding my own classifier (SVM and FC layer).

A few months back, I completed another project for sentiment analysis and used CNN+RNN for Image captioning with LSTM cells. I made a web-app for the same.

I have been since then coding and learning about other ML algorithms like RNN and Reinforcement Learning and developing small projects on the go.

I host most of my projects on GitHub. I have gone through the first steps and would like to know further about this idea.

Sanditya2510 commented 5 years ago

Hi @yosoyjay. I am interested in contributing to this project; Annie Burgess referred me here. Please provide me the dataset so that I can start work. I have been involved in several research projects under renowned professors and have won several data science competitions. Some of my projects include a virtual mouse controller and face swapping on live video.

mrizwank97 commented 5 years ago

Hello @yosoyjay and everyone else. This is a very exciting project. I have gone through the first steps and now have an overall idea of the project, and I want to contribute. I have previous experience with machine learning algorithms, from simple neural networks to convolutional and recurrent neural networks and reinforcement learning. It's been over a year since I started pursuing machine learning and deep learning, and I have contributed to a government-level project in my university lab. I would like further updates on the project and the dataset so that I can start working on a prototype. Waiting for a response.

parthpm commented 5 years ago

Hi @yosoyjay, I am interested in contributing to this project. I have previous experience working with different architectures of CNNs and RNNs and would like to contribute. After going through the first steps mentioned above, what is the next step?

1998at commented 5 years ago

Hi @yosoyjay, I would like to contribute to the project. I have a lot of experience with this kind of project: I have placed in the top 20% in some Kaggle competitions and took second place in HackerEarth's deep learning challenge on image classification. I have also worked on speech detection using CNNs. Could you provide some sample data to start working with?

catmss commented 5 years ago

Hello @yosoyjay! I would really like to contribute to this project. I have worked with CNNs and am currently studying digital signal processing at my university. Having worked with image data for a while, I am willing to learn about audio signals and the application of machine learning to them.

pranavbudhwant commented 5 years ago

Hello @yosoyjay sir, I am Pranav Budhwant, a pre-final-year CSE student at the Pune Institute of Computer Technology. This is a very interesting project, and I have worked on similar projects involving audio processing and spectrogram classification. I have gone through the first steps and would like to know more about the next steps.

kunakl07 commented 5 years ago

Sir, is this project taken? I would like to work on it.

yosoyjay commented 5 years ago

Hi all! Sorry for the delay in responding; I was out of communication from last Friday through this morning. Thanks for replying and showing interest in this project. I think it's really cool and has the opportunity for real use by scientists studying killer whales, or orcas.

I'm going to be preparing some next steps with code samples, a repo, and questions I'd like any folks interested in the project to work on to see if we are a good fit for this project :). To get you started on this, I'd like for you to think about and prepare to describe:

@Ruturaj123, @sinAshish, @ZER-0-NE, @Sanditya2510, @nustian16, @parthpm, @at1998, @catmss, @pranavbudhwant, @kunakl07

kunakl07 commented 5 years ago

Sir, I am working on an LSTM NN. In a recurrent net, the output is influenced not only by the input we've just put in, but by the entire history of inputs through the recurring loop. An LSTM, a type of recurrent NN, remembers everything it's fed and would therefore outperform a plain recurrent neural network. Sir, do I need to compare results of RCNN, a recurrent NN, and an LSTM?

1998at commented 5 years ago

Hi @yosoyjay, my name is Ayush Patel (third-year undergrad) and I have been doing machine and deep learning (especially computer vision) for almost a year and a half now. In this time:

  1. I interned at a small research company and worked on feature discovery in images using deep learning.
  2. I have built many models from scratch (image captioning, segmentation, image classification, and object detection) and tried to match the accuracy reported in the papers.
  3. I took second place in "HackerEarth's Deep Learning Beg Challenge" by creating a custom CNN that reached 96.34% accuracy.
  4. I have placed in the top 20-30% of three Kaggle competitions, including the "Humpback Whale Identification Challenge".
  5. I scored 14th out of more than 2000 people in the "Caavo Computer Vision Challenge".

As to why I am interested in this project:

  1. I worked on a very similar project when I tried implementing my own speech recognition for Hindi.
  2. I find it a very interesting and challenging problem because I am trying to make a model learn something even I can't learn that easily.
  3. Although I have worked on various toy datasets, I want to use deep learning in projects that are more practical and present something new and creative.
  4. I love deep learning, and this is one of the few projects I found related to it; I will enjoy working on making it more and more accurate.

This is my first GSoC and I don't really know how it works, so I will need some help writing proposals. At the same time, I'd like to request a small dataset so we can all start working on it; it will not only help strengthen our proposals but also give us a sense of what we will be dealing with in case any of us is selected. Thanks

yosoyjay commented 5 years ago

Hi all, I've added a link to a repo in the first steps with some sample data and links to tutorials for what I have in mind for this project. If you believe a different model type would be more appropriate, by all means explore that route.

pranavbudhwant commented 5 years ago

@yosoyjay

Thanks for sharing the sample dataset & resources to get started! I will start working on the sample dataset & the approach you've mentioned, along with a different take on the problem which I think may give some results as well. Here are the answers to the aforementioned questionnaire:

Experience with Python:

  1. Have been coding in Python for the past 2 years.
  2. Libraries used - TensorFlow, scikit-learn, Keras, OpenCV, matplotlib, numpy, pandas, to name a few.

Experience with machine learning:

  1. Have completed courses such as -
    • Machine Learning by Stanford University,
    • Deep Learning Specialization by deeplearning.ai,
    • Machine Learning A-Z by superdatascience on Udemy,
    • Deep Learning A-Z by superdatascience on Udemy
  2. Have mostly worked in Computer Vision & am comfortable in implementation & tweaking of standard ML, DL algorithms.
  3. Currently working on a paper based on conditional GANs.
  4. Experienced and comfortable with implementing, debugging, and visualizing deep feed-forward nets and deep CNNs from scratch. Have also worked with time-series data, such as LSTMs for music/text generation and speech recognition.
  5. A little experience with Digital Signal Processing concepts such as MFCCs, which are widely used as features to represent a given signal.

Experience handling N-dimensional data:

Why am I interested in the project:

Previous internships:

ZER-0-NE commented 5 years ago

Hi @yosoyjay. Regarding the task you mentioned on GitHub, I have created a small CNN model with the help of the links you listed. I achieved a validation accuracy of 81.82% and a training accuracy of 47.83%, which does not improve after 3-4 epochs. I believe the validation accuracy is higher than the training accuracy due to the 40-50% dropout I've applied in the layers. I am yet to evaluate the model. Currently, I am looking into data augmentation techniques to improve my dataset further. Is there a way to contact you?

1998at commented 5 years ago

@yosoyjay Could you provide us with more data? I ran the data through a model and here is what I am getting:

Epoch 0/24   train Loss: 0.6653 Acc: 0.6250  val Loss: 0.6385 Acc: 0.5000
Epoch 1/24   train Loss: 0.5128 Acc: 0.6875  val Loss: 0.2004 Acc: 0.8333
Epoch 2/24   train Loss: 0.2114 Acc: 0.9062  val Loss: 0.1228 Acc: 1.0000
Epoch 3/24   train Loss: 0.2647 Acc: 0.9062  val Loss: 0.0035 Acc: 1.0000
Epoch 4/24   train Loss: 0.1572 Acc: 0.9375  val Loss: 0.0067 Acc: 1.0000
Epoch 5/24   train Loss: 0.3381 Acc: 0.9062  val Loss: 0.0044 Acc: 1.0000
Epoch 6/24   train Loss: 0.3537 Acc: 0.9062  val Loss: 0.0105 Acc: 1.0000
Epoch 7/24   train Loss: 0.1214 Acc: 0.9375  val Loss: 0.0081 Acc: 1.0000
Epoch 8/24   train Loss: 0.3119 Acc: 0.9375  val Loss: 0.0081 Acc: 1.0000
Epoch 9/24   train Loss: 0.1369 Acc: 0.9375  val Loss: 0.0108 Acc: 1.0000
Epoch 10/24  train Loss: 0.0572 Acc: 1.0000  val Loss: 0.0108 Acc: 1.0000

Clearly there is too little data to determine how the model is performing. Looking at the train and val loss, it doesn't seem to be overfitting or underfitting, but there is simply too little data to say for sure. I used 6 samples in the validation set and 34 in training.

yosoyjay commented 5 years ago

@pranavbudhwant Thanks for filling out the information! Please have a look at the data I posted and see if it's something that's of interest to you.

1998at commented 5 years ago

@yosoyjay Could you provide some more samples in the data?

yosoyjay commented 5 years ago

@ZER-0-NE Nice progress and workaround given the limited number of samples. In your small network, I'd think carefully about what each layer is doing.

ESIP would like us to keep communication in the open so as not to give applicants an advantage over one another. However, I'd suggest opening an issue on the OrcaCNN-data repo and we can discuss in more detail there.

yosoyjay commented 5 years ago

@at1998 Yes, the limited data we have available now isn't ideal, but it's just a sample. It also makes you think carefully about model selection and how you build your model.

yosoyjay commented 5 years ago

@at1998 Also, as I suggested to @ZER-0-NE, feel free to open an issue on the data repo so we can discuss specific issues there. Good luck : )

kunakl07 commented 5 years ago

Epoch 491/500 26/26 [==============================] - 0s 857us/step - loss: 1.2563e-06 - acc: 1.0000 - val_loss: 1.1681 - val_acc: 0.8889
Epoch 492/500 26/26 [==============================] - 0s 613us/step - loss: 2.2034e-04 - acc: 1.0000 - val_loss: 1.1974 - val_acc: 0.8889
Epoch 493/500 26/26 [==============================] - 0s 625us/step - loss: 5.4705e-05 - acc: 1.0000 - val_loss: 1.1694 - val_acc: 0.8889
Epoch 494/500 26/26 [==============================] - 0s 582us/step - loss: 4.6540e-05 - acc: 1.0000 - val_loss: 1.1694 - val_acc: 0.8889
Epoch 495/500 26/26 [==============================] - 0s 645us/step - loss: 1.5589e-07 - acc: 1.0000 - val_loss: 1.1693 - val_acc: 0.8889
Epoch 496/500 26/26 [==============================] - 0s 579us/step - loss: 2.2260e-06 - acc: 1.0000 - val_loss: 1.1692 - val_acc: 0.8889
Epoch 497/500 26/26 [==============================] - 0s 692us/step - loss: 3.6035e-05 - acc: 1.0000 - val_loss: 1.1712 - val_acc: 0.8889
Epoch 498/500 26/26 [==============================] - 0s 676us/step - loss: 2.5676e-07 - acc: 1.0000 - val_loss: 1.1712 - val_acc: 0.8889
Epoch 499/500 26/26 [==============================] - 0s 633us/step - loss: 3.3012e-07 - acc: 1.0000 - val_loss: 1.1711 - val_acc: 0.8889
Epoch 500/500 26/26 [==============================] - 0s 637us/step - loss: 4.4683e-06 - acc: 1.0000 - val_loss: 1.1716 - val_acc: 0.8889
<keras.callbacks.History at 0x7fb375c4f9b0>

@yosoyjay Despite being trained only on the sample data, this model predicts with reasonable accuracy whether the long_samples you gave are positive or negative. This was trained with just a CNN model. Next, I am going to try this dataset on Faster R-CNN and see if there are any improvements; tomorrow I will move on to LSTMs and GANs. But can I perform augmentation and add slight noise in order to increase the dataset? For Faster R-CNN, LSTM, and GANs, I am going to take datasets from Kaggle if the samples don't suffice, as those are in pure form. Sir, this is the image of my result: [screenshot: bandicam 2019-03-06 10-01-44-861]

yosoyjay commented 5 years ago

@kunakl07 That's great that it already recognizes that there are calls in the long sample. A next step would be to think about a workflow, or pipeline, that identifies which long segments have orca calls; then, how do you recognize the calls within the long segments?

I'll be curious to see how the other model architectures perform as well.

kunakl07 commented 5 years ago

Yes sir, I have thought about that too.

Step 1: Count the number of calls in a single long sample.
Step 2: Divide the segment (long sample) by the total number of calls.
Step 3: Find the call in each smaller segment, if there is any.
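
One possible variant is a simple fixed-window sweep rather than dividing by the number of calls; a rough sketch, where the window length and the classifier hook are placeholders, not the project's actual settings:

from pydub import AudioSegment

def windows(path, window_ms=4000):
    # Yield (start_ms, chunk) pairs covering the whole recording
    sound = AudioSegment.from_file(path)
    for start in range(0, len(sound), window_ms):
        yield start, sound[start:start + window_ms]

# for start_ms, chunk in windows("long_sample_01.wav"):
#     chunk.export("chunk.wav", format="wav")
#     if model_predicts_call("chunk.wav"):  # placeholder for the trained classifier
#         print("possible orca call near", start_ms / 1000, "s")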

ZER-0-NE commented 5 years ago

Here are the answers @yosoyjay

Experience with Python:

Experience with machine learning:

Experience handling N-dimensional data:

My interest in the project:

Previous internships:

ZER-0-NE commented 5 years ago

@yosoyjay Regarding the identification of calls in the long segments, I was thinking of something along the lines of binary search on intervals: maybe dividing the whole segment into fixed-size intervals and then finding the calls in each of these intervals.

kunakl07 commented 5 years ago

Hi @yosoyjay. Now we can divide large audio segments into smaller parts, and our model will recognize the calls in the segments if there are any. I have shared an image of the output. Initially in long_sample_02 there is no whale sound in the first half, and as we have divided the long sound wave into parts, the model has predicted that smaller part, which turns out to be negative. [screenshot: bandicam 2019-03-08 01-39-46-176]

@ZER-0-NE you can use this code in your model to divide long samples into smaller parts. Here's the code:

from os import path
from pydub import AudioSegment

AUDIO_FILE = path.join(path.dirname(path.realpath('/content/OrcaCNN-data/data/long_samples')),'long_samples/long_sample_04.wav')

sound = AudioSegment.from_file(AUDIO_FILE)

# pydub indexes audio in milliseconds; take the first half of the recording
halfway_point = len(sound) // 2
first_half = sound[:halfway_point]

first_half.export("/content/OrcaCNN-data/data/long_samples/ppart1_long_sample02.wav", format="wav")
yosoyjay commented 5 years ago

@kunakl07 @ZER-0-NE Yes, it's very likely that part of this project will involve taking all of the data and dividing it up into standard-sized samples. I'm not sure if that was included in any of the links I posted, but it is a fairly standard part of preprocessing acoustic data.

@kunakl07 The screenshot and code may be helpful to folks, but they are difficult to read. Could you please format the code using Markdown? That is, instead of writing it as plain text (import numpy as np), wrap it in backticks: `import numpy as np`

kunakl07 commented 5 years ago


The code I posted above generates a segment of variable length, since we divide the total length in half. But sir, for a standard wave we need segments of fixed length. The following code divides the long segment into fixed-size chunks of 7 seconds each; we can change the size as we want, this is just an example.

from pydub import AudioSegment
from os import path

AUDIO_FILE = path.join(path.dirname(path.realpath('C:/OrcaCNN-data-master/data/long_samples/')),'long_samples/long_sample_01.wav')

sound = AudioSegment.from_file(AUDIO_FILE)

# pydub slices in milliseconds, so 7000 ms gives a 7-second chunk
halfway_point = 7000
first_half = sound[:halfway_point]

first_half.export("C:/OrcaCNN-data-master/data/long_samples/popart1_long_sample001.wav", format="wav")
kunakl07 commented 5 years ago

And @yosoyjay, sorry for the unclear image. This is the new image of the division of long samples into smaller ones, and of predicting whether the smaller segments contain calls. [screenshot: bandicam 2019-03-08 19-32-19-609 (2)]

sainimohit23 commented 5 years ago

Hi @yosoyjay. I made an RNN model to get the precise time (in seconds) of the orca calls. To train the model, I first extracted 10-second background-noise clips from the provided data and then randomly placed positive and negative orca calls on those backgrounds using a Python script. sample audio.

I created a dataset of 80 clips, just to see how the model would perform.

Result for the above sample audio: Screenshot_54

The model detected two orca calls in the above sample.
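
A rough sketch of the overlay step described above; file names, clip lengths, and the random offset are placeholders, not the actual project data:

import random
from pydub import AudioSegment

# Assumes the call clip is shorter than the 10-second background clip
background = AudioSegment.from_file("background_10s.wav")
call = AudioSegment.from_file("orca_call.wav")

offset_ms = random.randint(0, len(background) - len(call))
synthetic = background.overlay(call, position=offset_ms)
synthetic.export("synthetic_clip.wav", format="wav")

# The known offset (offset_ms / 1000 seconds) can serve as the ground-truth
# timing label for training a model that predicts call times.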

yosoyjay commented 5 years ago

@kunakl07 Yeah, I can guarantee that the audio clips will not be of equal length. Can you think of another efficient way to split the audio into equal sized chunks across all of the samples?

@sainimohit23 Nice use of synthetic data! Can you think of a method to evaluate the accuracy of the model?

kunakl07 commented 5 years ago

@yosoyjay This is code for dividing audio segments of different lengths into fixed, equal-sized chunks. So even if the clips are not of equal length, it will divide the audio into equal-sized chunks. Here I have taken 7-second chunks as an example, which we can then use to predict calls.

from os import path
from pydub import AudioSegment

AUDIO_FILE = path.join(path.dirname(path.realpath('C:/OrcaCNN-data-master/data/long_samples/')),'long_samples/long_sample_01.wav')

sound = AudioSegment.from_file(AUDIO_FILE)

# pydub slices in milliseconds, so 7000 ms gives a 7-second chunk
halfway_point = 7000
first_half = sound[:halfway_point]

first_half.export("C:/OrcaCNN-data-master/data/long_samples/popart1_long_sample001.wav", format="wav")
sainimohit23 commented 5 years ago

I trained it for around 1000 epochs. I didn't tune the hyperparameters except the learning rate.

Screenshot_55

ZER-0-NE commented 5 years ago

@yosoyjay Can the area under the ROC curve be used to evaluate the model? Or the F1-score?

yosoyjay commented 5 years ago

@sainimohit23 Can you describe the design of the model?

@ZER-0-NE Yeah, those metrics make sense for the binary-classification, or multi-class, scores. For @sainimohit23, I was asking how you measure that you have the timing correct?
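
For reference, a minimal example of computing those classification scores with scikit-learn; the arrays here are placeholders, not project results:

from sklearn.metrics import roc_auc_score, f1_score

y_true = [0, 1, 1, 0, 1]             # held-out call / no-call labels (placeholder)
y_prob = [0.2, 0.9, 0.6, 0.4, 0.8]   # predicted probabilities from the classifier (placeholder)

auc = roc_auc_score(y_true, y_prob)
f1 = f1_score(y_true, [int(p >= 0.5) for p in y_prob])  # threshold probabilities at 0.5
print("ROC AUC:", auc, "F1:", f1)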

ZER-0-NE commented 5 years ago

Oh, I misunderstood. @yosoyjay This may not be the correct answer, but I guess template matching and extraction does show the distinction between whale and non-whale calls by finding the shape of the whale sound in the spectrogram. Though the timing of the whale call might include some amount of noise, this can be reduced with pre-processing steps like contrast enhancement and Wiener filters. An example:

Screenshot from 2019-03-10 02-37-39

The coordinates of the rectangle can then be used to calculate the time of the whale call. We can extend this to match multiple calls in a long segment.

A more accurate answer could be to examine the spectrogram's frequency bins. We take a lower frequency bin around 40 Hz (below 100 Hz, where the whale call does not exist) and a higher frequency bin around 200 Hz. As per my research, the whale calls do not go below 100 Hz or above 200-250 Hz. We can use this to find the time of whale calls in our acoustic data.

Screenshot from 2019-03-10 02-24-09

On plotting and comparing their frequency curves, we can see that the whale call has much lower variation than the non-whale call.

Screenshot from 2019-03-10 02-28-39

We can then calculate the centroid for each frequency bin, and I guess the centroid of the whale call would roughly correspond to the peak value in the frequency curve. The non-whale call would have the lower-frequency signal as noise.
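
A rough sketch of this band-energy idea; the band limits and threshold are just the assumptions stated above, and the file name is a placeholder:

import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=None)   # placeholder file name
S = np.abs(librosa.stft(y))                   # magnitude spectrogram
freqs = librosa.fft_frequencies(sr=sr)

band = (freqs >= 100) & (freqs <= 250)        # assumed "call" band from the comment above
band_energy = S[band].sum(axis=0)
threshold = band_energy.mean() + 2 * band_energy.std()

# Flag time frames where the in-band energy stands out from the background
frame_times = librosa.frames_to_time(np.arange(S.shape[1]), sr=sr)
candidate_times = frame_times[band_energy > threshold]
print("candidate call times (s):", candidate_times[:10])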

sainimohit23 commented 5 years ago

@yosoyjay I have sent you the model details via email. Right now I am looking for a way to use the same model to get output for longer clips. I found some functions in the pyaudio library which might help me with this.

kunakl07 commented 5 years ago

Hi @yosoyjay, after combining the sound of a ship's horn (the kind we can hear near the coast) with a long sample wave, I got the following audio result. We can also perform various augmentations like adding noise, stretching, and shifting the sound file. I have given the code below so that @ZER-0-NE and @sainimohit23 can increase their datasets and use it to make their models more accurate.

import librosa
import numpy as np
import matplotlib.pyplot as plt

class AudioAugmentation:
    def read_audio_file(self, file_path):
        # Load audio and pad/trim to a fixed length of 160000 samples
        input_length = 160000
        data = librosa.core.load(file_path)[0]
        if len(data) > input_length:
            data = data[:input_length]
        else:
            data = np.pad(data, (0, max(0, input_length - len(data))), "constant")
        return data

    def write_audio_file(self, file, data, sample_rate=16000):
        librosa.output.write_wav(file, data, sample_rate)

    def plot_time_series(self, data):
        fig = plt.figure(figsize=(8, 8))
        plt.title('Raw wave ')
        plt.ylabel('Amplitude')
        plt.plot(np.linspace(0, 1, len(data)), data)
        plt.show()

    def add_noise(self, data):
        # Add low-amplitude Gaussian noise
        noise = np.random.randn(len(data))
        data_noise = data + 0.005 * noise
        return data_noise

    def shift(self, data):
        # Circularly shift the signal by 1600 samples
        return np.roll(data, 1600)

    def stretch(self, data, rate=1):
        # Time-stretch the signal, then pad/trim back to the fixed length
        input_length = 160000
        data = librosa.effects.time_stretch(data, rate)
        if len(data) > input_length:
            data = data[:input_length]
        else:
            data = np.pad(data, (0, max(0, input_length - len(data))), "constant")
        return data

aa = AudioAugmentation()

data = aa.read_audio_file("OrcaCNN-data/data/long_samples/long_sample_01.wav")
aa.plot_time_series(data)

data_noise = aa.add_noise(data)
aa.plot_time_series(data_noise)

data_roll = aa.shift(data)
aa.plot_time_series(data_roll)

data_stretch = aa.stretch(data, 0.8)
aa.plot_time_series(data_stretch)

aa.write_audio_file('OrcaCNN-data/data/long_samples/generated_lsw001.wav', data_noise)
aa.write_audio_file('OrcaCNN-data/data/long_samples/generated_lsw002.wav', data_roll)
aa.write_audio_file('OrcaCNN-data/data/long_samples/generated_lsw003.wav', data_stretch)

You can also generate graphs to see the difference between the original and augmented sound samples. This is the code for overlaying and merging different sounds:

from pydub import AudioSegment

sound1 = AudioSegment.from_file("OrcaCNN-data/data/long_samples/long_sample_01.wav")
sound2 = AudioSegment.from_file("boathorn.wav")

combined = sound1.overlay(sound2)

combined.export("OrcaCNN-data/data/mixed_001.wav", format='wav')

The graph of the mixed audio looks like this:

[screenshot: bandicam 2019-03-11 13-57-40-177]

While performing predictions on the mixed sound, the long sample was divided into equal chunks of 7 seconds each, and the model ignored the ship horn and only detected the calls in the mixed sample as positive (you can refer to the code from the previous comment). Here, the sound of the horn is predicted negative: [screenshot: bandicam 2019-03-11 14-06-52-662] Here, we can hear the calls and our model has predicted them to be positive: [screenshot: bandicam 2019-03-11 14-05-14-850] So even after augmenting and merging other sounds, this model is able to divide the audio into segments and predict them correctly.

ZER-0-NE commented 5 years ago

@kunakl07 Thanks for the code. As mentioned earlier, you can quote your code for better readability :) May I know at exactly what position you combined the noise in your sample file? I guess @yosoyjay is looking for an automated way to detect the start and end times of whale calls and their duration in the sample.

kunakl07 commented 5 years ago

@ZER-0-NE, here I've combined the overlay at 0:01. You can add it wherever you wish.

ZER-0-NE commented 5 years ago

Why did you divide it into equal chunks of 7 seconds? What if the audio sample were of a much shorter length, say 4 s? I just believe there can be a much more efficient way :)

yosoyjay commented 5 years ago

@ZER-0-NE Yeah, I totally think you are on the right track with your thinking about how to identify the times where a killer whale is very likely to be present in a long sample. It would probably be worthwhile to explore both methods to evaluate the tradeoffs.

@sainimohit23 Thanks for the email, I'll have a look in detail there and reply.

@kunakl07 Thanks! I think these methods can be quite useful for folks wanting to create an augmented sample set. And, as @ZER-0-NE mentioned, it may be worthwhile to refactor this so that it can take the chunk size as an argument.

I didn't have an opportunity to create the mapping for the sounds/labels this weekend, I'll probably be able to create it tomorrow. Sorry for the delay!

kunakl07 commented 5 years ago

@ZER-0-NE You are free to take any size that you want; I've just taken 7 seconds as an example, since after augmenting my audio clip was long. But yes, dividing into smaller parts would be more efficient if the audio is much shorter. @yosoyjay Yeah, sure, I will definitely refactor the code.

yosoyjay commented 5 years ago

Hi all, I've added class labels to the training data. It's this file. These labels refer to the specific calls being made and do not refer to the pods, because I don't have the labels for that right now.

I didn't include labels in the negative set because you can just treat those as 'negative' or an equivalent. It'd be interesting to see how much you can squeeze out of the very limited data set here.

sainimohit23 commented 5 years ago

Hi @yosoyjay, I just want to tell you that I was able to create a program which can give the precise times of orca calls in variable-length input audio, using the same model that I trained previously.

Sample audio and its results:

Screenshot_61

You also asked me to give you test results, so I created a completely new set of 20 audio files; here are the results: Screenshot_59

kunakl07 commented 5 years ago

Hi @yosoyjay, my model correctly predicts the class of orca calls in a long audio sample (if they are positive) after being trained on the small audio samples. [screenshot: bandicam 2019-03-13 15-43-40-112]

Steps I performed, if any participant would like to follow:

  1. I created 8 folders, one for each class, and placed the respective training audio into its corresponding folder as per the format given in the class-label file.
  2. After training on the small sample audio for 2000 epochs, we provide the long sample that is to be predicted:
    a) The long sample is overlaid with some external noise.
    b) The long sample is divided into 4-second chunks and then given to the model for prediction.

The outcome is as follows: [screenshot: bandicam 2019-03-13 15-41-53-639]

The model correctly predicts 2 orca calls in long_sample_01, which start at 4:00 and end at 8:00. 8 out of 10 times my model has been able to classify the class of the orca calls correctly. This accuracy will still improve when: 1) we increase the dataset by augmenting the training data; 2) we increase the feature dimensions and the number of epochs, and tune the hyperparameters. [screenshot: bandicam 2019-03-13 15-45-10-164] @yosoyjay, I will let you know by tomorrow how much the accuracy has improved once these factors are taken into account.
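
A rough sketch of how per-class folders like the ones described above could be turned into a labelled dataset; the folder layout and the feature-extraction helper are only illustrative assumptions:

import os
import numpy as np

def load_dataset(root, to_features):
    # Each subfolder of `root` is one call class; every .wav inside is one example
    X, y, classes = [], [], sorted(os.listdir(root))
    for label, cls in enumerate(classes):
        class_dir = os.path.join(root, cls)
        for fname in sorted(os.listdir(class_dir)):
            if fname.endswith(".wav"):
                X.append(to_features(os.path.join(class_dir, fname)))
                y.append(label)
    return np.stack(X), np.array(y), classes

# Example usage (assumes some feature function, e.g. a spectrogram extractor):
# X, y, classes = load_dataset("training_data", to_features=wav_to_logmel)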

Ahmed-Moselhy commented 5 years ago

@yosoyjay I am very interested in contributing to this project. I have prior experience working with acoustic data. What steps would help me show my interest and how good a fit I am for this project?

ZER-0-NE commented 5 years ago

Hi @Ahmed-Moselhy, you can go through the first steps mentioned in the first comment. That should help you get started.