Columbia-ICSL / PAWS-Smartphone

MIT License

questions: #1

Open keenblue opened 5 years ago

keenblue commented 5 years ago

How do I collect a dataset of sounds I'm interested in (for example, the sound of boiling water or the sound of an air conditioner)? And how do I use the collected data to train the model?

stephenxia commented 5 years ago

Good questions. It isn't clear how to do this from within the application or from the documentation. I will update the documentation and post a follow-up pointing to where you can find this information.

For detection, there are two ways you can do the training: with sound files that you record and store on the phone, or by recording the sound files, extracting the features on a different machine, and then using the generated features directly in the application. The exact details will be in the documentation I am preparing.
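
As a rough illustration of the second path (this is only a sketch; the band-energy feature below is a stand-in, since the actual feature set and file format the app expects are described in the manual, not here):

```python
# Hypothetical offline feature extraction: walk through labeled WAV clips,
# cut them into short windows, compute a stand-in band-energy feature per
# window, and dump everything into a CSV for later use.
import csv
import glob
import numpy as np
from scipy.io import wavfile

def band_energies(window, n_bands=10):
    # Sum the power spectrum of one window over n_bands equal frequency bands.
    spectrum = np.abs(np.fft.rfft(window)) ** 2
    return [float(np.sum(b)) for b in np.array_split(spectrum, n_bands)]

with open("features.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for label, pattern in [("car", "car_clips/*.wav"), ("noncar", "noncar_clips/*.wav")]:
        for path in glob.glob(pattern):
            rate, audio = wavfile.read(path)
            audio = audio.astype(np.float32)
            if audio.ndim > 1:              # mix stereo clips down to mono
                audio = audio.mean(axis=1)
            win = int(0.2 * rate)           # ~200 ms windows
            for start in range(0, len(audio) - win + 1, win):
                writer.writerow(band_energies(audio[start:start + win]) + [label])
```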

stephenxia commented 5 years ago

I updated the Android applications and added a user manual inside each of the PAWS and PAWS Low Energy folders. The manual provides a basic tutorial on app usage and how to train your own detection classifier so that you can add examples like boiling water and air conditioning.

You can find more information in the README in the root folder: https://github.com/Columbia-ICSL/PAWS-Smartphone

The manuals themselves are found inside the "PAWS" (PAWS_Manual.pdf) and "PAWS_LE" (PAWSLE_Manual.pdf) folders: https://github.com/Columbia-ICSL/PAWS-Smartphone/tree/master/PAWS https://github.com/Columbia-ICSL/PAWS-Smartphone/tree/master/PAWS_LE

keenblue commented 5 years ago

https://github.com/Columbia-ICSL/PAWS-Smartphone/blob/master/PAWS_LE/Application/app/src/main/java/bashima/cs/unc/seus/activity/MainActivity.java At line 743, Object predictedClassValue = Constant.classifierDetection.classify(instance); For classifierDetection, is the classifier type MFCC or NBIP?

Can MIC5 be used alone to detect or locate the car, without the headset?

Can the model already trained under the SEUS folder in the project be used directly? We tried it with a dataset shared in the community, without the headset: we just played the sound clips near MIC5 to see if we could get the expected result. However, in MainActivity.java at line 744, it mostly logs predictedClassValue as 'car' even when we just make some noise around MIC5.

stephenxia commented 5 years ago

At line 743 it should be NBIP. In the code we sometimes use "genericCC" to mean the same thing as NBIP.

MIC5 is the mic on the phone. The car detection should run without the headset; the headset is only there for localizing the car.

The model.ser in the SEUS folder is the model for detection. It is a model we trained using an Android Nexus 5, and should be usable. However, you may need to retrain or adjust the model using examples from your environment to get the detector to work as intended. A high false positive rate (detecting cars even when you just make random noises, as you said) is something we have had trouble with in the past too. We sort of solved this problem by retraining the models using different groups of examples that we recorded on our Nexus 5. You can train your own models using the method in the docs I provided in the last post.

Another problem might be scaling: I have seen that microphones from different phones produce different amplitude values. This is something we should address with some kind of normalization; I will add and test it in the near future. You can also see if training the model with examples recorded on your own phone helps.
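
For illustration only (this is not the app's current code), a per-clip RMS normalization applied before feature extraction would remove most of the gain differences between phone microphones, assuming 16-bit PCM WAV input:

```python
import numpy as np
from scipy.io import wavfile

def load_normalized(path, target_rms=0.1):
    # Scale a recording to a common RMS level so that phones with louder or
    # quieter microphones produce features on a comparable scale.
    rate, audio = wavfile.read(path)
    audio = audio.astype(np.float32) / 32768.0   # 16-bit PCM -> [-1, 1]
    rms = np.sqrt(np.mean(audio ** 2))
    if rms > 0:
        audio *= target_rms / rms
    return rate, np.clip(audio, -1.0, 1.0)
```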

dundeeor commented 5 years ago

Hi Stephen, I looked through all the code and found that the MFCC classifier has been commented out. Why do you prefer NBIP as the classifier?

stephenxia commented 5 years ago

Hi,

We empirically found that the NBIP features were able to better distinguish between non-car street sounds and car sounds, and we did a small optimization search to find the best parameters. More information can be found in our paper: https://ieeexplore.ieee.org/document/8366992

dundeeor commented 5 years ago

Hi Stephen, I have read the paper, and I have two questions:

  1. The paper says: "As honk is not a noise like sound, we can not use our proposed feature NBIP in this scenario." But in the project code, and as you mentioned above, NBIP is the preferred classifier. Why?

  2. How many samples did you record during the experiment to obtain a model that detects cars with good results (only for car detection, not considering car localization)?

Here is what we have done: we recorded all the audio clips in our office, which is actually a quiet environment. We recorded about 25 clips of the plain office environment, and then we played a car horn clip in a loop from another phone and recorded 30 more clips while the horn was playing. We put them under the Download/SEUS/Audio folder and retrained the model. However, when we relaunched the app and tested it, the result was still unacceptable: it recognizes a car even if we just knock on the table. The result is clearly not as accurate as the paper shows.

Could you please tell me more details about how to train a better model so we can reproduce your results?

Thank you.

stephenxia commented 5 years ago
  1. MFCC was used for honk detection. In the source code in this repo, the honk detection is currently disabled.

  2. The models we trained generally had around 400 to 1000 examples of car and non-car examples each. For non-car sounds, we mainly got sounds in the streets near our office.

When you train/extract features from the files, I think the app should say something like "extracted 400 car examples, extracted 400 non-car examples". NOTE: One example is not necessarily one clip; we divide each clip into windows of around 200 ms.
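
To make the counting concrete, here is a rough sketch of how one clip turns into many examples, assuming non-overlapping windows of about 200 ms (the exact window length and overlap used by the app may differ):

```python
from scipy.io import wavfile

rate, audio = wavfile.read("car_clip.wav")     # hypothetical recording
win = int(0.2 * rate)                          # ~200 ms worth of samples
windows = [audio[i:i + win] for i in range(0, len(audio) - win + 1, win)]
print(f"extracted {len(windows)} car examples from one clip")
```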

If I understand your situation correctly, you trained a model with office noise and car sounds, but no knocking noise. Does the app correctly not detect a car when the phone is sitting in the office, and does it correctly identify a car when you play a car sound? If so, then the classifier is fine.

It seems you are trying to see what the classifier would detect if you knock on the table, which is a sound you did not train for. Maybe you could try adding knocking noises into your dataset if that is the case and see how it performs.

dundeeor commented 5 years ago

We've added a few audio clips that contain knocking noise, and the result is much better now!

As you mentioned above, if we want a better model for car detection, we should add more complex environment noise to our database. This approach seems to work for us, but there is a large amount of environment noise to cover, so it is not easy to get done.

The problem now is how to deal with noise (not car horns) that is not known to the model we have trained. Is there a smarter way to get it classified as irrelevant noise, rather than having to record the noise and retrain the model?

stephenxia commented 5 years ago

Great to hear.

Unfortunately we are not able to get good results unless we use a variety of different non-car sounds :(

In general, if it is a common sound that we will hear in our environment, we will include a few examples to build the model.

dundeeor commented 5 years ago

Hi Stephen,

There are two more questions now:

  1. You mentioned above that

    In general, if it is a common sound that we will hear in our environment, we will include a few examples to build the model.

    So, normally, how many examples, or how long a recording, of a common sound should be included in the dataset to get a better detection model?

  2. We are now trying to build the direction model, and the code needs some kind of "carWeka.csv" file to make it work. Could you please tell me more details about how to train the direction model?

stephenxia commented 5 years ago
  1. The number/length of examples I would include for a particular sound should be roughly proportional to how often you expect to encounter the sound in your environment.

  2. You are correct, you need to build the carWeka.csv file with features for direction classification. I will make some updates to tell you how you can generate this file.

dundeeor commented 5 years ago

Hi Stephen:

We have all the hardware ready; however, we have no idea how to make it work, or even how to power it up. All of my colleagues are software engineers, and we know almost nothing about hardware.

Could you please guide us on how to put everything together and make it work? Also, we are still waiting for the documentation on direction classification.

Thanks.

Joe

dundeeor commented 5 years ago

I took 2 pictures of the hardware; please take a look: https://www.dropbox.com/sh/zazx44kk5iyjyvh/AADp-WGmDuH-cvYsd9jcp6vIa?dl=0

The link below is the video we recorded: https://www.dropbox.com/sh/l0mwbcb90ta5wdx/AACAHW7fXKiPhFmigUz_FcxCa?dl=0

stephenxia commented 5 years ago

Hi Joe,

Sorry for my late response. I am out of town until 12/8, and may not be able to help you out very much in this time period, but I will try my best.

  1. It looks like you were able to print the PCB and place all the components on the board. Were you able to program the two chips using the instructions found here? https://github.com/Columbia-ICSL/PAWS-FrontEnd/blob/master/PAWS/pcb/SEUS_embedded_front_Rev_0_1_documentation.pdf

Note: To program these chips, you will also need the development kits for the two chips (STM32F4 and Nordic nRF52).

  2. I got your video files of the trains passing. I will take a look at the videos/audio and try to figure out a good way to do the detection.

Thanks, Stephen

dundeeor commented 5 years ago

Hi Stephen:

We were already able to program the two chips using the development kits. But we do not know how to combine all the hardware to make the whole system work; that's where we are stuck now.

Could you please take some photos of your assembled headset, including all the wiring on the front and back of the PCB? That would help us a lot.

I look forward to hearing from you.

Regards, Joe

stephenxia commented 5 years ago

Hi Joe,

I will provide you the photos as soon as possible (it will probably be after I get back).

In the photo it seems you have shown one microphone (the small board). Do you have at least four of those boards? They need to be soldered onto the circular PCB (don't do it yet if you are planning to replicate our headset completely!)

Also is your plan to completely replicate our headset design? If so you would need to purchase the headset. We used something like this: https://www.amazon.com/IAXSEE-Headphones-Microphone-Lightweight-Smartphones/dp/B07CM2HCC7/ref=mp_s_a_1_21?ie=UTF8&qid=1544002424&sr=8-21&pi=AC_SX236_SY340_QL65&keywords=headphones%2Bcheap&dpPl=1&dpID=41GaXtO4YlL&ref=plSrch&th=1&psc=1

Also we power the system with 3 AAA batteries and we also power the microphones with a CR2032 coin cell battery. You can probably power the entire system with just the AAA batteries, but if you want to copy exactly what we did, you should probably purchase coin cell battery holders and AAA battery holders.

We actually had a difficult time fitting the AAA battery holder into the headphone case, so we 3D printed a smaller case; I do not recommend this because the battery connection is sometimes loose.

An overview of what you need to do: solder the 4 microphones onto the board, then solder the battery connectors onto the board along with a power switch. I will write a document with pictures that goes into more detail on the exact steps.

stephenxia commented 5 years ago

Hi Joe,

I added a file that details how to assemble the headset. Take a look and let me know if you have any questions.

https://github.com/Columbia-ICSL/PAWS-FrontEnd/blob/master/PAWS/Headset_Assembly.pdf

Thanks, Stephen

dundeeor commented 5 years ago

Hi Stephen,

We were able to power up the whole headset, but we cannot detect the Bluetooth device from the mobile phone.

It may be caused by a hardware failure on the PCB, by the software we flashed onto it, or the PCB we prepared may already be broken. Is there any way to troubleshoot this problem?

We are also still waiting for the documentation on how to train the direction model.

stephenxia commented 5 years ago

Hi all,

I updated two things in the repository.

First, the PAWS project files and the models in the SEUS folder were modified to fix the problem of crashing due to missing files or incorrect vector dimensions.

Second, I updated the PAWS_Manual.pdf to include how to save and view features from the headset and the outputs of the detection + distance + direction classifiers. I also added steps on how to train the direction and distance classifiers.

Edit: All changes can be found here: https://github.com/Columbia-ICSL/PAWS-Smartphone/tree/master/PAWS

Thanks, Stephen

dundeeor commented 5 years ago

Hi Stephen,

We have updated the repository to the latest commit, read through the PAWS manual, and started training the direction and distance models.

We are now able to train the two new models with data collected from the SEUS Features project or the PAWS application. So far, we have tried it in two different scenes:

  1. We collected all the data in an office, which is quite quiet with barely any noise. We played a pure car horn at 8 different locations mapping to the 8 quadrants, then copied all the data to form carWeka.csv and carDistWeka.csv and put the two files under SEUS for training the models. We also collected audio for training the detection model. After all the models were retrained, we restarted the application to reload the new models. However, when we played the car horn again at the same locations we used for data collection in order to test the accuracy, it was not as we expected: the direction model results seem random. The detection model results seem much more reasonable than before.

  2. This time we tried gathering data from only two directions, the front and back of the headset, in a long corridor. Then, as above, we regenerated the direction and distance models. The direction results are still meaningless.

I put all the CSV data in Dropbox; here is the link: https://www.dropbox.com/sh/biyd4p4yx6hoplp/AAAvi33GQiDtWEom1-C_VIqea?dl=0

Could you take a look at the data and give us some advice?

Thank you, Joe

dundeeor commented 5 years ago

Hi Stephen:

We haven't heard from you for a while, so this is just to let you know that we are still looking for your help.

I look forward to hearing from you.

Regards, Joe

stephenxia commented 5 years ago

I'll take a look at the app and the data you provided. Since you say that the direction looks meaningless, it could be something to do with the app.

stephenxia commented 5 years ago

Hi Joe,

I used the exact same application + direction model found in this repository and was able to get direction working well without having to do any modifications or retraining.

I also took a look at the features you provided. Below is a plot of the features you provided for the office space (carWeka.csv). Each color represents a different direction label (a through f). As you can see, we can't visually determine the boundaries between each class, so it is also very difficult for the model to learn from these features. [plot: features_plotted]

Next, here is a plot of the features used to train the direction classifier found in this repository. Different colors represent different labels, and you can see that the boundaries between the different directions are much more visible (the points also gather around a circular pattern). As such, it is possible for the model to learn this pattern. [plot: features_plotted_cu]

This leads me to believe that you will have to recollect your data. You mentioned that you trained using a pure car honk. Was it a long, continuous honk? If it was a short honk or multiple honks, then most of the training data may be background noise from when the honk isn't playing, which would explain why all of your features are mixed together. I would suggest training with a continuous sound (like continuous white noise: https://mynoise.net/NoiseMachines/whiteNoiseGenerator.php) that is constant (e.g. constant amplitude and no gaps).
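
If it is easier to play a local file than to stream from the website, a constant-amplitude white-noise clip can be generated like this (length and level here are arbitrary):

```python
import numpy as np
from scipy.io import wavfile

rate = 44100
duration_s = 120                                            # 2 minutes of noise
noise = np.random.uniform(-0.5, 0.5, rate * duration_s)     # constant amplitude, no gaps
wavfile.write("white_noise_2min.wav", rate, (noise * 32767).astype(np.int16))
```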

Let me know if that helps, Stephen

dundeeor commented 5 years ago

Hi Stephen,

Thanks for your help.

It is much clearer to view the data in a graph, nice idea, and it is obvious that our data was wrong.

We played a car horn on loop and collected the data while the audio was playing (the car horn audio: https://www.dropbox.com/s/xt1ieulb8r0wda7/car%20horn.wav?dl=0), then we used python to scatter the data, and all the features were mixed together.
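
For reference, this is roughly how we scatter the data (a sketch only; it assumes carWeka.csv has no header, the last column is the direction label, and we plot the first two feature columns):

```python
import csv
import matplotlib.pyplot as plt

points = {}
with open("carWeka.csv") as f:
    for row in csv.reader(f):
        if len(row) < 3:
            continue
        try:
            x, y = float(row[0]), float(row[1])
        except ValueError:
            continue                       # skip a header row, if present
        points.setdefault(row[-1], []).append((x, y))

for label, pts in points.items():
    xs, ys = zip(*pts)
    plt.scatter(xs, ys, s=8, label=label)  # one color per direction label
plt.legend()
plt.show()
```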

Today we played a continuous train horn audio clip (link: https://www.dropbox.com/s/3tw9bqcf5hxvm86/trainhorn2mins.wav?dl=0), about 2 minutes long, to collect the data again, and the plot still shows all the features mixed together.

Then we unplugged the left and right mics, kept only the front and back mics powered on, and played a continuous car horn audio clip (https://www.dropbox.com/s/0r7hh2ppfa3xc5x/carhorn2mins.wav?dl=0) to collect the data; however, the plot is still mixed. The CSV file: https://www.dropbox.com/s/4qeb0el95337lkc/carWeka7.csv?dl=0

Below is the scatter plot: [image]

stephenxia commented 5 years ago

Can you check if the microphones are being powered? You can use a multimeter to make sure that the microphones are being powered by 3.3V, and if you use an oscilloscope to measure the data line and speak into the microphones, it should produce a voice-like signal. Another thing you can use is the SEUS Features Application (https://github.com/Columbia-ICSL/PAWS-Smartphone/tree/master/PAWS_Features/SEUS_Features) to see if the microphones are powered on and working correctly.

Also, if you turn off any microphones, please do not turn off the microphone circled in white in the screenshot below; that is the reference microphone.

[screenshot: reference microphone circled in white]

dundeeor commented 5 years ago

Hi Stephen,

Yes, we tested the mic power and the voice waveform signal before we built up the whole headset system, so we are sure all the mics work fine. We also kept the microphone jumper wires connected when we turned the mics off.

Today we also tried it. Here is what we did:

First, we played the continuous car horn audio (link: https://www.dropbox.com/s/0r7hh2ppfa3xc5x/carhorn2mins.wav?dl=0) through a speaker very close to the front and back mics, so the sound was very loud. Then we plotted the collected data in python. Below is the view: [image]

And here is the data for that: https://www.dropbox.com/s/k6b6smxezjvrlsc/carWeka11.csv?dl=0

After seeing that, we thought this approach (playing loud audio close to the mics) might work, so we recollected data in the 8 quadrants. After regrouping the data, we plotted it.

Below is the plot of the 8-quadrant csv file: [image]

And here is the top view: [image]

carWeka data link: https://www.dropbox.com/s/fe57d7r7km3bi17/carWeka13.csv?dl=0

Then we also plotted the power file using the same method; here is the data file: https://www.dropbox.com/s/amn2147miicf2ht/power3.csv?dl=0. And the graph view:

[image]

It seems much better than before, but still not as good as the graph you provided.

Can you describe the environment where you collected the data for training the direction model?

stephenxia commented 5 years ago

The last plot looks kind of ok. Did you try using it?

We did the training in our lab space, which isn't too quiet (there are machines running from other groups). There are lots of tables and objects around. Ideally you would want to train in an area that is quiet with no echoes and make sure that your sound source is the loudest sound in the environment.

The sound we trained with was white noise from this website: https://mynoise.net/NoiseMachines/whiteNoiseGenerator.php

We turned the first five knobs to the maximum.

We placed the sound source in the middle of each quadrant, around 1 meter away from the system, for training. We may have also moved the sound source within each quadrant during training, but first try keeping the sound source stationary and see what kind of feature separation you can get.

Thanks, Stephen

dundeeor commented 5 years ago

Hi Stephen,

We recollected the data using the white noise you suggested; however, the plot still does not look like a ring. [image]

We also used a continuous car horn to collect data, and the resulting features were mixed together. [image]

The continuous car horn audio we used is here (https://www.dropbox.com/s/u84abvdencob6o3/ContinuousAudio.wav?dl=0). Can you try using this audio to collect data for the direction model and take a look at the result?

Thanks, Joe

stephenxia commented 5 years ago

I'm not sure what's going on. I tested the clip you sent me using the classifier I built with white noise, and I could localize the sound you sent me fairly well. I did not plot any of the features I was reading, though, but I did notice a few more errors in direction estimation than when just using white noise. It could be related to the frequency of the horn / our sampling rate.

However, I can't think of an obvious reason why your features look scattered, especially for the white noise case.

You showed a plot using two directions, which looked quite nice. What if you add a third direction; does it look like a triangle? Then add a fourth direction and see if it looks like a rectangle, etc.

dundeeor commented 5 years ago

Hi Stephen,

It looks like we got fairly good results except at the 3 positions near microphone 2.

[image]

The picture above shows the results for the 5 directions other than the 3 directions around mic 2. We played the white noise at the 5 positions marked with blue audio symbols in the image below: [image]

Once we play the white noise at the 3 positions around microphone 2 (marked with red audio symbols in the image above), those 3 results get mixed together like before.

We checked the microphone itself and it works just fine.

Have you run into this situation before?

Thanks, Joe

stephenxia commented 5 years ago

Yes, we ran that experiment before. One of the earlier plots I posted showed we could see pretty good separation of the different directions, which is why the direction classifier could work.

One thing you could try doing is change the firmware on the hardware to, instead of computing and transmitting features, transmit the raw audio of a single microphone to the phone, save it, and listen to the audio to see if the microphone is actually recording the correct sound (e.g. if you speak while recording, are you able to identify that the audio is someone speaking). Repeat this for each microphone, and make sure to reduce the sampling rate from the microphones to ensure that you can transmit all of the data through BLE.
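
As a back-of-the-envelope check (the throughput figure below is an assumption, not a measured number for this hardware), you can estimate how low the sampling rate has to go for one microphone's raw 16-bit samples to fit through the BLE link:

```python
usable_ble_bytes_per_s = 10_000      # assumed usable BLE throughput (~10 kB/s)
bytes_per_sample = 2                 # raw 16-bit samples

max_rate_hz = usable_ble_bytes_per_s / bytes_per_sample
print(f"max sampling rate for one mic: {max_rate_hz:.0f} Hz")
# ~5 kHz with these numbers, still enough to recognize speech in the recording
```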

Also, what is your microphone placement (e.g. are the microphones arranged in a circle of a certain radius)? How far apart are they?

Thanks, Stephen

dundeeor commented 5 years ago

Hi Stephen,

We placed the microphones in a circle of a certain radius, exactly like the picture I posted before, so the 4 microphones are at the LEFT, RIGHT, FRONT, and BACK positions, and they are all at the same horizontal level.

We will try the method you suggested today; however, is there any way to replay the raw audio data we receive from a single microphone?

Thanks, Joe

stephenxia commented 5 years ago

Many software packages let you play sounds. I used MATLAB while debugging. I think MATLAB or Python would be easiest to use.
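
For example, a minimal Python playback of a saved clip might look like this (the sounddevice package is one option; the file name is hypothetical):

```python
from scipy.io import wavfile
import sounddevice as sd

rate, audio = wavfile.read("mic_front.wav")   # clip reconstructed from one microphone
sd.play(audio, rate)
sd.wait()                                     # block until playback finishes
```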

Stephen

dundeeor commented 5 years ago

Hi Stephen,

We tried to transmit the raw audio data and save it as a .wav audio file with a WAV header. However, when we play the audio it sounds like random tones; we cannot hear any of the noise that we played while saving the raw data.

There must be something wrong with the raw data we transmitted or with the way we save the audio file. Can you give us more details about how to transmit the raw audio data, save it on the phone, and play it back?
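
For context, this is the kind of conversion we are attempting (a sketch only, assuming the firmware sends 16-bit little-endian PCM at a known sample rate; a wrong sample rate, byte order, or sample width in the header is enough to make playback sound like random tones):

```python
import wave

SAMPLE_RATE = 4000                             # must match the rate used on the firmware
with open("mic_raw.bin", "rb") as f:           # hypothetical dump of the received bytes
    raw = f.read()

with wave.open("mic_raw.wav", "wb") as w:
    w.setnchannels(1)                          # one microphone
    w.setsampwidth(2)                          # 2 bytes = 16-bit samples
    w.setframerate(SAMPLE_RATE)
    w.writeframes(raw)                         # wave expects little-endian PCM
```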

Thanks, Joe

dundeeor commented 5 years ago

Hi Stephen,

We finally found out the reason why the data from one of the microphones was not good.

Now we have moved on to the next stage, training the distance model. When recording the data through the SEUS Features application, we found that as the car moves farther away from the headset, the sound of the engine and the tyre/road noise get much weaker, so we cannot get a good result unless the horn is pressed.

Could you advise us on how to record the audio data for training the distance model (more details please)?

We also have a question about the data for training the direction model. Does it only need data recorded around 1 meter away from the headset, or does it also need data at different distances, like the distance model?

Thanks, Joe

stephenxia commented 5 years ago

Hi Joe,

Great to hear that the direction is working.

For distance, I would train at various distances away from the system. Additionally, try playing sounds from different directions as well, because the classifier in your version of the smartphone application uses relative power as a feature, which depends on the direction the sound is coming from.

When the car is far away, it will be hard to hear the car. This is one of the reasons the classifier should be able to classify whether the car is close or far (close if the energy is high and far if the energy is low). However, there could be other sounds in the environment that are loud. Since our features are energy-based features derived from the raw audio, the classifier would not be able to detect that the car is far away in this case; the loud sounds in the environment would cause the distance classifier to decide that the car is close. This is a limitation of the current system. In other words, the distance classifier only works in an environment in which the sound of the car can be assumed to be the loudest sound in the environment.
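
As a toy illustration of that limitation (this is not the app's actual classifier), an energy-based close/far decision boils down to something like the following, so any loud sound near the microphones pushes the decision toward "close":

```python
import numpy as np

def window_energy(samples):
    # Mean squared amplitude of one audio window.
    samples = samples.astype(np.float32)
    return float(np.mean(samples ** 2))

def close_or_far(samples, threshold=1e6):
    # Threshold is arbitrary here; a loud non-car sound also exceeds it.
    return "close" if window_energy(samples) > threshold else "far"
```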

I would say if this assumption holds for your application scenario (which I think it generally does, since trains are really loud), then you should train and test in scenarios where the vehicle is the loudest sound source.

Thanks, Stephen

dundeeor commented 5 years ago

Hi Stephen,

Last time, when we met over Skype, you mentioned that there is an improved algorithm for detection. Could you update the repository with that algorithm so we can test it?

And also, as you said in your last comment:

This is a limitation of the current system. In other words, the distance classifier would only work in an environment in which the sound of the car is assumed and would be the loudest sound in the environment.

Is there any solution or optimization so that, if the sound we are interested in is not the loudest sound in the environment, we can still get correct direction and distance classification results?

stephenxia commented 5 years ago
  1. The algorithm is not for detection. It was for direction estimation. The smartphone app is in the PAWS_LE folder, but the training procedure is different from the application you have been using. The detection algorithm is the same.

  2. For the second point, you could apply some sort of source separation algorithm, such as independent component analysis (ICA), non-negative matrix factorization (NMF), etc to extract only the sound you want (e.g. the car or train). In practice these source separation algorithms don't work really well (from my own experience), and are iterative and/or computationally expensive, so we were not able to incorporate these techniques into this platform. This is part of our ongoing work.
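
As a rough illustration of the source-separation idea (a sketch only; it assumes you have synchronized recordings from the four microphones available offline, which is not how the current platform works):

```python
import numpy as np
from sklearn.decomposition import FastICA

# mics: shape (n_samples, 4), one column per synchronized microphone channel
mics = np.load("four_mic_recording.npy")        # hypothetical offline dump

ica = FastICA(n_components=4, random_state=0)
sources = ica.fit_transform(mics)               # (n_samples, 4) estimated sources
# One would then pick the component that best matches the vehicle (e.g. by its
# spectrum or energy profile) and compute the detection/distance features on it.
```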

MustBeLittleR commented 5 years ago

Hi Stephen,

After testing in a real environment, the car detection result matched the distance and direction results only when the target sound was the loudest sound in the scenario. Is there any way to improve this, for example by first extracting the part of the sound we are interested in and then calculating the audio features to determine the distance and direction? If this part of the computation is expensive, maybe the extraction could be moved from the PCB onto the mobile app to reduce the amount and time of computation. How long would it take you to port this extraction step to the mobile app?