Open yongxuUSTC opened 6 years ago
Hey there, I want to emphasize the problem by reporting all the utterances in question (36 overall). This reduces the number of usable development utterances to 477 (from 488).
Wav | Start | End | Label |
---|---|---|---|
-Qfk_Q2ctBs_30.000_40.000.wav | 0 | 0 | Train horn |
5-SzZotiaBU_30.000_40.000.wav | 0 | 0 | Car alarm |
-x70B12Mb-8_30.000_40.000.wav | 0 | 0 | Reversing beeps |
0E4AqW9dmdk_30.000_40.000.wav | 0 | 0 | Reversing beeps |
0OiPtV9sd_w_30.000_40.000.wav | 0 | 0 | Reversing beeps |
-F1_Gh78vJ0_30.000_40.000.wav | 0 | 0 | Bicycle |
-RFpDUZhN-g_13.000_23.000.wav | 0 | 0 | Bicycle |
-fhpkRyZL90_30.000_40.000.wav | 0 | 0 | Bicycle |
-ngrinYHF4c_30.000_40.000.wav | 0 | 0 | Bicycle |
--zLzL0sq3M_30.000_40.000.wav | 0.000 | 0.0 | Car |
--zLzL0sq3M_30.000_40.000.wav | 0.000 | 0.0 | Car passing by |
-6Yfati1N10_80.000_90.000.wav | 0.000 | 0.000 | Motorcycle |
-BGebo8V4XY_30.000_40.000.wav | 0.000 | 0.000 | Motorcycle |
-QMAKXzIGx4_10.000_20.000.wav | 0.000 | 0.000 | Motorcycle |
-S-5z2vYtxw_10.000_20.000.wav | 0.000 | 0.000 | Motorcycle |
-1X7kpLnOpM_60.000_70.000.wav | 0.000 | 0.000 | Train |
-1HlfoHZCEE_6.000_16.000.wav | 0.000 | 0.000 | Car |
-1McjOPUzbo_30.000_40.000.wav | 0.000 | 0.000 | Car |
-3929cmVE20_30.000_40.000.wav | 0.000 | 0.000 | Car |
-AF7wp3ezww_140.000_150.000.wav | 0.000 | 0.000 | Car |
-Pg4vVPs4bE_30.000_40.000.wav | 0.000 | 0.000 | Car |
-VULyMtKazE_0.000_7.000.wav | 0.000 | 0.000 | Car |
-cbYvBBXE6A_12.000_22.000.wav | 0.000 | 0.000 | Car |
06RreMb5qbE_0.000_10.000.wav | 0.000 | 0.000 | Car |
7NJ5TbNEIvA_250.000_260.000.wav | 0.000 | 0.000 | Car |
9fCibkUT_gQ_30.000_40.000.wav | 0.000 | 0.000 | Car |
-45cKZA7Jww_30.000_40.000.wav | 0.000 | 0.000 | Truck |
-4B435WQvag_20.000_30.000.wav | 0.000 | 0.000 | Truck |
-6qhtwdfGOA_23.000_33.000.wav | 0.000 | 0.000 | Truck |
That is to say, should we delete the samples whose utterance starts at zero and also ends at zero?
Why are there so many utterances without any label (totally empty)?
For example, an utterance starts at zero and also ends at zero; what does that mean? The problem is that my trained model always detects something, but the label is empty, which leads to more insertion errors.
hVvtTC9AmNs_30.000_40.000.wav 0 0 Train
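In case it helps others hitting the same thing, here is a minimal sketch of how such rows could be separated from the rest before scoring. It assumes each annotation line is whitespace-separated as `wav start end label`, with the label as the remainder of the line (labels such as "Car passing by" contain spaces). The function name `split_zero_length` and the second sample row are made up for illustration; whether the zero-length rows should be dropped or kept as empty references is a decision this sketch does not settle.

```python
def split_zero_length(lines):
    """Split annotation rows into (kept, zero_length).

    Assumed row format: "<wav> <start> <end> <label>", whitespace-separated,
    where the label may itself contain spaces.
    """
    kept, zero_length = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=3 keeps multi-word labels intact in the last field
        wav, start, end, label = line.split(None, 3)
        row = (wav, float(start), float(end), label)
        if row[1] == 0.0 and row[2] == 0.0:
            zero_length.append(row)  # onset and offset both zero
        else:
            kept.append(row)
    return kept, zero_length


sample = [
    "hVvtTC9AmNs_30.000_40.000.wav 0 0 Train",  # row quoted in this thread
    "example.wav 1.500 4.000 Car passing by",   # hypothetical row, illustration only
]
kept, zero_length = split_zero_length(sample)
```

With the list of `zero_length` rows in hand, the affected files can be reported or excluded from the reference before computing insertion errors.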