TUT-ARG / DCASE2017-baseline-system

DCASE 2017 Baseline system

How to evaluate (using sed_eval toolbox) the devtest/evaltest files with no target events (no Onset/Offset time) #40

Closed n33lkanth closed 3 years ago

n33lkanth commented 3 years ago

Dear @toni-heittola @emrcak, I am stuck in the evaluation part of the rare sound event detection task (DCASE 2017 Task 2 challenge). In all three dataset parts (devtrain, devtest, and evaltest), approximately 50% of the files/samples contain no target sound events, i.e., there is no onset and offset time. So I am having trouble preparing the reference_event_list and estimated_event_list that are required as input parameters for the sed_eval toolbox.

On the official DCASE challenge page, files with no detected events are also required, in the following format: [filename (string)]. If I include this kind of file entry (with an empty or missing onset and offset) in reference_event_list and estimated_event_list, I get an "empty slice" error from the sed_eval toolbox. As a workaround, I am excluding such files during the training, validation, and testing phases, but my score is then pretty low.

Do I need any post-processing to avoid such errors? Kindly help me understand how to handle this situation.
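For illustration, the per-file format referred to above might look like this (hypothetical filenames and times; a file with no detected events is listed by name only):

mixture_evaltest_babycry_001.wav	0.52	2.13	babycry
mixture_evaltest_babycry_002.wav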

Best Regards,

toni-heittola commented 3 years ago

This should work out-of-the-box with sed_eval. You can see an example in the unit tests of the sed_eval codebase: https://github.com/TUT-ARG/sed_eval/blob/master/tests/test_sound_event.py#L212 This test verifies that the toolbox works with files having no reference events. See also the evaluated files:

https://github.com/TUT-ARG/sed_eval/blob/master/tests/data/sound_event/binary5.txt
https://github.com/TUT-ARG/sed_eval/blob/master/tests/data/sound_event/binary5_detected.txt
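As a minimal sketch of the same idea (hypothetical events, not taken from the linked test files): a file without reference events is evaluated by passing an empty reference list, and every estimated event is then counted as an insertion.

import sed_eval

# Sketch: evaluating a single file that has no reference events at all.
event_based_metrics = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=['babycry'],
    t_collar=0.2
)
event_based_metrics.evaluate(
    reference_event_list=[],  # no reference events for this file
    estimated_event_list=[
        {'filename': 'audio.wav', 'event_label': 'babycry', 'onset': 1.0, 'offset': 2.5}
    ]
)
print(event_based_metrics)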

n33lkanth commented 3 years ago

Dear @toni-heittola , Thank you so much for sharing the resources above. I will check these examples and get back to you if I have further doubts/questions. 👍

Best Regards

n33lkanth commented 3 years ago

Dear @toni-heittola , I compared the files you shared above with my implementation. I can see that all the test samples are included in both of the example text files. However, my implementation is based on your ICASSP2019 tutorial, and in my validation/test phase I evaluate one sample at a time (as explained in the tutorial). So I am in a situation where, for the file 'mixture_devtrain_babycry_006_a9196cb1f180ff3c96339582025ec0b8.wav' (as an example), there is no event at all in the ground truth. The reference_event_list therefore contains just a single entry with the file name (no onset/offset or label info), while the current_estimated list contains lots of false positives (as the predictions are from the first epoch only).

So, when I pass these containers to the evaluator as shown below:

evaluator.evaluate(
    reference_event_list=reference_event_list,
    estimated_event_list=current_estimated
)

I am getting the following error message:

File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/dcase_util/containers/metadata.py", line 924, in append item = MetaDataItem(item) File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/dcase_util/containers/metadata.py", line 69, in __init__ self['event_label'] = self['event_label'].strip() AttributeError: 'NoneType' object has no attribute 'strip'

### Note: reference_event_list and current_estimated are not actually text files but metadata containers initialized using dcase_util.containers.MetaDataContainer()
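For context, a hypothetical reconstruction of how such a container ends up with the failing entry: a filename-only row leaves event_label as None, which the installed dcase_util version then tries to strip.

import dcase_util

# Hypothetical reconstruction (not the actual code from this thread):
# an entry for a file with no events carries event_label=None, and
# MetaDataItem.__init__ calls .strip() on it, raising the AttributeError above.
reference_event_list = dcase_util.containers.MetaDataContainer([
    {'filename': 'mixture_devtrain_babycry_006_a9196cb1f180ff3c96339582025ec0b8.wav',
     'event_label': None}
])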

If I try to evaluate all the test files in a single go instead, I get the following error message (the related files are uploaded as a zip file):

    estimated_event_list=current_estimated
  File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/sed_eval/sound_event.py", line 1262, in evaluate
    "reference_event_list contains events from multiple files. Evaluate only file by file."
ValueError: reference_event_list contains events from multiple files. Evaluate only file by file.

I have attached the files for your reference. Kindly help me understand whether I am doing something wrong.

Attachments: code_snippet.txt, reference_event_list.txt, current_estimated.txt, Archive.zip

Thank you very much for your time.

Best Regards

toni-heittola commented 3 years ago

(This issue seems to be filed under the wrong repository, as it relates more to sed_eval and icassp2019_tutorial.)

In the ICASSP2019 tutorial, evaluation is not done sample by sample. Results from all files are collected into one event list, and in the same way the events from the reference files are collected into another event list. After this, the two event lists are given to the evaluator.evaluate method. This is the correct way of doing sound event evaluation, as the files in the evaluation do not contain an equal number of events.

Unfortunately, I cannot verify your evaluation code based on the files you provided. Please provide the actual data in the reference_event_list and current_estimated files (e.g., in CSV format).

n33lkanth commented 3 years ago

Dear @toni-heittola , Thank you for the feedback.

I checked the ICASSP2019 tutorial again, and in the validation phase I can see that the evaluation method is called inside the for loop (screenshot attached), where the validation samples are picked one by one:

evaluator.evaluate(
    reference_event_list=validation_item['meta'],
    estimated_event_list=current_estimated
)

This is where I am getting the errors.

I completely agree with you that during the evaluation/test phase the predictions for all files are accumulated (under the heading "Going through all test material" of the ICASSP2019 tutorial) and finally evaluated in a single go.

But, as mentioned in the previous post, if I try to evaluate all the test files in a single go, I get the following error message:

    estimated_event_list=current_estimated
  File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/sed_eval/sound_event.py", line 1262, in evaluate
    "reference_event_list contains events from multiple files. Evaluate only file by file."
ValueError: reference_event_list contains events from multiple files. Evaluate only file by file.

[Screenshot attached: Screenshot 2021-08-04 at 7.56.52 PM]

n33lkanth commented 3 years ago

Attaching the files in CSV format.

With these files I am also getting an error:

    reference_event_list=reference_event_list.filter(filename=filename),
  File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/dcase_util/containers/metadata.py", line 2122, in filter
    result = MetaDataContainer(super(MetaDataContainer, self).filter(**kwargs))
  File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/dcase_util/containers/metadata.py", line 726, in __init__
    self[item_id] = self.item_class(self[item_id])
  File "/home/chaujp/.conda/envs/SED_LOCAL/lib/python3.7/site-packages/dcase_util/containers/metadata.py", line 69, in __init__
    self['event_label'] = self['event_label'].strip()
AttributeError: 'NoneType' object has no attribute 'strip'

Archive.zip

toni-heittola commented 3 years ago

I remembered the usage of the sed_eval evaluate method wrongly. Yes, it is indeed called file by file, and intermediate statistics are accumulated. The actual metric values are calculated only once the evaluation results are requested from the evaluator class.
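In code terms, the pattern is roughly the following (a sketch with made-up event data, not the code from this thread):

import sed_eval

# Per-file (reference, estimated) event list pairs; the data is made up for the sketch.
per_file_pairs = [
    ([{'filename': 'a.wav', 'event_label': 'babycry', 'onset': 0.5, 'offset': 1.5}],
     [{'filename': 'a.wav', 'event_label': 'babycry', 'onset': 0.6, 'offset': 1.4}]),
    ([], []),  # a file with no reference and no estimated events
]

event_based_metrics = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=['babycry'],
    t_collar=0.2
)

# Intermediate statistics accumulate across the per-file evaluate() calls...
for reference_events, estimated_events in per_file_pairs:
    event_based_metrics.evaluate(
        reference_event_list=reference_events,
        estimated_event_list=estimated_events
    )

# ...and the actual metric values are computed only when results are requested.
print(event_based_metrics.results_overall_metrics()['f_measure']['f_measure'])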

1) It seems that a bug related to empty event_labels was introduced into dcase_util at some stage. I have now fixed this with commit https://github.com/DCASE-REPO/dcase_util/commit/76446d746423f8330f74c5d2968dbc043bb0a1cd. To get this working, you need to download the dcase_util sources from the 'develop' branch (https://github.com/DCASE-REPO/dcase_util/tree/develop) and install them with pip install -e dcase_util (see https://dcase-repo.github.io/dcase_util/installation.html). The commit will also be included in the next official release of dcase_util.

2) The processing trick to get the files you provided to evaluate properly is to include only active events in the per-file event list. If there are no events, the list should be empty. See below for the code that calculates event-based metrics for the files you provided.

There is also a slight mismatch between your filenames: the two lists differ by one "/" at the start of the filenames (compensated for in the code below).

import sed_eval
import dcase_util

# Load the reference annotations and the system output (the CSV files provided above)
reference_event_list = dcase_util.containers.MetaDataContainer(filename='reference_even_list.csv').load()
estimated_event_list = dcase_util.containers.MetaDataContainer(filename='current_estimated.csv').load()

# Event-based metrics with a 200 ms onset/offset collar
event_based_metrics = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=reference_event_list.unique_event_labels,
    t_collar=0.20
)

for filename in reference_event_list.unique_files:
    # Collect only active events for the current file; the '/' prefix
    # compensates for the filename mismatch mentioned above.
    estimated_event_list_for_current_file = []
    for result_item in estimated_event_list.filter(filename='/'+filename):
        if 'event_label' in result_item and result_item.event_label:
            estimated_event_list_for_current_file.append(result_item)

    # Same for the reference events; a file without events yields an empty list.
    reference_event_list_for_current_file = []
    for meta_item in reference_event_list.filter(filename=filename):
        if 'event_label' in meta_item and meta_item.event_label:
            reference_event_list_for_current_file.append(meta_item)

    # Accumulate intermediate statistics file by file
    event_based_metrics.evaluate(
        reference_event_list=reference_event_list_for_current_file,
        estimated_event_list=estimated_event_list_for_current_file
    )

# The metric values are calculated when the results are printed
print(event_based_metrics)

And the metric output:

Event based metrics (onset-offset)
========================================
  Evaluated length                  : 4765.69 sec
  Evaluated files                   : 147 
  Evaluate onset                    : True 
  Evaluate offset                   : True 
  T collar                          : 200.00 ms
  Offset (length)                   : 50.00 %

  Overall metrics (micro-average)
  ======================================
  F-measure
    F-measure (F1)                  : 0.09 %
    Precision                       : 0.05 %
    Recall                          : 2.70 %
  Error rate
    Error rate (ER)                 : 58.38 
    Substitution rate               : 0.00 
    Deletion rate                   : 0.97 
    Insertion rate                  : 57.41 

  Class-wise average metrics (macro-average)
  ======================================
  F-measure
    F-measure (F1)                  : 0.09 %
    Precision                       : 0.05 %
    Recall                          : 2.70 %
  Error rate
    Error rate (ER)                 : 58.38 
    Deletion rate                   : 0.97 
    Insertion rate                  : 57.41 

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec    | ER       Del      Ins    |
    ------------ | -----   ----- | ------   ------   ------ | ------   ------   ------ |
    babycry      | 74      4250  | 0.1%     0.0%     2.7%   | 58.38    0.97     57.41  |

n33lkanth commented 3 years ago

Dear @toni-heittola , Thank you so much for taking the time to fix this issue; it will save me a lot of time :) I will fix the '/' issue once I am able to install the dcase_util patch.

I will let you know if I run into any issues.

Best Regards

n33lkanth commented 3 years ago

Dear @toni-heittola , After installing dcase_util as suggested by you, I am able to perform the evaluation. Thanks to you :-)

My observation: I tried to avoid the for loop from the code snippet in your previous post:

for result_item in estimated_event_list.filter(filename='/'+filename):
    if 'event_label' in result_item and result_item.event_label:
        estimated_event_list_for_current_file.append(result_item)

and tried to replace it with the filter function alone. Doing so, I encountered the following error:

Traceback (most recent call last):
  File "I:/metad/chaujp/workspace/python/Thesis/Multi-scale_sound_event_detection/source_code/xtra.py", line 210, in <module>
    estimated_event_list=estimated_event_list.filter(filename=filename)
  File "C:\Users\chaujp\.conda\envs\SED_LOCAL\lib\site-packages\sed_eval\sound_event.py", line 1292, in evaluate
    self.evaluated_length += reference_event_list.max_offset
  File "h:\workspace\python\project\sound_event_detection\source_code\dcase_util\dcase_util\containers\metadata.py", line 1133, in max_offset
    if 'offset' in item and item.offset > max_offset:
TypeError: '>' not supported between instances of 'NoneType' and 'int'

So I feel that the condition if 'event_label' in result_item and result_item.event_label: is indeed still important.
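For what it's worth, a small helper along these lines (hypothetical, just wrapping the condition above) would keep the filter call while still dropping the empty entries:

def active_events(event_list, filename):
    # Hypothetical helper: filter per file, then drop placeholder rows
    # that have no actual event label (and hence no onset/offset).
    return [
        item for item in event_list.filter(filename=filename)
        if 'event_label' in item and item.event_label
    ]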

Thank you very much for the support and for debugging the files I shared.

Best Regards