dopefishh / pympi

A python module for processing ELAN and Praat annotation files
MIT License

Difference in number of removed annotations #26

Open Cogitarian opened 4 years ago

Cogitarian commented 4 years ago

I just wanted to remove the empty annotations in one EAF file. Surprisingly, the method remove_annotation removes more annotations than ELAN does with TIER>REMOVE ANNOTATIONS>EMPTY ANNOTATIONS. I also tried removing the rows from a data frame built from the ELAN annotations, and that worked exactly as in ELAN. But I'm still not sure how the method remove_annotation works.

Try

#R
library(reticulate)
library(magrittr, lib.loc = "/Library/Frameworks/R.framework/Versions/3.6/Resources/library")
conda_list()[[1]][1] %>% 
  use_condaenv(required = TRUE)
#### PYTHON ####
# coding: utf-8
# -*- coding: utf-8 -*-
import codecs
import pympi    # Import pympi to work with elan files
import os, fnmatch
import glob
import json
import csv
import sys
import re
import numpy as np
import pandas as pd
os.chdir("/Volumes/MAXI RUGGED/Google Drive/2020UAM/INFORMATYKA/scRiPting/Py/!PYMPI!/TRANS2020")  # Python equivalent of R's setwd()

eaffile02235 = "000-22-35-S1.mp3.audioenhance.eaf"
eaffile12235 = "001-22-35-S1.mp3.audioenhance.eaf"
eaf_file = pympi.Eaf(eaffile02235) 
eaf_tiers = eaf_file.get_tier_names()
eaf_tiers

t = 'COACH'
anotacje_COACH = eaf_file.get_annotation_data_for_tier(t)
len(anotacje_COACH)
eaf_file.to_file(eaffile02235)

for a in range(0,len(anotacje_COACH)):
  if len(anotacje_COACH[a][2])==0:
    eaf_file.remove_annotation(t,anotacje_COACH[a][0]+1,anotacje_COACH[a][1]-1)

anotacje_COACH = eaf_file.get_annotation_data_for_tier('COACH')
len(anotacje_COACH) #64

aupd = pd.DataFrame(anotacje_uczestnik)
# Keep only the rows whose annotation value (column 2) is non-empty
aupu = aupd[aupd[2].map(len) > 0]
aupu = aupu.to_records(index=False)
aupu = list(aupu)
len(aupu) # 181

eaf_file.remove_tier(t)
eaf_file.add_tier(t)

for a in range(0,len(aupu)):
  eaf_file.add_annotation(t,aupu[a][0],aupu[a][1], value= aupu[a][2])
eaf_file.to_file(eaffile02235)

000-22-35-S1.mp3.audioenhance.eaf.zip

dopefishh commented 4 years ago

pympi removes all annotations that overlap with the given time. The overlap test is inclusive (<= and >= are used), so an annotation that merely starts or ends at that time is removed as well. Maybe ELAN uses exclusive overlaps (< and >)? If I have time I'll check it soon, but feel free to put it to the test.
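
To make the inclusive-versus-exclusive difference concrete without needing an EAF file, here is a minimal standalone sketch. It does not call pympi or ELAN. The helpers hits_inclusive, hits_exclusive and remove_at are hypothetical names made up for this illustration, and the tuples only mimic the (start, end, value) shape returned by get_annotation_data_for_tier. Whether ELAN really uses the exclusive test is still an open assumption.

```python
from typing import Callable, List, Tuple

# (start_ms, end_ms, value), mimicking get_annotation_data_for_tier output
Annotation = Tuple[int, int, str]


def hits_inclusive(ann: Annotation, time: int) -> bool:
    """Inclusive test (<= and >=): boundary time points count as a hit."""
    start, end, _ = ann
    return start <= time <= end


def hits_exclusive(ann: Annotation, time: int) -> bool:
    """Exclusive test (< and >): boundary time points do not count."""
    start, end, _ = ann
    return start < time < end


def remove_at(annotations: List[Annotation], time: int,
              hits: Callable[[Annotation, int], bool]) -> List[Annotation]:
    """Return the annotations that survive a removal at the given time."""
    return [a for a in annotations if not hits(a, time)]


# Three adjacent annotations; only the middle one is empty.
tier = [(0, 1000, "hello"), (1000, 2000, ""), (2000, 3000, "world")]

# Removing at the shared boundary (1000 ms):
inclusive = remove_at(tier, 1000, hits_inclusive)  # also drops "hello"
exclusive = remove_at(tier, 1000, hits_exclusive)  # drops nothing

# Removing at a time strictly inside the empty annotation (1500 ms):
midpoint = remove_at(tier, 1500, hits_inclusive)   # drops only the empty one

print(len(tier), len(inclusive), len(exclusive), len(midpoint))
```

With the inclusive test, a time point that sits exactly on a shared boundary hits both neighbouring annotations, which is one way to end up with more removals than expected. Picking a time strictly inside the target annotation, such as its midpoint, avoids boundary hits on neighbours that only touch it, though annotations that genuinely overlap it would still be removed.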