IPS-LMU / emuR

The main R package for the EMU Speech Database Management System (EMU-SDMS)
http://ips-lmu.github.io/EMU.html

Ability to supply your own "wrasspOutputInfos" to add_ssffTrackDefinition() #239

Closed FredrikKarlssonSpeech closed 3 years ago

FredrikKarlssonSpeech commented 3 years ago

add_ssffTrackDefinition() is currently artificially restricted to working only with functions in the wrassp package. I am not sure that this is the intended consequence, but add_ssffTrackDefinition() checks that the requested column will actually exist in the output by looking up "wrasspOutputInfos" in the environment where add_ssffTrackDefinition() is defined rather than in the calling environment. As a result, we strictly need to extend wrassp whenever we want to extend the signal processing abilities of the system.

I propose a simple fix:

If the add_ssffTrackDefinition() function is revised to take an additional optional argument "wrasspOutputInfos", which defaults to wrassp::wrasspOutputInfos, and this structure is used for the check, we will have a non-breaking change that allows other signal processing routines that behave like the wrassp functions to be applied across a database using add_ssffTrackDefinition() too.
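
The proposal could be sketched roughly as follows. This is not emuR's actual implementation: `check_track_column()`, `myOutputInfos` and `myPitchTracker` are hypothetical names used only for illustration, and the sketch assumes the lookup table has the same shape as wrassp::wrasspOutputInfos, that is, a named list with one entry per function, each holding a `tracks` character vector.

```r
# Hypothetical helper, not part of emuR: validates that a signal processing
# function is known to produce the requested track column.
# The default argument is evaluated lazily, so wrassp is only needed when
# no custom table is supplied.
check_track_column <- function(onTheFlyFunctionName,
                               columnName,
                               wrasspOutputInfos = wrassp::wrasspOutputInfos) {
  info <- wrasspOutputInfos[[onTheFlyFunctionName]]
  if (is.null(info)) {
    stop("Unknown function: ", onTheFlyFunctionName)
  }
  if (!(columnName %in% info$tracks)) {
    stop("Function '", onTheFlyFunctionName,
         "' does not produce a track called '", columnName, "'")
  }
  invisible(TRUE)
}

# Assumed description of a user-supplied routine, mirroring the structure of
# wrassp::wrasspOutputInfos (file extension, track names, output type).
myOutputInfos <- list(
  myPitchTracker = list(ext = "mf0", tracks = c("F0"), outputType = "SSFF")
)

# Passes silently because "F0" is listed among the tracks of myPitchTracker.
check_track_column("myPitchTracker", "F0", wrasspOutputInfos = myOutputInfos)
```

Because the new argument has a default, existing calls that do not pass it would behave as before, which is what makes the change non-breaking.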

raphywink commented 3 years ago

This is only really true for calculating SSFF files on-the-fly. If you wish to add arbitrary other data produced by any other signal processing routine (or captured data for that matter) to an emuDB this is already possible but it is not part of the "track definition" concept of an emuDB. I also think this is sort of conceptually sound as the EMU system only really "understands" SSFF files and AsspDataObjects and nothing else. If you want to access other types of files you'll have to write that functionality yourself using the onTheFlyFunction parameter of get_trackdata() where the manual says:

pass in a function pointer. This function will be called with the path to the current media file. It is required that the function returns a tibble/data.frame like object that contains a column called frame_time that specifies the time point of each row. get_trackdata will then extract the rows belonging to the current segment. This allows users to code their own function to be used with get_trackdata and allows for most data formats to be used within an emuDB.

An example function would be:

  myFun <- function(mediaFilePath){
    res = readr::read_tsv(file = paste0(tools::file_path_sans_ext(mediaFilePath), ".tsv"), # assumes these have a col. called "frame_time"
                          col_types = readr::cols() # suppress message
                          )
    return(res)
  }

And it could be used as follows:


  sl = query(ae, "Phonetic == n")

  td = get_trackdata(ae,
                     sl,
                     onTheFlyFunction = myFun,
                     verbose = FALSE)

FredrikKarlssonSpeech commented 3 years ago

Great. But what is great about Emu is that it allows for checking a signal against the waveform and relating it to what is in the transcriptions (and possibly correcting the track). If you get data from functions on the fly, you lose the ability to take advantage of these great features. My data is formatted in the SSFF signal format.

raphywink commented 3 years ago

Sorry, I think I am misunderstanding you. Do you already have SSFF files? If so, you can simply use the add_files() function to add them and then use add_ssffTrackDefinition() without calculating any files (they will already be in the bundles, so you only have to add an entry to the DBconfig, which add_ssffTrackDefinition() does for you).
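
A minimal sketch of that workflow, assuming the SSFF files sit in a single directory and are named after the bundles they belong to. The wrapper name `register_existing_ssff()` and the directory path, track name, column name and file extension in the usage comment are placeholders; only add_files() and add_ssffTrackDefinition() are emuR functions.

```r
# Hypothetical wrapper around two emuR calls; requires the emuR package
# to be installed when the function is actually called.
register_existing_ssff <- function(emuDBhandle, ssffDir, fileExtension,
                                   trackName, columnName,
                                   targetSessionName = "0000") {
  # Copy the precomputed SSFF files into the bundles with matching basenames.
  emuR::add_files(emuDBhandle,
                  dir = ssffDir,
                  fileExtension = fileExtension,
                  targetSessionName = targetSessionName)
  # Register the track in the DBconfig. No onTheFlyFunctionName is given,
  # so no files are calculated.
  emuR::add_ssffTrackDefinition(emuDBhandle,
                                name = trackName,
                                columnName = columnName,
                                fileExtension = fileExtension)
}

# Example call with placeholder values:
# register_existing_ssff(myDb, "path/to/ssff_files", "mf0",
#                        trackName = "myF0", columnName = "F0")
```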

raphywink commented 3 years ago

Closing this due to inactivity. I also think that the "relate it to what is in transcriptions" part is actually covered by the frame_time extraction applied to the return value of the function passed via the onTheFlyFunction parameter. Reopen if you feel otherwise.