IPS-LMU / emuR

The main R package for the EMU Speech Database Management System (EMU-SDMS)
http://ips-lmu.github.io/EMU.html

Graceful handling and logging of get_trackdata errors #245

Open FredrikKarlssonSpeech opened 3 years ago

FredrikKarlssonSpeech commented 3 years ago

It would be nice to be able to get data that makes a problem easier to fix than is currently possible.

Using emuR 2.2.0 (and the previous version I had installed too) I get this error at the end of a rather large batch:

syllrms <- get_trackdata(gu,patakaCVSylls,"rms")

  INFO: parsing 34287 rms segments/events
  |======================================================================================================================================| 100%
Error in (function (..., deparse.level = 1)  : 
  number of columns of matrices must match (see arg 18306)
In addition: There were 50 or more warnings (use warnings() to see the first 50)

There are very likely some strange segments in there causing the issue, but it would be nice if I could supply the name of a tibble in which the segments that raised warnings are stored, so that I could review them. At the moment I can only get that information through warnings(), and only in this format:

Warning messages:
1: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 209. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
2: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 210. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
3: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 301. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
4: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 302. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
5: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 362. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
6: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 363. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
7: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 520. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
8: In get_trackdata(gu, patakaCVSylls, "rms") : The amount of data extracted doesn't match the expected segment length in segment list row 521. This can be caused by slight rounding errors in sample rates and start times. Adapting to extracted sample length.
......

If the function instead wrote the offending segments to a segment list, either in the current environment or as a file (but not as its return value, of course), fixing such issues would be much simpler.
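
Until something like this exists in emuR, the warnings can be captured from the outside. The following is a minimal sketch in base R, not an emuR feature: `collect_warnings()`, `warning_rows()` and `fake_extract()` are hypothetical names introduced here, and `fake_extract()` only stands in for `get_trackdata()` so that the example runs without a database. It assumes the warning text keeps the "segment list row N" wording quoted above.

```r
# Hypothetical helper (not part of emuR): evaluate an expression, keep every
# warning message, and also survive an error at the end of the run.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- tryCatch(
    withCallingHandlers(
      expr,
      warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))  # store the message text
        invokeRestart("muffleWarning")         # and carry on with the call
      }
    ),
    error = function(e) e                      # return the error object instead of stopping
  )
  list(value = value, warnings = msgs)
}

# Extract the row numbers from messages containing "segment list row N".
warning_rows <- function(msgs) {
  hits <- regmatches(msgs, regexpr("segment list row [0-9]+", msgs))
  as.integer(sub("segment list row ", "", hits))
}

# Stand-in for get_trackdata(): one warning per problematic row.
fake_extract <- function(rows) {
  for (r in rows) warning("... in segment list row ", r, ". Adapting ...")
  length(rows)
}

res <- collect_warnings(fake_extract(c(209, 210, 301)))
bad_rows <- warning_rows(res$warnings)
```

With a real database the same wrapper could be put around the `get_trackdata(gu, patakaCVSylls, "rms")` call, and `patakaCVSylls[bad_rows, ]` would then give the segments to review, provided the row numbers in the messages index the segment list as passed in.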

FredrikKarlssonSpeech commented 3 years ago

Related to this: I am using emuR 2.2.0, and it is a bit frustrating to get this error message after a long extraction:


cvrms <- get_trackdata(gu,patakaC_V,"rms")

  INFO: parsing 34285 rms segments/events
  |=======================================================================================| 100%
Error in (function (..., deparse.level = 1)  : 
  number of columns of matrices must match (see arg 18307)

(Generating the RMS track went fine, though.)

The user needs some information on what to do in this situation. I will try to find out from the code, if possible, what "18307" refers to (it does not seem to be a segment number), but I think a more graceful and informative error message is needed here.
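
For orientation, the error text itself is what base R's `rbind()` produces when it is called through `do.call()` on matrices with differing numbers of columns, and "see arg N" is the position of the offending argument in that call. The sketch below reproduces this with made-up matrices. How emuR orders the pieces it binds internally is not shown here, so mapping N back to a row of the segment list remains an assumption.

```r
# Three made-up per-segment matrices; the second has an extra column.
pieces <- list(
  matrix(0, nrow = 2, ncol = 1),
  matrix(0, nrow = 2, ncol = 2),
  matrix(0, nrow = 2, ncol = 1)
)

# Binding them fails; keep the message instead of stopping.
err <- tryCatch(
  do.call(rbind, pieces),
  error = function(e) conditionMessage(e)
)

# Locate the pieces whose column count differs from the most common one.
n_cols   <- vapply(pieces, ncol, integer(1))
expected <- as.integer(names(which.max(table(n_cols))))
odd      <- which(n_cols != expected)
```

Here `odd` points at the second list element, which is the kind of index the "see arg" number reports.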

FredrikKarlssonSpeech commented 3 years ago

I am now working from the hypothesis that my underlying issue is somewhere in rmsana. For this segment list I can extract data from a ksvF0-generated track, a zcrana track and an intensity track generated from Praat, but not from an rmsana track. Generating the track works, and I get no error in the onTheFlyOptLogFilePath log. But when I try to extract the data, the process fails (like above) without an informative error. The same issue occurs when I set a cut point and extract just a couple of points.

I did not collect this data myself, so right now I have no information on where and why the processing fails. There could be something odd with one or two wave files somewhere, but I cannot tell where. My best hypothesis is that rmsana returns something odd for a couple of my segments and is very quiet about it.

raphywink commented 3 years ago

I'll definitely look into improving the error messages. Writing out a log table file containing the bad segments might be a good way to go. Regarding the "number of columns of matrices must match" error: are all the segments longer than the window step size used in rmsana? You could also check whether all SSFF files have the expected number of columns in the track that you are interested in. Here is an example that checks whether all .fms files have 4 columns in the fm track:

library(emuR)

# create and load the demo database
create_emuRdemoData("~/Desktop/")
db = load_emuDB("~/Desktop/emuR_demoData/ae_emuDB/")
fps = list_files(db, fileExtension = "fms")$absolute_file_path

# report every .fms file whose fm track does not have 4 columns
for(fp in fps){
  ado = wrassp::read.AsspDataObj(fp)
  if(ncol(ado$fm) != 4){
    message("following file is bad: ", fp)
  }
}

raphywink commented 3 years ago

Could you maybe send me a mini emuDB with a single bundle that generates the error? That would help me a lot...
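
The window step question above can be checked directly on the segment list. This is a rough sketch on a made-up data frame that mirrors the `start` and `end` columns (in milliseconds) of an emuR segment list. The value of `window_shift_ms` is an assumption and should be replaced by the `windowShift` the rms track was actually generated with.

```r
# Made-up segment list; start and end are in milliseconds.
sl <- data.frame(
  labels = c("pa", "ta", "ka"),
  start  = c(100.0, 250.0, 400.0),
  end    = c(180.0, 253.0, 470.0)
)

# Assumed frame step of the rms track in milliseconds.
window_shift_ms <- 5

# Rows whose duration is shorter than one frame step.
dur_ms    <- sl$end - sl$start
too_short <- which(dur_ms < window_shift_ms)
sl[too_short, ]
```

Segments flagged this way may yield too few frames to extract, so they are reasonable first candidates when hunting for the rows that break the extraction.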

FredrikKarlssonSpeech commented 3 years ago

Hi Raphael,

Sorry for the delay. I missed your earlier question. I am trying to find the files that do not work, but it is puzzling and even the straightforward test code you supplied above fails in a strange way. I am on it and will get back to you when I get some idea on what happens here.