IPS-LMU / emuR

The main R package for the EMU Speech Database Management System (EMU-SDMS)
http://ips-lmu.github.io/EMU.html
23 stars 15 forks source link

onTheFlyFunction computational costs #252

Closed samgregory closed 2 years ago

samgregory commented 3 years ago

I have been experimenting with the onTheFlyFunction feature of get_trackdata - it is a very handy feature of emuR.

In the process I have discovered an issue. Depending on how many times a bundle appears in a seglist there will be repeated calls to the onTheFlyFunction that would appear to serve no purpose:

https://github.com/IPS-LMU/emuR/blob/4e1c44ef6341d7085c648b7812687c8384fe40e2/R/emuR-get_trackdata.R#L513-L516

The issue is that because there is no way to pass the seglist time information to the onTheFlyFunction you cannot constrain the calculation to within those boundaries. The function returns the calculation for the whole media file and then get_trackdata selects the appropriate data for the seglist.

Then, however, the calculation for the whole file isn't retained between loops so it has to be recalculated if there is another item in the seglist that is from the same bundle.

My suggestion is that you leave the onTheFlyFunction as it is - expecting a calculation for the whole media file to be returned as a tibble - but cache the results for each bundle so that they don't have to be recalculated if the seglist contains two or more items from one bundle.

MJochim commented 3 years ago

Thank you for the input. Unfortunately, the EMU-SDMS is currently out of funding.

We at the IPS will do what we can to fix bugs, security issues or necessary adjustments to new versions of R; but we cannot currently work on new features or performance improvements.

I think your suggestion makes sense, though. If anyone were to contribute a pull request, I’d be happy to review and merge it.

samgregory commented 3 years ago

Brilliant - the fix is pretty easy - I'll put a pull request in.