IPS-LMU / emuR

The main R package for the EMU Speech Database Management System (EMU-SDMS)
http://ips-lmu.github.io/EMU.html

problem adding SSFF files to bundles based on filename #196

Closed: samkirkham closed this issue 6 years ago

samkirkham commented 6 years ago

I have a dataset with EPG recordings (SSFF files) for some tokens but not for all of them (this is intentional). The add_files() documentation notes that "The files that are found in dir that have the extension fileExtension will be copied into the according bundle folder that have the same basename as the file." However, if I use the following code...

epg.dir <- "/path/to/dir/"
add_files(adp, dir = epg.dir, fileExtension = 'ssf', targetSessionName = '0000')

...and ah_001_bndl is my first bundle folder and ah_013.ssf is my first EPG file, then ah_013.ssf gets copied into ah_001_bndl rather than ah_013_bndl. If I change this to something like targetSessionName = 'epg_files', I get this error:

Error in add_files(adp, dir = epg.dir, fileExtension = "ssf", targetSessionName = "epg_files") : more or less then one bundle found that matches the base name of the file '/path/to/dir//ah_013.ssf'

Every .ssf file has exactly one bundle that matches its base filename. Am I doing something obviously wrong here, or is add_files() not behaving as it should? Any help is appreciated, thanks!
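For reference, the matching rule quoted from the add_files() documentation (each file goes into the bundle folder with the same basename) can be sketched in base R as below. This is a hypothetical helper, not emuR's actual code: it assumes the standard emuDB layout <db>_emuDB/<session>_ses/<bundle>_bndl/ and only reports the target folder for each file instead of copying anything.

```r
# Hypothetical helper (NOT part of emuR) mimicking the documented rule:
# every file in `dir` with extension `fileExtension` belongs in the bundle
# folder of session `targetSessionName` whose name equals the file's basename.
match_files_to_bundles <- function(dbDir, dir, fileExtension, targetSessionName) {
  # "$" anchors the pattern so only files that END in .<fileExtension> match
  pattern <- paste0("\\.", fileExtension, "$")
  files <- list.files(dir, pattern = pattern, full.names = TRUE)

  # bundle folders are assumed to live in <dbDir>/<session>_ses/<bundle>_bndl
  sesDir <- file.path(dbDir, paste0(targetSessionName, "_ses"))
  bundleDirs <- list.dirs(sesDir, full.names = FALSE, recursive = FALSE)
  bundleNames <- sub("_bndl$", "", bundleDirs)  # strip the "_bndl" suffix

  targets <- character(length(files))
  for (i in seq_along(files)) {
    base <- tools::file_path_sans_ext(basename(files[i]))
    hits <- bundleDirs[bundleNames == base]
    # a file must map to exactly one bundle, otherwise stop
    if (length(hits) != 1) {
      stop("expected exactly one bundle matching '", base,
           "', found ", length(hits))
    }
    targets[i] <- file.path(sesDir, hits)
  }
  names(targets) <- basename(files)
  targets
}
```

Under this rule, ah_013.ssf should map to 0000_ses/ah_013_bndl when targetSessionName = '0000', and a session that does not exist (such as 'epg_files') yields zero matching bundles and therefore an error.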

raphywink commented 6 years ago

OK... strange. I just tried the following:

# renamed a bundle to ah_001_bndl in the ae_emuDB
# created an SSFF file called ah_013.ssf in a separate folder
ae = load_emuDB("~/Desktop/emuR_demoData/ae_emuDB/")
list_bundles(ae)

#  session     name
#1    0000   ah_001
#2    0000 msajc003

epg.dir <- "~/Desktop/emuR_demoData/test/" # contains only ah_013.ssf
add_files(ae, dir = epg.dir, fileExtension = 'ssf', targetSessionName = '0000')
# Error in add_files(ae, dir = epg.dir, fileExtension = "ssf", targetSessionName = "0000") : 
#  more or less then one bundle found that matches the base name of the file '/Users/raphaelwinkelmann/Desktop/emuR_demoData/test//ah_013.ssf'

which is what you'd expect, as there is no ah_013_bndl in session 0000. If I rename ah_013.ssf to ah_001.ssf, add_files() works without an error. I also just reviewed the add_files() code and can't see where the ah_013_bndl vs. ah_001_bndl confusion could happen (the function is super simple): https://github.com/IPS-LMU/emuR/blob/master/R/emuR-database.files.R#L192

hmmmm...

Can you maybe put together a minimal working example (including files) so I can reproduce the problem?

samkirkham commented 6 years ago

Thanks a lot for your reply. Here is some example code:

# load DB 
test <- load_emuDB("~/Desktop/test_data/test_emuDB")
list_bundles(test)
#  session   name
#1    0000 ah_002
#2    0000 ah_013

# add EPG (ssf) file (note: no ssf file exists for ah_002, ssf file exists for ah_013)
# issue: the below code places ah_013.ssf in ah_002_bndl instead of ah_013_bndl
epg.dir <- "~/Desktop/test_data/epg_files/"
add_files(test,
           dir = epg.dir, 
           fileExtension = 'ssf',
           targetSessionName = '0000')
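After running the call above, one quick way to see which bundle folder actually received the file is to list every .ssf file below the database folder. find_added_files() is a hypothetical base-R helper (not part of emuR); it returns paths relative to the emuDB folder.

```r
# Hypothetical check (plain base R, not an emuR function): list all files
# with the given extension anywhere below the emuDB folder, relative to it,
# so each result shows the <session>_ses/<bundle>_bndl folder it sits in.
find_added_files <- function(dbDir, fileExtension) {
  list.files(dbDir,
             pattern = paste0("\\.", fileExtension, "$"),
             recursive = TRUE)
}

# e.g. find_added_files("~/Desktop/test_data/test_emuDB", "ssf")
```

If the file was copied to the right place, the result should contain 0000_ses/ah_013_bndl/ah_013.ssf.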

And here is the data: test_data.zip

Thank you so much!

raphywink commented 6 years ago

Perfect! I'll check it out and get back to you. Could you maybe post the output of devtools::session_info() so I know which system/version you are running? Thanks ;-)

samkirkham commented 6 years ago

Here you go, thanks!

Session info -----------------------------------------------
setting  value                       
 version  R version 3.4.1 (2017-06-30)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.0.153)           
 language (EN)                        
 collate  en_GB.UTF-8                 
 tz       <NA>                        
 date     2018-07-02  

Packages ---------------------------------------------------
 package    * version    date      
 assertthat   0.2.0      2017-04-11
 base       * 3.4.1      2017-07-07
 bindr        0.1.1      2018-03-13
 bindrcpp   * 0.2.2      2018-03-29
 bit          1.1-12     2014-04-09
 bit64        0.9-7      2017-05-08
 blob         1.1.0      2017-06-17
 broom        0.4.4      2018-03-29
 cellranger   1.1.0      2016-07-27
 cli          1.0.0      2017-11-05
 colorspace   1.3-2      2016-12-14
 compiler     3.4.1      2017-07-07
 crayon       1.3.4      2017-09-16
 datasets   * 3.4.1      2017-07-07
 DBI          0.7        2017-06-18
 devtools     1.13.4     2017-11-09
 digest       0.6.15     2018-01-28
 dplyr      * 0.7.5      2018-05-19
 emuR       * 0.2.3      2017-07-03
 forcats    * 0.2.0      2017-01-23
 foreign      0.8-69     2017-06-22
 ggplot2    * 2.2.1.9000 2018-06-27
 glue         1.2.0      2017-10-29
 graphics   * 3.4.1      2017-07-07
 grDevices  * 3.4.1      2017-07-07
 grid         3.4.1      2017-07-07
 gtable       0.2.0      2016-02-26
 haven        1.1.0      2017-07-09
 hms          0.4.0      2017-11-23
 httr         1.3.1      2017-08-20
 jsonlite     1.5        2017-06-01
 labeling     0.3        2014-08-23
 lattice      0.20-35    2017-03-25
 lazyeval     0.2.1      2017-10-29
 lubridate    1.7.1      2017-11-03
 magrittr     1.5        2014-11-22
 MASS         7.3-47     2017-02-26
 memoise      1.1.0      2017-04-21
 methods    * 3.4.1      2017-07-07
 mnormt       1.5-5      2016-10-15
 modelr       0.1.1      2017-07-24
 munsell      0.4.3      2016-02-13
 nlme         3.1-131    2017-02-06
 parallel     3.4.1      2017-07-07
 pillar       1.2.1      2018-02-27
 pkgconfig    2.0.1      2017-03-21
 plyr         1.8.4      2016-06-08
 psych        1.8.4      2018-05-06
 purrr      * 0.2.5      2018-05-29
 R6           2.2.2      2017-06-17
 Rcpp         0.12.17    2018-05-18
 readr      * 1.1.1      2017-05-16
 readxl       1.0.0      2017-04-18
 reshape2     1.4.3      2017-12-11
 rlang        0.2.0.9001 2018-06-27
 RSQLite      2.0        2017-06-19
 rstudioapi   0.7        2017-09-07
 rvest        0.3.2      2016-06-17
 scales       0.5.0.9000 2018-05-09
 stats      * 3.4.1      2017-07-07
 stringi      1.2.2      2018-05-02
 stringr    * 1.3.1      2018-05-10
 tibble     * 1.4.2      2018-01-22
 tidyr      * 0.8.1      2018-05-18
 tidyselect   0.2.4      2018-02-26
 tidyverse  * 1.2.1      2017-11-14
 tools        3.4.1      2017-07-07
 utf8         1.1.3      2018-01-03
 utils      * 3.4.1      2017-07-07
 uuid         0.1-2      2015-07-28
 withr        2.1.2      2018-06-27
 wrassp       0.1.4      2016-05-30
 xml2         1.1.1      2017-01-24

raphywink commented 6 years ago

Could you maybe update to the newest emuR version (you are currently on 0.2.3) and see if the problem persists? It could have to do with the "added missing $ in pattern arguments in list.files call in list_files (fixes #170)" bug fix in version 1.0.0: https://github.com/IPS-LMU/emuR/blob/master/NEWS.md#bug-fixes-1

raphywink commented 6 years ago

I was able to reproduce the problem, so it has nothing to do with that fix and updating alone won't solve it (upgrading won't hurt though ;-))

raphywink commented 6 years ago

This is fixed as of version 1.0.0.9010 (install it with devtools::install_github("IPS-LMU/emuR")). It was a stupid little typo! Thanks for pointing this out!

samkirkham commented 6 years ago

Thanks - works perfectly!