IPS-LMU / emuR

The main R package for the EMU Speech Database Management System (EMU-SDMS)
http://ips-lmu.github.io/EMU.html
23 stars 15 forks source link

A single style of classes / objects is needed #179

Closed FredrikKarlssonSpeech closed 6 years ago

FredrikKarlssonSpeech commented 6 years ago

I'm a long time user of R (16 years now I think) and I use the language for all my research. Of course I've developed some code that I think is helpful for people like me, but I have opted not to release it on CRAN or anywhere else because I think that the field does not really yet another set of tools/package for phonetics in R to fragment work even more. We need a comprehensive framework, similar to what is being done in bioinformatics (bioconductor), but at a much smaller scale of course. Too many researchers are trying to build careers by creating their own tools, rather than try to cooperate so that more research is done.

I would like to help develop a nice and powerful package for speech database and data manipulation and presentation. I therefore forked emuR and aimed to go through the code to fix issues with missing or too complicated documentation that I knew was in the package. As a way of getting a feel for how everything is tied together and how you think the user should work with data within the package. I always had the feeling that some functions could refactored to better fit the S3 style of applying methods/functions to objects. I also have looked at which functions are exported out of the package, as the namespace of the package is really cluttered and difficult to understand.

After some time in the code I strongly feel that the code requires some setting of basic principles, especially related to signal data (trackdata in the old class structure). The way I see it, the old R code had trackdata and spectral S3 classes, and they were not as clearly separated in the thinking as one would have liked. And one could have used the fact that they were S3 classes more efficiently and reduced the clutter in the package. Adding a emuRtrackdata class also does not help the issue, and odd ways, in the R world, of manipulating objects created in the class (create_emuRtrackdata() for instance). For a user of R, create_emuRtrackdata and make.seglist and make.emuRsegs are really odd functions. If I create a generalized linear model (glm class) in R I call glm() with the components I need to create it, and I get a glm object. I do not call make_glm() and get the glm object. Likewise, I call as.matrix on any S3 object and assume that the author has implemented an as.matrix.glm if this is indeed a sensible thing to do.

Right now the situation regarding signal data is a bit of a mess (Sorry!). Please state what kinds of classes you have decided on for the emuR classes, and I'll try to help organizing the functionality accordingly. The legacy code had trackdata and spectral classes. What was the reasoning behind a new emuRtrackdata class? Please note that emuRtrackdata is a name that sticks out in R, at least for me, so perhaps it would be better to call it trackdata (or speech) and just remove the old implementation of it? The same goes for emuRsegs (but here emusegs was not any better so maybe the class should just be called 'segmentlist'?). What about 'spectral'?

I would love to help move the code into S4 classes if that is what the team leader wants (I'll just need to learn it), or to refactor the code so that it better fits S3 conventions. Please also tell me what the classes should be called. trackdata and spectral agrees better with R than emuRtrackdata IMHO. I would find it awkward to use a emuRtrackdata class constructor. Could I start trying to remove the old trackdata implementation and insert emuRtrackdata instead?

Sorry for the many questions.

FredrikKarlssonSpeech commented 6 years ago

Sorry. I did not have time to go into the code of 'emuRtrackdata' at the time when I felt that I had to react on issues of inconsistencies in what classes are and how methods interact with them. I see now that S3 is the standard and what emuRtrackdata aims to solve is really something slightly different than trackdata. The name threw me off to be honest, but I will open a different issue for this discussion.