`DataCombined`: signature for `addSimulationResults` and `addDataSets`

PavelBal commented 2 years ago

I am still not happy with the argument name newNames. For me as a user this does not make any sense, because the data sets I am adding are not in the DataCombined yet, so I am not changing the names, I am giving them. newNames implies for me that there are also oldNames, and I am renaming something.

I am aware that my personal opinion is not necessarily representative for the majority of the users, therefore I would like to hear the different oppinions.

DataCombined is an object that allows to group different data sets (simulated results, data loaded from excel, from pkml, or just manually created). The user can add simulated results or data loaded in DataSet objects and assign them to groups. The DataCombined can than be used to create different figures, where all data in one group will have similar aesthetics (e.g., grouping simulated results with an observed data set will result in one legend entry and the data being plotted in the same color).

Every data set (from simulated results or loaded observed data) within the DataCombined object must have a unique name. By default, when adding a simulation result, its output path becomes its name:

#Load and run simulation
sim <- loadSimulation(system.file("extdata", "Aciclovir.pkml", package = "ospsuite"))
simulationResults <- runSimulations(sim)

#Create and population DataCombined
outputPath <- "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)"
aciclovirData <- DataCombined$new()
aciclovirData$addSimulationResults(simulationResults = simulationResults,
                                   quantitiesOrPaths = outputPath)

> aciclovirData$names
[1] "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)"

When adding an observed data, the name is the name of the data set (every DataSet has a name, e.g. in this example, the data sets are exported from PK-Sim by saving observed data as PKML):

#Load observed data
obsData <- lapply(c("ObsDataAciclovir_1.pkml",
                    "ObsDataAciclovir_2.pkml",
                    "ObsDataAciclovir_3.pkml"),
                  \(x){loadDataSetFromPKML(getTestDataFilePath(x))})

aciclovirData$addDataSets(obsData)

> aciclovirData$names
[1] "Laskin 1982.Group C"                                                       "Laskin 1982.Group D"                                                      
[3] "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)" "Vergin 1995.Iv"

The user can set custom names of simulated or observed data upon adding. This can be especially useful when adding multiple simulated results with the same path (e.g. from different simulations). For this, the methods addSimulationResults and addDataSets have an additional argument:

> aciclovirData$addSimulationResults(simulationResults = simulationResults,
+                                    quantitiesOrPaths = outputPath,
+                                    newNames = "secondSimResult")

> aciclovirData$names
[1] "Laskin 1982.Group C"                                                       "Laskin 1982.Group D"                                                      
[3] "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)" "secondSimResult"                                                          
[5] "Vergin 1995.Iv"

Similarly for observed data:

> aciclovirData$addDataSets(obsData, newNames = c("ObsData1", "ObsData2", "ObsData3"))

> aciclovirData$names
[1] "Laskin 1982.Group C"                                                       "Laskin 1982.Group D"                                                      
[3] "ObsData1"                                                                  "ObsData2"                                                                 
[5] "ObsData3"                                                                  "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)"
[7] "secondSimResult"                                                           "Vergin 1995.Iv"

@ Users: We are having a discussion about the naming of the argument - newNames vs names. Could you please share your oppinion, which naming is more intuitive?

@msevestre @Yuri05 @IndrajeetPatil @StephanSchaller @VanessaBa

VanessaBa commented 2 years ago

I would prefer names

msevestre commented 2 years ago

I love this. We create an API, we use it, we don't like it, we iterate, we make it better!

Looking at this, I would expect names as well as a parameter.

PavelBal commented 2 years ago

I love this. We create an API, we use it, we don't like it, we iterate, we make it better!

Well that's life, we should only try to use and iterate before releasing ;)

msevestre commented 2 years ago

Even better 👍

IndrajeetPatil commented 2 years ago

I am still not happy with the argument name newNames. For me as a user this does not make any sense, because the data sets I am adding are not in the DataCombined yet, so I am not changing the names, I am giving them. newNames implies for me that there are also oldNames, and I am renaming something.

This actually makes me wonder if we should completely remove this parameter from the $add*() method signatures. I can see how this might be confusing for the user.

Alternative approach:

We introduce a public method, whose sole purpose would be to rename datasets, thereby avoiding any possible confusion. It'd have the following signature:

$setNames(names) {...}

It would behave the same way as the $setGroups() method; i.e. users would need to supply a named list (names = list("oldName1" = "newName1", "oldName2" = "newName2"), etc.). This way:

we avoid the current lexical order-based renaming;
users don't need to specify NULLs for datasets that they don't want to be renamed.

What do you think?

msevestre commented 2 years ago

I like this. Similar to setGroups, easy to test, no weird NULL to pass around

we avoid the current lexical order-based renaming;

YES. This is not good. Sorry that I missed this in the PR.

IndrajeetPatil commented 2 years ago

And this principle also has a name! 😄 https://en.wikipedia.org/wiki/Single-responsibility_principle

Currently, add*() functions are violating this principle by both adding datasets and renaming them.

msevestre commented 2 years ago

beware of what you are opening @IndrajeetPatil ... DataCombined is probably not following SRP... just saying

IndrajeetPatil commented 2 years ago

Haha. Sure, not at class level, but we can at least try to do so at the function/method level.

PavelBal commented 2 years ago

Against. Adding same path from different simulations (which is a very common use case) would become more complex.

simResults1 <- runSimulation(sim1)
simResults2 <- runSimulation(sim2)

outputPath <- "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)"
dc <- DataCombined$new()

dc$addSimulationResults(simResults1, outputPath, names <- "Aciclovir sim1")
dc$addSimulationResults(simResults2, outputPath, names <- "Aciclovir sim2")

will become

dc$addSimulationResults(simResults1, outputPath)
dc$setNames(outputPath = "Aciclovir sim1")
dc$addSimulationResults(simResults2, outputPath, names <- "Aciclovir sim2")
dc$setNames(outputPath = "Aciclovir sim2")

Why? So no, definitely against.

This actually makes me wonder if we should completely remove this parameter from the $add*() method signatures. I can see how this might be confusing for the user.

So far we only heard that the argument name newNames might be confusing for the user, but not the general concept of providing names.

msevestre commented 2 years ago

oh yes. If you can have the same output path with different names this won't work

but this I don't understand

we avoid the current lexical order-based renaming; users don't need to specify NULLs for datasets that they don't want to be renamed.

IndrajeetPatil commented 2 years ago

lexical order-based renaming

We shouldn't worry too much about this. It's virtually impossible for this renaming to go wrong. I know because I had tried my best to break it when you had made a similar observation about the method to create groups, which was also based on lexical order but was much more fragile and was in dire need of a change.

PavelBal commented 2 years ago

Me neither. The issue is only about what naming is more intuitive from the users perspective and we have now 1 for newNames and 3 for names. Waiting for @Yuri05 and @StephanSchaller to have an absolute majority or to see that we need additional voters ;)

IndrajeetPatil commented 2 years ago

users don't need to specify NULLs for datasets that they don't want to be renamed.

If there are 3 different datasets, and you want to rename one of them, the current syntax is: newNames = list("newName1", NULL, NULL)

But a better syntax would be: newNames = list("oldName1" = "newName1")

Do you understand the issue now?

IndrajeetPatil commented 2 years ago

we have now 1 for newNames

Yes, I will always be team non-generic and verbose name! But then I am also not a user, so you should my weight my vote accordingly 😉

IndrajeetPatil commented 2 years ago

Are we also renaming forNames parameter to names for $setDataTransformations() method? Or is this confusion restricted only to the $add*() methods?

PavelBal commented 2 years ago

I would also rename, but not a too strong opinion on that.

Open-Systems-Pharmacology / OSPSuite-R

`DataCombined`: signature for `addSimulationResults` and `addDataSets` #824