Closed ejulia17 closed 5 years ago
Just a note for me. Guide on adding data here: http://r-pkgs.had.co.nz/data.html#data
Hi @ejulia17,
So it is relatively straightforward to do what you suggest above; I'm just thinking through what the data("name_of_dataset") function should actually load.
So at the moment you load a dataset like this:
layout_perkin <- create_module("PerkinElmerFull")
file_path <- "../tests/testthat/dead_pix/PerkinElmer/BadPixelMap_0.bpm/BadPixelMap.bpm.xml"
layout_perkin <- load_pix_matrix(layout = layout_perkin, file_path = file_path)
which creates a 'layout' and then loads the dataset into the layout.
So I guess it makes sense to have something like data('PerkinElmer') which loads the layout_perkin object with the pixel data pre-loaded into it?
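One way this could be wired up, as a rough sketch rather than the package's actual build script: data("name") in an installed package looks for a matching file under data/, so the object can be built once and saved there as an .rda file. The helper name save_example_dataset, the stand-in list, and the temporary directory are assumptions for illustration only; in the package the object would come from the create_module() and load_pix_matrix() calls shown above, and the target would be the package's own data/ folder.

```r
# Hypothetical helper: store `object` under the name `name` in <data_dir>/<name>.rda,
# the kind of file that data("<name>") picks up from a package's data/ folder.
save_example_dataset <- function(name, object, data_dir = "data") {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  env <- new.env()
  assign(name, object, envir = env)
  out_file <- file.path(data_dir, paste0(name, ".rda"))
  save(list = name, envir = env, file = out_file, compress = "xz")
  invisible(out_file)
}

# Stand-in for the real object (assumption). In the package this would be:
#   layout <- create_module("PerkinElmerFull")
#   layout <- load_pix_matrix(layout = layout, file_path = file_path)
stand_in_layout <- list(name = "PerkinElmerFull", pix_matrix = matrix(0L, 4, 4))

# Written to a temporary directory here so the sketch has no side effects.
out_file <- save_example_dataset(
  "PerkinElmer_Full_1",
  stand_in_layout,
  data_dir = file.path(tempdir(), "data")
)

# Reloading the file restores the object under its dataset name.
check_env <- new.env()
load(out_file, envir = check_env)
```

Saving the fully loaded layout object, rather than the raw file, is what lets users skip the layout selection and file reading steps.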
Does that make sense? Any thoughts on this @tomaslaz ?
Related to this, the example datasets currently live in both the 'examples/' subdirectory and the 'tests/' directory, which unnecessarily increases the size of the package.
About your suggestion: "data('PerkinElmer') which loads the layout_perkin object with the pixel data pre-loaded into it?"
Yes, sounds very good. This means they can skip reading in the raw data and work straight from the object, but still do anything that has to do with statistical modelling of the data, including going from pixel to event level. So that serves the goal of teaching the statistical functionalities of the package.
Please note there are 6 different Perkin Elmer example data sets. They belong to 3 different layouts (full, cropped, refurbished), and there are 2 different dead pixel sets for each of them (taken at 2 different time points). [Maybe a metaphor for this: think of the layout as the graphical structure of a newspaper (where headlines, text, pictures go) and think of dead pixels as misprints.]
So we would end up with 6 different example data objects to be stored. Not sure how to name them, but maybe something containing the words below? PerkinElmer_Full_1, PerkinElmer_Full_2, PerkinElmer_Cropped_1, PerkinElmer_Cropped_2, PerkinElmer_Refurbished_1, PerkinElmer_Refurbished_2
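If that naming scheme is adopted, the six names could be generated from the layouts and time points instead of being typed by hand. This is only a sketch; the variable names are made up for illustration.

```r
# Three layouts, two dead pixel sets (time points) each.
layouts <- c("Full", "Cropped", "Refurbished")
time_points <- 1:2

# expand.grid varies its first argument fastest, so each layout
# is paired with time points 1 and 2 in turn.
grid <- expand.grid(
  time_point = time_points,
  layout = layouts,
  stringsAsFactors = FALSE
)

dataset_names <- paste("PerkinElmer", grid$layout, grid$time_point, sep = "_")
print(dataset_names)
```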
For Pilatus we only have one data example.
For Excalibur as well we only have one example (though the raw data consists of several files).
Got to go back to the event... Will look at data locations later. (To speed this up, could you maybe send direct links to where they are at the moment? There are several test folders...)
If you go to the iss42 branch here: https://github.com/alan-turing-institute/DetectorChecker/tree/iss42
The data is in these folders:
examples/
tests/testthat/dead_pix/
I don't think we have all the Perkin Elmer examples in either folder though. I can only see one file called BadPixelMap.bpm.xml.
@tomaslaz do you know if we have all the example datasets somewhere?
detectorchecker/examples contains all the examples that we were given at the beginning of the project. tests/testthat/dead_pix contains only those examples that are used for testing.
We can easily add more examples to the repository; please ask Julia and Wilfrid to send us the data or to add the examples themselves.
We also need to think about how to avoid duplicating data between detectorchecker/examples and tests/testthat/dead_pix. Would symbolic links work in this case?
So I am placing the raw data in a folder called inst/extdata/, which is the standard way of including external data files (see http://r-pkgs.had.co.nz/data.html#data-extdata).
In the tests I then call the files like this:
test_path <- system.file("extdata", "PerkinElmer", "BadPixelMap_0.bpm", "BadPixelMap.bpm.xml", package = "detectorchecker")
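One thing worth guarding against in tests: system.file() does not fail when a file is missing, it returns an empty string. The sketch below shows that behaviour and the mustWork = TRUE argument that turns it into an error. It uses the base stats package and a deliberately non-existent file name so it runs without detectorchecker installed; both are stand-ins, not part of the package.

```r
# A missing file gives "" rather than an error.
missing_path <- system.file("extdata", "no_such_file.xml", package = "stats")
is_missing <- !nzchar(missing_path)

# With mustWork = TRUE the same lookup raises an error instead.
result <- tryCatch(
  system.file("extdata", "no_such_file.xml", package = "stats", mustWork = TRUE),
  error = function(e) "error raised"
)
```

Adding mustWork = TRUE to the call above would make a test stop at the lookup if a file is moved or renamed, instead of passing an empty path on to the loader.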
Need to add examples of how to load the data to the vignettes.
Example instructions added to the README.
Dear Oscar,
It would be good to have sample objects created from the example data sets (currently in a directory as files) already built into the package. This is common in many R packages, for example see spatstat:
library(spatstat)
data(lansing)
plot(split(lansing))
In our example, the data sets are:
We just need to give these toy data sets some simple names that do not coincide with names used for objects or layout names in the R code.
Built-in data is useful for attracting new users to the package. They can immediately test the functionalities without having to go through the layout selection, raw data file download, and loading into R.
Best wishes, Julia