Detector data examples as built in data

ejulia17 commented 5 years ago

Dear Oscar,

It would be good to have sample objects created from the example data sets (currently in a directory as files) already built into the package. This is common in many R packages, for example see spatstat:

library(spatstat)
data(lansing)
plot(split(lansing))

In our example, the data sets are:

6 data sets for Perkin Elmer (2 samples for each of the 3 layout types)
1 for Excalibur (consisting of several raw data files)
1 for Pilatus
1 simulated data set for a hypothetical irregular detector (I would create that)

We just need to give these toy data sets some simple names that do not coincide with names used for objects or layout names in the R code.

Built-in data is useful for attracting new users to the package. They can immediately test the functionality without first going through layout selection and then downloading and loading raw data files into R.

Best wishes, Julia

OscartGiles commented 5 years ago

Just a note for me. Guide on adding data here: http://r-pkgs.had.co.nz/data.html#data

OscartGiles commented 5 years ago

Hi @ejulia17,

So it is relatively straightforward to do what you suggest above; I'm just thinking through what the data("name_of_dataset") function should actually load.

So at the moment you load a dataset like this:

layout_perkin <- create_module("PerkinElmerFull")

file_path <- "../tests/testthat/dead_pix/PerkinElmer/BadPixelMap_0.bpm/BadPixelMap.bpm.xml"

layout_perkin <- load_pix_matrix(layout = layout_perkin, file_path = file_path)

which creates a 'layout' and then loads the dataset into the layout.

So I guess it makes sense to have something like data('PerkinElmer') which loads the layout_perkin object with the pixel data pre-loaded into it?

Does that make sense? Any thoughts on this @tomaslaz ?
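A minimal sketch of that idea, using only base R so it runs without the package installed. The `layout_perkin` list below is a hypothetical stand-in for the object returned by `create_module()` and `load_pix_matrix()`, and the temporary directory stands in for the package's `data/` folder. Inside the package itself, `usethis::use_data()` would typically write the `.rda` file instead.

```r
# Hypothetical stand-in for a layout with pixel data pre-loaded.
# The real object would come from create_module() + load_pix_matrix().
layout_perkin <- list(
  name = "PerkinElmerFull",
  pix_matrix = matrix(0L, nrow = 4, ncol = 4)
)
layout_perkin$pix_matrix[2, 3] <- 1L  # mark one toy "dead" pixel

# Save under the name users would pass to data(). In a package this
# file would live at data/PerkinElmer.rda.
PerkinElmer <- layout_perkin
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "PerkinElmer.rda")
save(PerkinElmer, file = rda_path)

# Loading restores the object under its saved name, which is what
# data("PerkinElmer") would do for an installed package.
loaded <- new.env()
load(rda_path, envir = loaded)
```

The user would then work with `PerkinElmer` directly, with no layout selection or raw file loading step.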

OscartGiles commented 5 years ago

Related to this: the example datasets currently sit in both the 'examples/' subdirectory and the 'tests/' directory, which unnecessarily increases the size of the package.

ejulia17 commented 5 years ago

About your suggestion: "data('PerkinElmer') which loads the layout_perkin object with the pixel data pre-loaded into it?"

Yes, sounds very good. This means they can skip reading in the raw data and work straight from the object, but still do anything that has to do with statistical modelling of the data, including going from pixel to event level. So that serves the goal of teaching the statistical functionalities of the package.

Please note there are 6 different Perkin Elmer example data sets. They belong to 3 different layouts (full, cropped, refurbished), and there are 2 different dead pixel sets for each of them (taken at 2 different time points). [Maybe a metaphor for this: think of layout as the graphical structure of a newspaper (where headlines, text, pictures go) and think of dead pixels as misprints.]

So we would end up with 6 different example data objects to be stored. Not sure how to name them, but something containing the below words maybe?

PerkinElmer_Full_1
PerkinElmer_Full_2
PerkinElmer_Cropped_1
PerkinElmer_Cropped_2
PerkinElmer_Refurbished_1
PerkinElmer_Refurbished_2
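As a rough sketch, the six names could be generated rather than typed by hand and checked against names already in use. The `existing_layout_names` vector is an assumption for illustration: it contains only the one layout name that appears in this thread, and the real list would need to come from the package code.

```r
layouts <- c("Full", "Cropped", "Refurbished")
time_points <- 1:2

# One name per layout/time-point combination, layout varying slowest.
dataset_names <- paste0(
  "PerkinElmer_",
  rep(layouts, each = length(time_points)),
  "_",
  rep(time_points, times = length(layouts))
)

# Assumed, incomplete list of layout names already used in the R code.
existing_layout_names <- c("PerkinElmerFull")

# Any overlap would mean a data set name shadows a layout name.
clashes <- intersect(dataset_names, existing_layout_names)
```

An empty `clashes` vector means none of the proposed names coincides with a known layout name.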

For Pilatus we only have one data example.
For Excalibur as well we only have one example (though the raw data consists of several files).

Got to go back to the event... Will look at data locations later. (To speed this up, maybe you could send direct links to where they are at the moment; there are several test folders...)

OscartGiles commented 5 years ago

If you go to the iss42 branch here: https://github.com/alan-turing-institute/DetectorChecker/tree/iss42

The data is in these folders:

examples/

tests/testthat/dead_pix/

OscartGiles commented 5 years ago

I don't think we have all the Perkin Elmer examples in either folder though. I can only see one file called BadPixelMap.bpm.xml.

@tomaslaz do you know if we have all the example datasets somewhere?

tomaslaz commented 5 years ago

detectorchecker/examples contains all the examples that we were given at the beginning of the project. tests/testthat/dead_pix contains only those examples that are used for testing.

We can easily add more examples to the repository; please ask Julia and Wilfrid to send us the data or add the examples themselves.

We also need to think about how to avoid duplicating data between detectorchecker/examples and tests/testthat/dead_pix. Would symbolic links work in this case?

OscartGiles commented 5 years ago

So I am placing the raw data in a folder called inst/extdata/ which is the standard way of including external data files (see http://r-pkgs.had.co.nz/data.html#data-extdata)

In the tests I then call the files like this:

test_path <- system.file("extdata", "PerkinElmer", "BadPixelMap_0.bpm", "BadPixelMap.bpm.xml", package = "detectorchecker")
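One thing to watch for with this pattern: `system.file()` returns an empty string when the file is not found, so a typo in the path fails silently unless it is checked. A minimal sketch of the behaviour, using the `stats` package that ships with R rather than detectorchecker so that it runs anywhere (the `no_such_*` names are deliberately non-existent):

```r
# A file that exists: every installed package has a DESCRIPTION file.
found <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist: the result is "" rather than an error.
not_found <- system.file("no_such_dir", "no_such_file.xml", package = "stats")

# With mustWork = TRUE the same lookup raises an error instead,
# which is usually what a test wants.
err <- tryCatch(
  system.file("no_such_dir", "no_such_file.xml",
              package = "stats", mustWork = TRUE),
  error = function(e) conditionMessage(e)
)
```

Passing `mustWork = TRUE` in the tests would make a missing or renamed example file fail at the lookup rather than later in the loading code.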

OscartGiles commented 5 years ago

Need to add examples of how to load the data to the vignettes.

OscartGiles commented 5 years ago

Example instructions added to the README.

Detector data examples as built in data #44