
plot.data

plot.data is an R package for creating client-ready data for various plots and visualizations. Data can be returned either as a data.table or as a JSON file. The JSON file also includes additional information that helps when rendering plot widgets (e.g., a recommended range and step for a bin-width slider accompanying a histogram).

Installation

Use the R package remotes to install plot.data. From the R command prompt:

```R
remotes::install_github('VEuPathDB/plot.data')

# or, to install a specific version (a git tag, branch, or commit)
remotes::install_github('VEuPathDB/plot.data', ref = 'v1.2.3')
```


Usage

All plot.data functions require at least the following arguments:

  1. A data frame or data.table with columns corresponding to variables and rows corresponding to records (for example, observations or samples).
  2. A VariableMetadataList that associates columns in the data with plot elements and carries the information about each variable that is relevant for plotting. See veupathUtils for more details about the VariableMetadataList class.

### Example 1: Histogram
```R
library(data.table)
library(veupathUtils)
library(plot.data)

# Data object is a data.table of raw values to bin and count
df <- data.table('entity.xvar' = rnorm(100))

# VariableMetadataList object
variables <- new("VariableMetadataList",
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'xvar', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'xAxis'),
    dataType = new("DataType", value = 'NUMBER'),
    dataShape = new("DataShape", value = 'CONTINUOUS')
  )
)

# Returns the name of a json file where histogram-ready plotting data can be found
histogram(df,
          variables,
          value='count',
          binWidth=NULL,
          binReportValue='binWidth',
          viewport=NULL)
```


### Example 2: Scatter with overlay
```R
# Example dataset
df <- data.table('entity.xvar' = rnorm(100),
                 'entity.yvar' = rnorm(100),
                 'entity.overlay' = sample(c('red','green','blue'), 100, replace=TRUE))

# VariableMetadataList object
variables <- new("VariableMetadataList",
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'xvar', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'xAxis'),
    dataType = new("DataType", value = 'NUMBER'),
    dataShape = new("DataShape", value = 'CONTINUOUS')
  ),
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'overlay', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'overlay'),
    dataType = new("DataType", value = 'STRING'),
    dataShape = new("DataShape", value = 'CATEGORICAL')
  ),
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'yvar', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'yAxis'),
    dataType = new("DataType", value = 'NUMBER'),
    dataShape = new("DataShape", value = 'CONTINUOUS')
  )
)

# Returns the name of a json file where scatterplot-ready plotting data can be found.
scattergl(df,
          variables,
          value='bestFitLineWithRaw')
```

### Example 3: Box with one facet variable
```R
# Example dataset
df <- data.table('entity.xvar' = sample(letters[1:5], 100, replace=TRUE),
                 'entity.yvar' = rnorm(100),
                 'entity.overlay' = sample(c('facet1','facet2','facet3'), 100, replace=TRUE))

# VariableMetadataList object
variables <- new("VariableMetadataList",
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'xvar', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'xAxis'),
    dataType = new("DataType", value = 'STRING'),
    dataShape = new("DataShape", value = 'CATEGORICAL')
  ),
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'overlay', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'overlay'),
    dataType = new("DataType", value = 'STRING'),
    dataShape = new("DataShape", value = 'CATEGORICAL')
  ),
  new("VariableMetadata",
    variableClass = new("VariableClass", value = 'native'),
    variableSpec = new("VariableSpec", variableId = 'yvar', entityId = 'entity'),
    plotReference = new("PlotReference", value = 'yAxis'),
    dataType = new("DataType", value = 'NUMBER'),
    dataShape = new("DataShape", value = 'CONTINUOUS')
  )
)

# Returns the name of a json file where boxplot-ready plotting data can be found.
box(df,
    variables,
    points='outliers',
    mean=FALSE,
    computeStats=FALSE)
```


Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Development

Before we begin, a few definitions:

Let's take the beeswarm plot as an illustrative example. Is a beeswarm distinct enough from both box and scatter to deserve its own class? The beeswarm is similar to box in that it shows the distribution of a continuous variable split across a categorical variable. However, a beeswarm does not itself require summary statistics such as the median or quartiles. Since a beeswarm maps samples to points, perhaps it should instead be an option in the scatter class? The two do share that mapping, but their variable constraints differ: a beeswarm takes a categorical variable on the independent axis, while a scatterplot does not. Therefore, let's give the beeswarm its own class.
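The difference in variable constraints is what drives this design decision, and it can be made concrete with a small base-R sketch. Everything here is hypothetical: `plotConstraints` and `checkIndependentAxis()` are invented for illustration and are not plot.data's actual validation code, which may be organized quite differently. The data-shape strings reuse the `DataShape` values from the usage examples.

```R
# Hypothetical per-plot constraints on the independent (x) axis,
# for illustration only; not plot.data's real constraint definitions.
plotConstraints <- list(
  beeswarm = c("CATEGORICAL"),
  scatter  = c("CONTINUOUS")
)

# Hypothetical validator: returns TRUE if the x-axis data shape is allowed
# for the given plot type, and stops with an informative error otherwise.
checkIndependentAxis <- function(plotType, xDataShape) {
  allowed <- plotConstraints[[plotType]]
  if (is.null(allowed)) {
    stop("Unknown plot type: ", plotType)
  }
  if (!(xDataShape %in% allowed)) {
    stop(plotType, " does not accept a ", xDataShape, " independent axis")
  }
  TRUE
}

checkIndependentAxis("beeswarm", "CATEGORICAL")  # TRUE
checkIndependentAxis("scatter", "CONTINUOUS")    # TRUE
# checkIndependentAxis("scatter", "CATEGORICAL") would stop() with an error
```

Keeping the constraint attached to each plot type, rather than adding a beeswarm flag to the scatter class, lets each class validate its own inputs without special cases.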

plot.data class files
Each plot.data class follows a similar setup within its "class-plotdata-{plot name}.R" file:

Testing
This package uses the testthat package for testing. Each plot.data class should have a corresponding test context, i.e., a file named "test-{plot name}.R" in the tests/testthat directory. Tests in this file should be basic unit tests, for example checking that the created object is of the appropriate class and size. See test-beeswarm.R for an example.

Tests should generally be organized as follows:

  1. Check that the returned object is of the appropriate size and shape.
  2. Test that types are as expected.
  3. Ensure a valid data.table with the expected dimensions is returned, even when inputs are not ideal (e.g., factors or numeric categorical variables).
  4. Validate the structure of the getJSON output.
  5. Test that missing data are handled appropriately.
  6. Visualization-specific tests, such as checks on statistics.

Use devtools::test() to run all unit tests in this package. See the devtools documentation for more details.

Helpers
Helper functions are organized into those that compute values per group (group.R), compute values per panel (panel.R), handle binning (bin.R), or fall into various other categories (see the utils and utils-*.R files). Using the beeswarm as an example, we can add groupMedian to group.R to compute the median of the data per group (overlay, panel).
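As a rough illustration of what a per-group helper computes, here is a minimal base-R sketch. It is hypothetical: the signature `groupMedian(data, y, group, panel)` and the `median` output column are assumptions made for this example, and plot.data's real groupMedian may take different arguments and return a data.table.

```R
# Hypothetical sketch of a per-group median helper; plot.data's real
# groupMedian in group.R may differ in signature, input, and return type.
groupMedian <- function(data, y, group = NULL, panel = NULL) {
  byCols <- c(group, panel)
  if (length(byCols) == 0) {
    # No grouping columns: one median over all non-missing values
    return(data.frame(median = median(data[[y]], na.rm = TRUE)))
  }
  # Build the grouping list column by column; [[ works for both
  # data.frame and data.table inputs
  by <- setNames(lapply(byCols, function(col) data[[col]]), byCols)
  # One row per observed combination of the grouping columns
  out <- aggregate(data[[y]], by = by, FUN = median, na.rm = TRUE)
  names(out)[names(out) == "x"] <- "median"
  out
}

# Example: medians of 'value' per overlay group, with a missing value
df <- data.frame(overlay = c("red", "red", "blue", "blue"),
                 value   = c(1, 3, 5, NA))
groupMedian(df, "value", group = "overlay")
# one row per overlay value: blue -> 5, red -> 2
```

Passing both `group` and `panel` yields one row per (overlay, panel) combination, which matches the "per group (overlay, panel)" description above.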

Exporting functions
Now that we've created a new plot, we'd like to use it! Run devtools::document() to add the relevant functions to NAMESPACE, so that the new functions are properly exported and available when someone loads plot.data.
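devtools::document() runs roxygen2, which writes NAMESPACE entries from the `@export` tags in the roxygen comments above each function; a function without the tag stays internal to the package. A minimal sketch, using a hypothetical function `beeswarmMedians()` that is not part of plot.data:

```R
#' Median of a numeric vector per category (hypothetical example)
#'
#' @param x A numeric vector of values.
#' @param category A vector of category labels, the same length as `x`.
#' @return A named numeric vector of medians, one per category.
#' @export
beeswarmMedians <- function(x, category) {
  stopifnot(length(x) == length(category))
  # Split x by category, then take the median of each piece, ignoring NAs
  vapply(split(x, category), median, numeric(1), na.rm = TRUE)
}

beeswarmMedians(c(1, 3, 5, NA), c("a", "a", "b", "b"))
# a = 2, b = 5
```

Because roxygen blocks are ordinary R comments, the file still sources normally; only devtools::document() reads the `@export` tag and adds `export(beeswarmMedians)` to NAMESPACE.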


License

Apache 2.0
