dipsim is an R package with tools for simulating data from data sets in
.parquet format. It was developed primarily to help BCGov researchers
working in the DIP create secondary data sets for use during testing and
development.
Developing code for data science applications can be time-consuming, and
testing code on massive data sets can slow development significantly.
dipsim helps by providing a way to quickly create smaller versions of
the actual data set.
To report bugs/issues/feature requests, please file an issue.
You can install the development version of dipsim from GitHub with:
# install.packages("devtools")
devtools::install_github("bcgov/dipsim")
This is a basic example of using dipsim to simulate a data set of 50
rows, based on data transformed into parquet format. The penguins data
set is found in the CRAN package palmerpenguins.
library(dipsim)
wd <- "/Users/brobert/Desktop"
##---------------------------------------- load routine --------------------------------------------------
parquet_fp <- search_parquet_data()
input_data <- make_input_data(support_fp = parquet_fp, resize = 100000, folder_location = wd)
##---------------------------------------- generate simulated data ---------------------------------------
simulated_data <- make_simulated_data(samp_size = 50, folder_location = wd, dataset_size = 1000,
                                      name = tools::file_path_sans_ext(basename(parquet_fp)))
##----------------------------------------- diagnostics --------------------------------------------------
cols <- compare_data(input_data, simulated_data)
vis_sim(input_data, simulated_data, cols)
##------------------------------------- clean up temp folder ---------------------------------------------
f <- glue::glue("{wd}/{tools::file_path_sans_ext(basename(parquet_fp))}")
unlink(f, recursive = TRUE)
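The example above assumes that a parquet file already exists. As a minimal sketch of how the penguins data might be transformed into parquet format beforehand, the following uses the arrow package, which is an assumption here and not something this README names. The variable names out_fp and penguins_pq are illustrative only, and how search_parquet_data() locates the resulting file is not covered.

```r
# Sketch only: arrow and palmerpenguins are assumed to be installed.
# This README does not confirm that dipsim expects files produced this way.
library(arrow)

# Write the penguins data frame to a parquet file in a temporary directory.
out_fp <- file.path(tempdir(), "penguins.parquet")
write_parquet(palmerpenguins::penguins, out_fp)

# Read the file back to check that the round trip kept the same shape.
penguins_pq <- read_parquet(out_fp)
dim(penguins_pq)
```

Writing to tempdir() keeps the sketch from leaving files behind in a working folder; a real workflow would write to wherever the parquet data is normally stored.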
After installing the package you can view vignettes by typing
browseVignettes("dipsim")
in your R session.
If you would like to contribute, please see our CONTRIBUTING guidelines.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Copyright 2021 Province of British Columbia
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.