SundareshSankaran / SDG---SMOTE-Synthetic-Data-Generation

Upstream repository for a custom step to generate synthetic data based on an input table, using the Synthetic Minority Oversampling TEchnique (SMOTE). SMOTE is an oversampling technique which identifies new data observations in the neighborhood of closely associated original observations.
Apache License 2.0

Sample for assessment #39

Closed SundareshSankaran closed 1 day ago

SundareshSankaran commented 1 week ago

SMOTE runs in a single pass. When users run it with a given set of parameters, an assessment report that compares a representative sample of the original data with the synthetic data gives an overview of how similar the two are.

  1. The sampling percentage is adjustable
  2. The user has the option to save the report to PDF
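
To make the idea concrete, here is a minimal Python sketch of the comparison, not the custom step's SAS code. It assumes tables are lists of dictionaries with numeric columns; the function names `sample_rows` and `compare_column` are hypothetical, and mean and standard deviation are chosen only as illustrative similarity measures.

```python
import random
import statistics


def sample_rows(rows, sampling_percent, seed=0):
    """Draw a simple random sample of about sampling_percent % of the rows."""
    # Always keep at least one row so the summary below is defined.
    k = max(1, round(len(rows) * sampling_percent / 100))
    return random.Random(seed).sample(rows, k)


def compare_column(original_sample, synthetic, column):
    """Summarize one numeric column for the original sample and the synthetic table."""
    def summary(rows):
        values = [row[column] for row in rows]
        return {
            "mean": statistics.mean(values),
            "stdev": statistics.pstdev(values),  # population standard deviation
        }

    return {"original": summary(original_sample), "synthetic": summary(synthetic)}
```

A report could call `compare_column` once per numeric column and lay the two summaries side by side; a real assessment would likely add distribution plots or further statistics.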
SundareshSankaran commented 3 days ago

((sampling_percent/100) * orig_records) / &synth_records
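
One possible reading of this expression, offered as an assumption rather than the step's documented behavior: it is the fraction of synthetic records to sample so that the synthetic sample has as many rows as the sampled original data. A small Python sketch of that arithmetic, with a hypothetical function name:

```python
def synthetic_sampling_fraction(sampling_percent, orig_records, synth_records):
    """Fraction of synthetic rows needed to match the size of the original sample."""
    if synth_records <= 0:
        raise ValueError("synth_records must be positive")
    # Rows in the original sample, divided by the rows available in the synthetic table.
    return ((sampling_percent / 100) * orig_records) / synth_records
```

For example, with a 10% sample of 5000 original records and 2000 synthetic records, the original sample has 500 rows, so the fraction works out to 0.25 of the synthetic table. The result exceeds 1 when the synthetic table is smaller than the original sample, so a caller would probably need to cap it.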

SAS program extracted

SundareshSankaran commented 2 days ago

A new output table will be specified (optional).

SundareshSankaran commented 1 day ago

Learnings to be documented, as they tend to be useful.