Team Name:
Automunge (registered with QHack as Automunger)
Project Description:
The Automunge Python library is intended as a resource for automatically preparing tabular data for machine learning by way of numeric normalizations, categoric binarizations, and missing data infill. When training data is prepared in the automunge(.) function, a compact Python dictionary is returned that records the steps and parameters of the transformations. This dictionary may then serve as a key for preparing additional corresponding data on a consistent basis in the postmunge(.) function, including streams of data for inference. In addition to preparing data under automation, Automunge may also be applied as a platform for specifying univariate transformations fit to properties of the training data. The library has a unique API, which includes a set of family tree primitives for simple command line specification of transformation sets that may include generations and branches of derivations. Missing data is automatically imputed by training models specific to each feature set, which infer imputations from properties of the surrounding features, an approach we call ML infill.
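To make the "fit on training data, record in a dictionary, reapply later" pattern concrete, here is a minimal sketch in plain Python. It is not the Automunge API: `fit_key` and `apply_key` are hypothetical names, z-score normalization and one-hot encoding are assumed stand-ins for numeric normalization and categoric binarization, and missing numeric entries are filled with the training mean as a simple placeholder where ML infill would instead impute from the surrounding features.

```python
import math
from typing import Dict, List

# Hypothetical helpers for illustration only; these are NOT Automunge functions.

def fit_key(train: Dict[str, list], numeric: List[str], categoric: List[str]) -> dict:
    """Fit transformation parameters on training data and record them in a dict."""
    key = {"numeric": {}, "categoric": {}}
    for col in numeric:
        vals = [v for v in train[col] if v is not None]
        mean = sum(vals) / len(vals)
        # Population standard deviation; fall back to 1.0 for a constant column
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        key["numeric"][col] = {"mean": mean, "std": std}
    for col in categoric:
        # The category vocabulary is fixed from the training data
        key["categoric"][col] = {"categories": sorted({v for v in train[col] if v is not None})}
    return key

def apply_key(data: Dict[str, list], key: dict) -> Dict[str, list]:
    """Transform corresponding data (training, test, or inference) with the recorded key."""
    out = {}
    for col, p in key["numeric"].items():
        # z-score normalization; missing entries get 0.0 (the training mean),
        # a placeholder where ML infill would instead impute from other features
        out[col + "_z"] = [0.0 if v is None else (v - p["mean"]) / p["std"] for v in data[col]]
    for col, p in key["categoric"].items():
        # One-hot columns; missing or unseen categories become all zeros
        for cat in p["categories"]:
            out[f"{col}_{cat}"] = [1 if v == cat else 0 for v in data[col]]
    return out

train = {"age": [20.0, 30.0, None, 40.0], "color": ["red", "blue", "red", None]}
key = fit_key(train, numeric=["age"], categoric=["color"])
stream = {"age": [30.0, None], "color": ["blue", "green"]}
print(apply_key(stream, key))
# {'age_z': [0.0, 0.0], 'color_blue': [1, 0], 'color_red': [0, 0]}
```

Because every parameter lives in the returned dictionary, the inference stream is transformed with the training statistics rather than its own, which is what keeps the two sets consistent.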
Stochastic perturbation refers to the practice of injecting noise into the features of data passed to inference, which makes inference non-deterministic and may be relevant to fairness considerations, adversarial example protection, or other use cases that benefit from non-determinism. We offer the Automunge library for tabular preprocessing as a resource for this practice, including options to integrate random sampling or entropy seeding supported by quantum circuits, for an improved randomness profile compared to pseudo random number generators.
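As a rough sketch of the practice described above, and not Automunge's implementation or API, the following injects noise into features at inference time using Python's standard `random` module. The helper names `entropy_seed`, `perturb_numeric`, and `perturb_categoric` and the default `sigma` and `flip_prob` values are hypothetical. Entropy seeding is illustrated by drawing the seed from `os.urandom`, where bits sampled from a quantum circuit could be substituted; the individual draws still come from the Mersenne Twister PRNG, whereas random sampling directly from the entropy source would replace those draws as well.

```python
import os
import random
from typing import List, Optional

def entropy_seed(n_bytes: int = 8) -> int:
    """Return an integer seed built from external entropy bytes.

    os.urandom is an illustrative stand-in; a quantum-circuit measurement
    source could supply these bytes instead (assumption, not shown here).
    """
    return int.from_bytes(os.urandom(n_bytes), "big")

def perturb_numeric(values: List[float], sigma: float = 0.03,
                    flip_prob: float = 0.5, seed: Optional[int] = None) -> List[float]:
    """Add Gaussian noise (mean 0, std sigma) to a random subset of entries."""
    rng = random.Random(seed)  # PRNG seeded from the supplied entropy
    out = []
    for v in values:
        if rng.random() < flip_prob:  # each entry is perturbed with probability flip_prob
            v = v + rng.gauss(0.0, sigma)
        out.append(v)
    return out

def perturb_categoric(values: List[str], categories: List[str],
                      flip_prob: float = 0.05, seed: Optional[int] = None) -> List[str]:
    """With probability flip_prob, replace an entry by a uniformly drawn category."""
    rng = random.Random(seed)
    return [rng.choice(categories) if rng.random() < flip_prob else v for v in values]

seed = entropy_seed()
print(perturb_numeric([0.0, 1.2, -0.7], seed=seed))  # varies from run to run
print(perturb_categoric(["red", "blue", "red"], ["blue", "red"], seed=seed))
```

Since the PRNG state is fully determined by its seed, replaying a recorded seed reproduces the same perturbation, which may help with auditing even though each fresh seed yields a different inference outcome.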
Presentation:
Founder video: https://youtu.be/Sz8-gEZ1xL4
Preprint: https://arxiv.org/abs/2202.09248
Source code:
GitHub: https://github.com/Automunge/AutoMunge
Which challenges/prizes would you like to submit your project for?
Amazon Braket Challenge
Google Quantum AI Research Challenge
IBM Qiskit Challenge
Quantum Finance Challenge
Quantum Entrepreneurship Challenge