Open firefly-cpp opened 3 months ago
What does the squashing operation do? I found that arm-preprocessing just calls
https://github.com/firefly-cpp/NiaARM/blob/main/niaarm/preprocessing.py#L34
Can this be implemented as a FeatureTransformAlgorithm?
"Data squashing is a preprocessing method that enables construction of smaller datasets from the original ones and provides approximately the same results of data analysis as the original."
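To make the quoted definition concrete, here is a minimal sketch of one way squashing can work: greedily group rows that are similar enough, then replace each group with its column-wise mean. This is an illustration only, not necessarily what the linked NiaARM function does. The function name `squash`, the `threshold` parameter, and the similarity formula `1 / (1 + distance)` are all assumptions made for this example.

```python
import math


def squash(rows, threshold):
    """Hypothetical squashing sketch (not the NiaARM implementation).

    Greedily assigns each row to the first group whose representative
    (the group's first row) is at least `threshold` similar, then
    replaces every group with its column-wise mean.
    """
    groups = []  # each group is a list of rows; group[0] is the representative
    for row in rows:
        for group in groups:
            # Assumed similarity in (0, 1]: 1 means identical rows.
            similarity = 1.0 / (1.0 + math.dist(row, group[0]))
            if similarity >= threshold:
                group.append(row)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([row])
    # Collapse each group into a single averaged row.
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]


data = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]]
squashed = squash(data, threshold=0.7)  # two averaged rows remain
```

A higher threshold merges fewer rows, so the squashed dataset stays closer to the original in size; a lower threshold compresses more aggressively.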
I just revisited the ticket.
Based on my understanding of the method, it fits into neither the feature_selection_algorithms nor the feature_transform_algorithms category.
I think a cleaner option would be to introduce a sample_selection or dataset_pruning component class with possible implementations:
- full / None -> use the whole dataset
- random(fraction) -> use a random fraction of the data
- squashing(threshold) -> your proposed method

Optionally, one could also repurpose feature_transform_algorithms into a general preprocessing component class.
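A rough sketch of what such a component class could look like, purely for discussion. The class names (`SampleSelection`, `Full`, `RandomFraction`, `Squashing`), the `select` method, and the `squash_fn(rows, threshold)` callable are all hypothetical and do not exist in the project; the squashing variant just delegates to whatever squashing function gets plugged in.

```python
import random
from abc import ABC, abstractmethod


class SampleSelection(ABC):
    """Hypothetical base class for row-level (sample) selection components."""

    @abstractmethod
    def select(self, rows):
        """Return the subset (or condensed version) of rows to use."""


class Full(SampleSelection):
    """Use the whole dataset unchanged."""

    def select(self, rows):
        return list(rows)


class RandomFraction(SampleSelection):
    """Use a random fraction of the rows."""

    def __init__(self, fraction, seed=None):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        self.fraction = fraction
        self.rng = random.Random(seed)

    def select(self, rows):
        rows = list(rows)
        # Keep at least one row, but never more than are available.
        k = min(len(rows), max(1, round(len(rows) * self.fraction)))
        return self.rng.sample(rows, k)


class Squashing(SampleSelection):
    """Delegate to a squashing function supplied by the caller (assumed signature)."""

    def __init__(self, threshold, squash_fn):
        self.threshold = threshold
        self.squash_fn = squash_fn  # assumed to accept (rows, threshold)

    def select(self, rows):
        return self.squash_fn(list(rows), self.threshold)
```

With this shape, the pipeline would only need to call `select` on whichever component was chosen, and the squashing dependency stays optional.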
Either way, given that most users probably work with rather small datasets (larger ones are, in my experience, the exception) and the current run-times are acceptable, I think my time on this project is better spent on the other tickets.
Data squashing as a preprocessing method in the pipeline is also worth adding (probably useful).
It is already implemented here: https://github.com/firefly-cpp/arm-preprocessing