Closed jeromebedard12 closed 3 years ago
Hello @jeromebedard12 !
I think it's a fair point that you bring. One should not be stuck with the default ids by using a splitter. I'm even wondering why do we have two separate functions for splitting (split_data_container and split). It seems to me we'd be better off having a single split function with the current signature of split_data_container. What do you think?
Also, is this modification needed urgently on your side? I'm a bit swamped at the moment and would rather perform that modification next week as long as it's not a major inconvenience on your side.
I think it's a fair point that you bring. One should not be stuck with the default ids by using a splitter. I'm even wondering why do we have two separate functions for splitting (split_data_container and split). It seems to me we'd be better off having a single split function with the current signature of split_data_container. What do you think?
Well I like the idea of split_data_container
being well defined and not abstract. It's just a method that reconstructs the data_containers properly using the splits from the split
method. I don't see how one would want this function to behave differently. It makes sense to me that the abstraction of the spliting logic is strictly nested inside the split
method and not too broadly in the split_data_container
method.
Also, is this modification needed urgently on your side? I'm a bit swamped at the moment and would rather perform that modification next week as long as it's not a major inconvenience on your side.
Next week is fine with me!
Closing this issue. Fix has been merge.
For: https://github.com/Neuraxio/Neuraxle/blob/3091bbc175be33b3b415386cf8a4dc79ce48d368/neuraxle/metaopt/auto_ml.py#L996-L1006
Suggestion:
For: https://github.com/Neuraxio/Neuraxle/blob/3091bbc175be33b3b415386cf8a4dc79ce48d368/neuraxle/metaopt/auto_ml.py#L972-L974
Suggestion:
For instance, if we end up with data_inputs 0, 1, 2 in train set and data_inputs 3, 4 in validation set, we might want to end up with: current_ids_train =
['0', '1', '2']
and current_ids_validation =['3', '4']
(or whatever current_ids the former data_container had) and not being forced to have: current_ids_train =['0', '1', '2']
and current_ids_validation =['0', '1']