Neuraxio / Neuraxle

The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
https://www.neuraxle.org/
Apache License 2.0

include_incomplete_batch default in a QueuedPipeline doesn't match the docs #469

Closed hexa00 closed 3 years ago

hexa00 commented 3 years ago

From the docs : https://www.neuraxle.org/stable/api/neuraxle.distributed.streaming.html?highlight=sequentialqueuedpipeline#neuraxle.distributed.streaming.BaseQueuedPipeline

    include_incomplete_batch – (Optional.) A bool representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.
    :param default_value_data_inputs: expected_outputs default fill value for padding and values outside iteration range, or AbsentValuesNullObject to trim absent values from the batch
    :param default_value_expected_outputs: expected_outputs default fill value for padding and values outside iteration range, or AbsentValuesNullObject to trim absent values from the batch
    :param cache_folder: cache_folder if its at the root of the pipeline

According to the docs, the default should be not to drop the last batch; however, in the code the default value for this parameter is False.

Thus when executing:

    def get_n_batches(self, batch_size: int, include_incomplete_batch: bool = False) -> int:
        if include_incomplete_batch:
            return math.ceil(len(self.data_inputs) / batch_size)
        else:
            return math.floor(len(self.data_inputs) / batch_size)

The incomplete last batch is in effect dropped.
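
For illustration, a minimal standalone sketch of the effect (assuming a 10-element dataset and a batch_size of 3; the arithmetic mirrors get_n_batches above):

    import math

    data_inputs = list(range(10))  # 10 data inputs
    batch_size = 3

    # Default include_incomplete_batch=False: floor(10 / 3) == 3 batches,
    # so the trailing batch of 1 element is silently dropped.
    print(math.floor(len(data_inputs) / batch_size))  # 3

    # include_incomplete_batch=True: ceil(10 / 3) == 4 batches, which matches
    # the documented default of keeping the smaller last batch.
    print(math.ceil(len(data_inputs) / batch_size))  # 4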

guillaume-chevalier commented 3 years ago

Whether we fix this by inverting the bool or by changing the doc, I suggest that we rename the parameter as well, to force users doing the update to make sure that their code is OK. @vincent-antaki

vincent-antaki commented 3 years ago

Deal. I suggest we include the batch by default, as the documentation says. I kinda like the name as it is, but if we are to rename it, I'd call it "keep_incomplete_batch".
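
As a rough sketch of what the renamed parameter could look like (hypothetical signature only, with the default flipped to match the documentation; the actual change is the one going into the PR below):

    def get_n_batches(self, batch_size: int, keep_incomplete_batch: bool = True) -> int:
        # Default now keeps the smaller final batch, as the docs describe.
        if keep_incomplete_batch:
            return math.ceil(len(self.data_inputs) / batch_size)
        else:
            return math.floor(len(self.data_inputs) / batch_size)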

I'll include the modification in PR #448 so that it's merged soon and part of 0.5.8