jokokojote opened 12 months ago
Hi @jokokojote ! This happens because we serialize the Pandas DataFrames to CSV, so we lose type info. We will try to find a solution for it
Hi @eugen-ajechiloae-clearml, thank you for your fast reply. I thought about this reason as well, but I was puzzled that it worked as expected when adding a second dummy return value (as explained in the Further observations section). So it seems to me serialization is not always run, is it?
Btw: Could you imagine any disadvantages of using this "hack" for now in my projects until you tackle this topic?
PS: I would suggest adding a note about this to the pipeline docs, because it is not made clear how exactly `PipelineDecorator.run_locally` behaves differently compared to `PipelineDecorator.debug_pipeline`, e.g. with respect to the serialisation of data frames.
@jokokojote
> So it seems to me serialization is not always run, is it?
We use different serialization techniques based on the data type. So when you returned a tuple, we no longer used the CSV serialization.
> Btw: Could you imagine any disadvantages of using this "hack" for now in my projects until you tackle this topic?
No, it should not have any disadvantages when it comes to functionality.
Describe the bug
When returning a pandas dataframe from a pipeline component, columns of type `list` or `numpy.ndarray` change their type to `str`. This occurs when running the pipeline with `PipelineDecorator.run_locally()`, but not when using `PipelineDecorator.debug_pipeline()`.

To reproduce
See this minimal example code:
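The original snippet is not preserved in this thread. The type loss itself can be reproduced with pandas alone, mimicking the CSV round trip the pipeline performs between steps (a standalone sketch, no ClearML involved):

```python
import io

import numpy as np
import pandas as pd

# Build a frame with a list column and a numpy array column,
# mirroring the dataframe returned by the pipeline component.
df = pd.DataFrame({
    "list_col": [[1, 2], [3, 4]],
    "array_col": [np.array([1.0, 2.0]), np.array([3.0, 4.0])],
})
print(type(df["list_col"].iloc[0]))  # <class 'list'>

# Round-trip through CSV, as the pipeline serialization does.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The cell values come back as their string representations.
print(type(restored["list_col"].iloc[0]))  # <class 'str'>
print(restored["list_col"].iloc[0])        # the string '[1, 2]'
```

CSV has no notion of nested Python objects, so `to_csv` writes the `repr` of each cell and `read_csv` reads it back as plain text, which matches the behaviour reported above.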
Further observations
Interestingly, the problem is gone when adding an additional return value to the pipeline component function while this value is NOT added to `return_values` in the decorator.

Expected behaviour
Columns of returned dataframes should keep their type between pipeline steps.
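For comparison, a binary serialization format preserves these column types across a round trip. A minimal pandas sketch using pickle (shown only to illustrate the expected behaviour, not necessarily the format ClearML would adopt):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "list_col": [[1, 2], [3, 4]],
    "array_col": [np.array([1.0, 2.0]), np.array([3.0, 4.0])],
})

# Pickle keeps arbitrary Python objects intact, unlike CSV.
buffer = io.BytesIO()
df.to_pickle(buffer)
buffer.seek(0)
restored = pd.read_pickle(buffer)

print(type(restored["list_col"].iloc[0]))   # <class 'list'>
print(type(restored["array_col"].iloc[0]))  # <class 'numpy.ndarray'>
```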
Environment