apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Add DataFrame fill_nan/fill_null #922

Open timsaucer opened 1 month ago

timsaucer commented 1 month ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As a follow-on to #919: libraries such as PySpark offer a common operation that fills nulls across an entire DataFrame, optionally limited to specific columns. It would be nice to have a similar feature in datafusion-python.

Describe the solution you'd like

If I have a DataFrame with null values in several columns, I want to replace the nulls in each of those columns with the provided value if that value can be cast to the column's type. If it cannot be cast, that column should be left unchanged (a no-op). The user should also be able to limit which columns this applies to.
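
To make the intended semantics concrete, here is a rough pure-Python sketch. It is not datafusion-python code: the `fill_null` function, its `subset` parameter, and the dict-of-columns "table" are all hypothetical stand-ins, and Python's `int`/`float`/`str` constructors stand in for a real cast to the column's type.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for a DataFrame: column name -> (column type, values).
Table = Dict[str, Tuple[type, List[Any]]]


def fill_null(table: Table, value: Any, subset: Optional[List[str]] = None) -> Table:
    """Return a new table with None replaced by `value` wherever it casts cleanly."""
    result: Table = {}
    for name, (col_type, values) in table.items():
        # Columns outside the requested subset are copied through untouched.
        if subset is not None and name not in subset:
            result[name] = (col_type, list(values))
            continue
        try:
            fill = col_type(value)  # stand-in for casting to the column's type
        except (TypeError, ValueError):
            # The value cannot be cast to this column's type: no-op for this column.
            result[name] = (col_type, list(values))
            continue
        result[name] = (col_type, [fill if v is None else v for v in values])
    return result


table: Table = {
    "id": (int, [1, None, 3]),
    "score": (float, [None, 2.5, None]),
    "name": (str, ["a", None, "c"]),
}

# "n/a" only casts to the string column, so the numeric columns are left alone.
filled = fill_null(table, "n/a")

# Restricting to a subset leaves "name" untouched even though 0 would cast to str.
zeros = fill_null(table, 0, subset=["id", "score"])
```

In this sketch `filled` changes only the `name` column, while `zeros` fills `id` and `score` and leaves `name` as it was. A real implementation would presumably decide castability from the DataFrame schema rather than by trying a Python constructor.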

Describe alternatives you've considered

With #919 we can do this manually, one column at a time.

Additional context