elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0
635 stars 98 forks source link

Mirror pandas' to_csv lineterminator instead of line_terminator #595

Closed bartbroere closed 6 months ago

bartbroere commented 1 year ago

Let's mirror pandas' convention, even though it looks a little weird perhaps.

I believe the argument line_terminator is now ignored when it eventually ends up in the to_csv method of pandas as a kwarg, so I would classify this PR as a bugfix.

bartbroere commented 11 months ago

I think this one is safe to merge. @pquentin Can we run the test suite?

pquentin commented 11 months ago

buildkite test this please

pquentin commented 11 months ago

I'm sorry about the subpar contributing experience, and the lack of reviews. What worries me a bit here is that any code that has been using line_terminator is now going to fail.

@davidkyle Is that a concern?

davidkyle commented 11 months ago

The Pandas docs (v2.1.1) indicate the parameter should be lineterminator without the underscore

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

However Pandas 1.4 uses line_terminator

https://pandas.pydata.org/pandas-docs/version/1.4/reference/api/pandas.DataFrame.to_csv.html

So this is a fix and is required but let's merge it with the change to upgrade Pandas to 2.0 but not beforehand as it could break users with Pandas 1.4.

Also there is no need to rename the Eland parameter, we can just change the parameter sent to Pandas that way we don't break users of the Eland API.

davidkyle commented 11 months ago

@bartbroere echoing what @pquentin said thank you for all of your contributions and I'm also sorry the review process is not as fast as we would like. An extra thank you for you patience

bartbroere commented 11 months ago

@pquentin @davidkyle Thanks for the review, and no worries if it takes some time.

About the review:

Also there is no need to rename the Eland parameter, we can just change the parameter sent to Pandas that way we don't break users of the Eland API.

So if we don't change the parameter (but do change how it's passed to pandas), we could merge a part of this change now already, and maybe rename the parameter with a breaking release of eland? I'll push that commit now to see if you agree.

pquentin commented 11 months ago

buildkite test this please

davidkyle commented 9 months ago

buildkite test this please

davidkyle commented 9 months ago

buildkite test this please

davidkyle commented 9 months ago

@bartbroere I pushed a change adding the deprecation warning in 7290f7e9db411404de23ab0e4cbd81479bd8fa3d

pquentin commented 9 months ago

buildkite test this please

bartbroere commented 9 months ago

@bartbroere I pushed a change adding the deprecation warning in 7290f7e

Thanks! I was a bit hesitant to add this myself, since I can't really decide that.

pquentin commented 7 months ago

buildkite test this please