Closed bartbroere closed 10 months ago
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
💚 CLA has been signed
@sethmlarson Would you (or a different maintainer) be willing to review this change?
Hello! Sorry for the lack of feedback. I'm going to help maintain Eland going forward, so feel free to ping me directly. I'll take a look at this next week.
buildkite test this please
buildkite test this please
I will merge from main and rerun tests when https://github.com/elastic/eland/pull/627 is merged.
buildkite test this please
In PR #450, @V1NAY8 started working on chunked CSV output to solve issue #449.
Since this is a feature I could really use, I continued the work that was started there and tried to incorporate some of the suggestions made in that PR.
A lot has been discussed already in the other PR, but this should help with memory usage. Right now, to export to CSV, the entire Elastic index behind the eland DataFrame is converted to a pandas DataFrame. Only after that is `to_csv` called. This requires a lot of memory. After this PR, the export is done with multiple calls to `to_csv`. After the first call, it starts using append mode (`mode="a"`). This should give a lower peak memory usage.

In a bit I'll be testing it with a larger index, to see if these assumptions hold up and everything works as expected.
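
For illustration, here is a minimal sketch of the append-mode idea using plain pandas. It is not the code in this PR: `write_csv_in_chunks` and `iter_chunks` are hypothetical names, and `iter_chunks` simply slices an in-memory DataFrame as a stand-in for paging through an index.

```python
import pandas as pd


def write_csv_in_chunks(chunks, path, index=False):
    """Write an iterable of pandas DataFrames to a single CSV file.

    The first chunk creates (or truncates) the file and writes the header.
    Every later chunk is appended with mode="a" and without a header, so
    only one chunk has to be held in memory at a time.
    Note: if `chunks` is empty, no file is written.
    """
    first = True
    for chunk in chunks:
        chunk.to_csv(
            path,
            mode="w" if first else "a",  # append after the first call
            header=first,                # header only once, at the top
            index=index,
        )
        first = False


def iter_chunks(df, chunk_size):
    """Hypothetical chunk source: yield consecutive row slices of df."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]
```

Calling `write_csv_in_chunks(iter_chunks(df, 1000), "out.csv")` should produce the same file as `df.to_csv("out.csv", index=False)`, assuming every chunk has the same columns in the same order. The peak memory saving only materializes when the chunk source is lazy, which the in-memory stand-in here is not.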