elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0

Update create track #1846

Closed gareth-ellis closed 1 month ago

gareth-ellis commented 1 month ago

I seem to have messed up my old branch, so this is the same as PR #1836: add an option to increase the batch size when creating a track, to speed up the download of data, and update the track layout to adhere to best practices.

Creating tracks from a large corpus can take quite a bit of time, so I have added an option to increase the batch size of the scan. If a user has a stable enough network connection and enough hardware resources on the Rally instance, they can increase the batch size to speed up the download.
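To illustrate why a larger batch size helps, here is a minimal sketch that pages through an in-memory corpus and counts the simulated round trips. It is not Rally's implementation: `scan_in_batches`, `round_trips`, and the sample corpus are hypothetical names made up for this example, and no Elasticsearch cluster is involved.

```python
import math
from typing import Iterator, List, Sequence


def scan_in_batches(docs: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield pages of at most batch_size documents.

    Hypothetical helper: each yielded page stands in for one
    request/response round trip of a real scan.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(docs), batch_size):
        yield list(docs[start:start + batch_size])


def round_trips(total_docs: int, batch_size: int) -> int:
    """Number of pages needed to fetch total_docs documents."""
    return math.ceil(total_docs / batch_size)


# Assumed sample corpus of 10,500 small documents.
corpus = [{"id": i} for i in range(10_500)]

small_batches = sum(1 for _ in scan_in_batches(corpus, 1_000))
large_batches = sum(1 for _ in scan_in_batches(corpus, 5_000))

print(f"batch size 1000: {small_batches} round trips")
print(f"batch size 5000: {large_batches} round trips")
```

Fewer round trips means less per-request overhead, but each response is larger, which is why a bigger batch only pays off with a stable network and enough memory on the machine doing the download.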

I have also updated the track layout to better match how we tend to lay out our tracks.

There was feedback to add more docs, so that's done too.

ebadyano commented 1 month ago

@gareth-ellis Thank you for updating the docs. LGTM.

One additional thing: since you are already updating the docs, do you mind adding an option for data streams? During the last review, when I tested the change with data streams, I didn't realize I needed to specify --data-streams for track creation to actually include them in the track. See https://github.com/elastic/rally/pull/1531/files

Never mind, I somehow missed that it was already in the docs. I will test it shortly to confirm it works.

gareth-ellis commented 1 month ago

The data-stream option is already in the docs - https://esrally.readthedocs.io/en/stable/command_line_reference.html#data-streams - or were you meaning something different?

ebadyano commented 1 month ago

> The data-stream option is already in the docs - https://esrally.readthedocs.io/en/stable/command_line_reference.html#data-streams - or were you meaning something different?

You are right, I somehow missed it.

gareth-ellis commented 1 month ago

I'd like to add the option to set up writing back to the data stream too. That isn't currently in place, but I thought it could go in a separate PR.