axboe / fio

Flexible I/O Tester
GNU General Public License v2.0

Implement a no-cleanup config opt #760

Open brettmilford opened 5 years ago

brettmilford commented 5 years ago

In testing a number of scenarios it would be useful to disable the cleanup function that some engines appear to run (e.g. rados). Is it possible to submit such a feature?

Glad to submit a PR myself with some guidance.

Thanks.

sitsofe commented 5 years ago

@brettmilford If you can afford the time, give it a go! It might take some back and forth but most things can be worked out...

sitsofe commented 5 years ago

Note: the whole "no-clean" idea may only be applicable to specific ioengines (e.g. I'm not quite sure how this idea works with libaio, which just leaves the files it makes as is when it's done unless you set unlink. I suppose there's an argument that the rados ioengine isn't respecting that option...). If so, you may want to code this feature as an additional option for a specific ioengine (see https://github.com/axboe/fio/blob/master/engines/rados.c#L66 for how a boolean was made and the lines leading to https://github.com/axboe/fio/blob/master/engines/rados.c#L317 for how it was retrieved).
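The pattern described above (a boolean declared in an engine-private options struct and looked up by name) can be sketched roughly as follows. This is a self-contained model, not fio's actual option machinery: `demo_options`, `demo_opt`, `demo_set_bool`, `existing_flag`, and `no_cleanup` are all hypothetical names chosen for illustration, and the real code at the links above should be treated as the reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-engine options; NOT fio's real structures. */
struct demo_options {
	int existing_flag; /* stand-in for a boolean the engine already has */
	int no_cleanup;    /* hypothetical new boolean, off by default */
};

/* One table entry per option: its name and where its value is stored. */
struct demo_opt {
	const char *name;
	size_t offset;
};

static const struct demo_opt demo_opts[] = {
	{ "existing_flag", offsetof(struct demo_options, existing_flag) },
	{ "no_cleanup",    offsetof(struct demo_options, no_cleanup)    },
};

/* Set a boolean option by name. Returns 0 on success, -1 if unknown. */
static int demo_set_bool(struct demo_options *o, const char *name, int val)
{
	size_t i;

	assert(o != NULL && name != NULL);
	for (i = 0; i < sizeof(demo_opts) / sizeof(demo_opts[0]); i++) {
		if (strcmp(demo_opts[i].name, name) == 0) {
			/* Locate the field inside the struct via its offset. */
			int *field = (int *)((char *)o + demo_opts[i].offset);

			*field = val ? 1 : 0;
			return 0;
		}
	}
	return -1;
}
```

In this model, the engine would later read `o->no_cleanup` from its own options struct, so the flag stays private to that one engine and does not affect any other.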

sitsofe commented 5 years ago

OK I found you've already started https://github.com/brettmilford/fio-rados-tools/blob/master/nocleanup-patch/rpmbuild/SOURCES/fio-3.7-nocleanup.patch . We probably don't want to nop it out like that and still leave the existing code... Maybe you could make rados respect unlink option if an additional option is specified (we may have to hide the new behaviour behind a flag to avoid catching out existing rados ioengine users)?

axboe commented 5 years ago

This does sound like something that should just be covered with the unlink option. What specific parts of the cleanup/teardown do you want to avoid?

brettmilford commented 5 years ago

@sitsofe Yeah, that was a quick hack I put together so that I could keep working on a problem; I wouldn't expect an implementation to work that way.

@axboe I'm specifically wanting to avoid deleting objects after a job run with the rados engine so that I can observe other parts of the system whilst it is in that state. I figured that, although the implementation would be different across different engines, the concept/option would be the same.

I see ioengines.h appears to define a cleanup param? https://github.com/axboe/fio/blob/master/ioengines.h#L36

And I've noticed a number of the engines (possibly all, I haven't checked) define a cleanup function e.g. https://github.com/axboe/fio/blob/master/engines/rados.c#L441 https://github.com/axboe/fio/blob/master/engines/sync.c#L428

However, not all engines appear to call it, or at least call it in the same way as the rados engine does. Is there something I'm missing here? Is the cleanup function supposed to clean up files/objects generated or is it for something else? It appears, at least, that the rados engine is using this for deleting objects.

Thanks for your feedback.

sitsofe commented 5 years ago

Is there something I'm missing here? Is the cleanup function supposed to clean up files/objects generated or is it for something else? It appears, at least, that the rados engine is using this for deleting objects.

To my eye this is something that was specific to the rados engine. I'm guessing the engine author felt it was neater not to leave objects lying around (perhaps because the engine always makes those objects and never uses ones that are lying around?) and thus it grew different behaviour to all the other ioengines...

axboe commented 5 years ago

I don't think it's any different than other engines setting up kernel/driver context. libaio is another example, so is io_uring. Fio always cleans up after itself.

I still don't understand the rationale for WHY you don't want this cleanup? That should be explained first. If that makes sense, then I'd be fine with adding a rados option that is no_cleanup, or something like that.

brettmilford commented 5 years ago

I'm specifically wanting to avoid deleting objects after a job run with the rados engine so that I can observe other parts of the system whilst at that state.

In this particular case I was testing bluestore OSDs and wanting to observe the DB size when the pool was 25% full, 50% ... , etc. Why? Because there is very little documentation about how large the DB partition should be without data being offloaded to slow disk, apart from a rule-of-thumb mention of 4% of the OSD size, which is generally untenable for most budgets (400G of NVMe per 10T drive), whilst others in the community have observed real-world usage of >0.1% of the DB partition.

Aside from the specifics of my use case, it's not unreasonable to imagine that a no-cleanup opt would be useful in a variety of circumstances. Perhaps I'm using the wrong tool?

axboe commented 5 years ago

It's hard to say whether a no-cleanup style option is useful for other cases. For most IO engines it simply doesn't make any sense, as retaining state is futile: it'll be killed by the kernel when the process exits anyway. For those, fio cleanup is simply a courtesy. For your case, it's a bit different, and it does sound like it makes sense to have a specific rados no cleanup option. As I've said previously, I would not mind taking such a patch. But it should be documented and specific to rados. On the off chance that it would make sense to other IO engines, they too can add an identically named option for the same purpose. At least that would ensure consistency.
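The distinction drawn above, between in-process state that disappears with the process and stored objects that outlive it, can be illustrated with a small self-contained model. Everything here is an assumption made for illustration: `demo_store`, `demo_engine`, `demo_init`, `demo_cleanup`, and the `no_cleanup` flag are hypothetical and do not reflect the rados engine's actual code or any librados call.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_OBJECTS 8

/* Stand-in for storage that outlives the process (e.g. a pool). */
struct demo_store {
	int present[DEMO_MAX_OBJECTS]; /* 1 if object i exists */
};

/* Hypothetical per-job engine state. */
struct demo_engine {
	struct demo_store *store;
	int *inflight;   /* in-process bookkeeping, always released */
	int nr_objects;
	int no_cleanup;  /* hypothetical option: keep objects after the run */
};

/* Create nr objects in the store and allocate in-process state. */
static int demo_init(struct demo_engine *e, struct demo_store *s,
		     int nr, int no_cleanup)
{
	int i;

	assert(e != NULL && s != NULL);
	if (nr < 0 || nr > DEMO_MAX_OBJECTS)
		return -1;
	e->inflight = calloc(DEMO_MAX_OBJECTS, sizeof(*e->inflight));
	if (!e->inflight)
		return -1;
	e->store = s;
	e->nr_objects = nr;
	e->no_cleanup = no_cleanup;
	for (i = 0; i < nr; i++)
		s->present[i] = 1;
	return 0;
}

/* Always free in-process state; delete objects only if not told to keep them. */
static void demo_cleanup(struct demo_engine *e)
{
	int i;

	assert(e != NULL);
	if (!e->no_cleanup) {
		for (i = 0; i < e->nr_objects; i++)
			e->store->present[i] = 0;
	}
	free(e->inflight);
	e->inflight = NULL;
}

/* Count the objects still present in the store. */
static int demo_count(const struct demo_store *s)
{
	int i, n = 0;

	for (i = 0; i < DEMO_MAX_OBJECTS; i++)
		n += s->present[i];
	return n;
}
```

The design point in this sketch is that the flag only gates the deletion of stored objects; the engine's own allocations are released either way, so the default behaviour for existing users is unchanged.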

sitsofe commented 3 years ago

@brettmilford oh wow you made a fix for this... could you submit it as a pull request?