DataDog / glommio

Glommio is a thread-per-core crate that makes writing highly parallel asynchronous applications in a thread-per-core architecture easier for rustaceans.

feat: handle `DmaFile::read_at` called with a size larger than `max_sectors_size` #635

Closed ozgrakkurt closed 5 months ago

ozgrakkurt commented 6 months ago

What does this PR do?

This PR tries to prevent `DmaFile::read_at` from being called with a size larger than `DmaFile::max_sectors_size`, because doing so causes the process to hang forever in uninterruptible (D) state on some kernels (e.g. 5.15) and to hang forever in interruptible state on others (e.g. 6.1).
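As a rough illustration of the kind of check this is aiming for (a sketch only, not glommio's actual internals; the names are hypothetical):

```rust
use std::io;

// Sketch: reject a read whose size exceeds the device limit instead of
// submitting it and hanging. `max_sectors_size` stands in for the limit
// derived from /sys/block/<dev>/queue/max_sectors_kb.
fn check_read_size(requested: usize, max_sectors_size: usize) -> io::Result<()> {
    if requested > max_sectors_size {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("read of {requested} bytes exceeds the device limit of {max_sectors_size} bytes"),
        ));
    }
    Ok(())
}
```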

Motivation

What inspired you to submit this pull request?

I have a case where I read pages of Parquet files from disk in parallel, and this problem causes my program to hang. I can mitigate it by splitting my reads before calling read_at/read_many, but it is still a problem: it took a lot of time to debug, and it hangs the entire process instead of just returning an error or splitting the reads internally. IMO read_at should return an error when called like this, and read_many should split big reads internally.
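The workaround on my side amounts to something like this (helper name made up; one read_at is then issued per chunk):

```rust
// Caller-side workaround sketch: split one large read into (offset, size)
// chunks no larger than the device limit, then issue one read per chunk.
fn split_read(offset: u64, size: usize, max_chunk: usize) -> Vec<(u64, usize)> {
    assert!(max_chunk > 0, "chunk size must be non-zero");
    let mut chunks = Vec::new();
    let mut pos = offset;
    let mut remaining = size;
    while remaining > 0 {
        let len = remaining.min(max_chunk);
        chunks.push((pos, len));
        pos += len as u64;
        remaining -= len;
    }
    chunks
}
```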

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant for this Pull Request.

#633

Additional Notes

Anything else we should know when reviewing?

I also want to implement the internal splitting inside read_many if that makes sense. I can implement internal splitting for read_at as well, if returning an error there doesn't make sense. I'd also like to do the same for the write functions, since I'm guessing the same thing happens with those.

Checklist

- [ ] I have added unit tests to the code I am submitting
- [ ] My unit tests cover both failure and success scenarios
- [ ] If applicable, I have discussed my architecture

ozgrakkurt commented 6 months ago

Also, would it at least be possible to make `pub fn max_sectors_size() -> usize` available on both `DmaFile` and `ImmutableFile`?

glommer commented 6 months ago

There's something going on here. I don't think anything is supposed to hang when I/O is too large, and this smells a bit like a bug (where is the question).

glommer commented 6 months ago

FWIW, that file on /sys is read-write, so it doesn't really represent an intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example on different kernels?
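For reference, the value being discussed can be checked with something like this (the device name below is just an example; substitute the disk actually under test):

```rust
use std::fs;
use std::io;

// Read /sys/block/<dev>/queue/max_sectors_kb and convert it to bytes.
fn max_sectors_bytes(dev: &str) -> io::Result<u64> {
    let path = format!("/sys/block/{dev}/queue/max_sectors_kb");
    let kb: u64 = fs::read_to_string(&path)?
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(kb * 1024)
}

// e.g. max_sectors_bytes("nvme0n1")
```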

ozgrakkurt commented 6 months ago

> FWIW, that file on /sys is read-write, so it doesn't really represent an intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example on different kernels?

I ran with 5.15, 6.1 and 6.2. On 5.15 and 6.2, the process goes into D (uninterruptible) state and hangs forever. On 6.1 it hangs forever but can be shut down with ctrl+c. I'll try the newest kernel as well. Which kernel do you normally run/test on?

ozgrakkurt commented 6 months ago

Actually, never mind. It hangs in D state if compiled with the --release flag, but can be exited with ctrl+c if compiled in dev mode. Not sure why; it is annoying.

ozgrakkurt commented 6 months ago

It does work on kernel 6.6.10-zabbly+ installed on Debian, so maybe this can be closed.

ozgrakkurt commented 5 months ago

Closing this, as it seems to be a kernel bug and the hanging doesn't happen on a newer kernel (6.6) for me.