Closed ozgrakkurt closed 5 months ago
Also is it at least possible to make pub fn max_sectors_size() -> usize
on both DmaFile
and ImmutableFile
?
There's something going on here. I don't think anything is supposed to hang when I/O is too large, and this smells a bit like a bug (where, is the question)
FWIW, that file on /sys is read-write, so it doesn't really mean any intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example in different kernels?
FWIW, that file on /sys is read-write, so it doesn't really mean any intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example in different kernels?
I ran with 5.15, 6.1 and 6.2. On 5.15 and 6.2, the process goes into D state and hangs forever in uninterruptible state. On 6.1 it hangs forever but can be shut down with ctrl+c. I'll try newest kernel as well. Which kernel do you normally run on/test?
FWIW, that file on /sys is read-write, so it doesn't really mean any intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example in different kernels?
I ran with 5.15, 6.1 and 6.2. On 5.15 and 6.2, the process goes into D state and hangs forever in uninterruptible state. On 6.1 it hangs forever but can be shut down with ctrl+c. I'll try newest kernel as well. Which kernel do you normally run on/test?
actually nevermind. It hangs in D state if compiled with not sure, it is annoying --release
flag but can be exited with ctrl-c if compiled in dev mode
It does work on kernel 6.6.10-zabbly+
installed on Debian so maybe can close this
closing this as it seems to be a kernel bug and the hanging doesn't happen on newer kernel (6.6) for me
What does this PR do?
This PR tries to prevent the case of executing
DmaFile::read_at
with a larger size thanDmaFile::max_sectors_size
. Because this causes the process to hang forever in D state in some kernels (e.g. 5.15)and just hang forever in interruptible state in other kernels. (e.g. 6.1)Motivation
What inspired you to submit this pull request?
I have this case where I read pages of parquet files from disk in parallel and this problem causes my program to hang. I am able to mitigate this issue by just splitting my reads before calling read_at/read_many but this is still a problem since it took a lot of time to debug and it causes entire process to hang instead of just returning an error/splitting the reads internally. Imo there should be an error if read_at is called like this and read_many should split the big reads internally.
Related issues
A list of issues either fixed, containing architectural discussions, otherwise relevant for this Pull Request.
633
Additional Notes
Anything else we should know when reviewing?
I want to also implement the internal splitting inside
read_many
if it makes sense. Also can implement internal splitting for read_at as well if error doesn't make sense. Also want to do the same stuff for write functions, I am guessing the same thing happens with those as well.Checklist
[] I have added unit tests to the code I am submitting [] My unit tests cover both failure and success scenarios [] If applicable, I have discussed my architecture