beacon-biosignals / EDF.jl

Read and write EDF files in Julia

Try to determine the file size up front in more cases #62

Closed ararslan closed 1 year ago

ararslan commented 1 year ago

An internal function, _size, determines the number of bytes in the EDF file, which in turn is used to detect truncated data records. It currently has specific methods for IOStream and IOBuffer, plus a general fallback that returns a sentinel value. When we get the sentinel, we skip the check for truncated records, which can lead to an EOFError when truncated data is read from some other kind of stream.
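For readers unfamiliar with the pattern, the dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the names stream_size, UNKNOWN_SIZE, and can_check_truncation are hypothetical, and the way each method computes the size is an assumption.

```julia
# Hypothetical stand-in for the internal size lookup; all names are illustrative.
const UNKNOWN_SIZE = -1  # sentinel meaning "size could not be determined"

# Streams returned by `open` can report their file size directly.
stream_size(io::IOStream) = Int(filesize(io))

# For an in-memory buffer, bytes already consumed plus bytes remaining is the total.
stream_size(io::IOBuffer) = position(io) + bytesavailable(io)

# Any other stream type: give up and return the sentinel.
stream_size(::IO) = UNKNOWN_SIZE

# A caller would skip the truncation check whenever the size is unknown.
can_check_truncation(io::IO) = stream_size(io) != UNKNOWN_SIZE
```

With this layout, any stream type that lacks a dedicated method silently falls into the sentinel branch, which is the gap this PR tries to narrow.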

We can improve the fallback, albeit not particularly efficiently, by checking whether the seeking-related functions position, seek, and seekend are applicable to the stream. If they are, the file size is the position after seeking to the end. Note that an equivalent alternative would be to do something like:

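function _size(io::IO)
    try
        here = position(io)    # remember where we are
        seekend(io)
        nbytes = position(io)  # position at the end is the total size
        seek(io, here)         # go back to where we started
        return nbytes
    catch ex
        # A MethodError means the stream doesn't support seeking
        ex isa MethodError || rethrow()
        return -1
    end
end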

I didn't compare the performance of the two approaches, but the try/catch version seems generally less safe: you might only find out that you can't seek back to where you started after you've already reached the end, at which point you've lost any ability to read signal data without closing and reopening the stream (assuming that's even possible).
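For comparison, the applicability-based fallback described earlier might look roughly like the sketch below. The names seekable_size and OpaqueStream are hypothetical and chosen only for illustration; the point is that all three capabilities are checked before the stream is moved at all.

```julia
# Hypothetical fallback; `seekable_size` is an illustrative name, not EDF.jl API.
function seekable_size(io::IO)
    # Verify every capability up front, before moving the stream.
    applicable(position, io) || return -1
    here = position(io)
    (applicable(seek, io, here) && applicable(seekend, io)) || return -1
    seekend(io)
    nbytes = position(io)  # position at the end is the total size
    seek(io, here)         # restore the original read position
    return Int(nbytes)
end

# A stream type with no seeking methods, used only to exercise the sentinel path.
struct OpaqueStream <: IO end

buf = IOBuffer(zeros(UInt8, 12))
read(buf, 5)                   # advance the read position
seekable_size(buf)             # total size; the read position is restored afterwards
seekable_size(OpaqueStream())  # falls through to the sentinel
```

Note that applicable only reports whether a matching method exists, not whether the call will succeed, so a stream whose seek method throws at runtime would still slip past these checks.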

Another way to do this would be to have EDF.read call stat when its input is a file and keep track of the file size that way. That would be a somewhat more annoying refactor, though.
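A minimal sketch of the stat-based idea, assuming the caller has a file path in hand. The helper name path_size is hypothetical, and how the result would be threaded through EDF.read is left open here.

```julia
# Hypothetical helper; `path_size` is an illustrative name, not EDF.jl API.
function path_size(path::AbstractString)
    isfile(path) || return -1    # not a regular file: size unknown
    return Int(stat(path).size)  # size in bytes as reported by the filesystem
end

# Usage: write a small temporary file and query its size without opening it.
path, io = mktemp()
write(io, zeros(UInt8, 16))
close(io)
path_size(path)
```

This avoids touching the stream position entirely, at the cost of needing the path (rather than just an IO object) at the point where the size is computed.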