NativeInstruments / ni-media

NI Media is a C++ library for reading and writing audio streams.
MIT License

Getting the exact number of frames from a stream #62

Open mre-ableton opened 9 months ago

mre-ableton commented 9 months ago

Hi there!

We're investigating how to handle loading wav files whose size - as declared in the RIFF header - is incorrect and a lot larger than the data actually contained in the file.

In such cases, we would like to crop the sound to the actual data and ideally, be able to determine the exact size without having to read the whole file upfront.

As expected, ifstream::num_frames will return the value read in the header.

However, the documentation states:

num_frames may differ from the actual number of frames in the stream
as this information relies on the codec. The only way to obtain the exact
number of frames is by seeking to the end of stream and retrieving the frame position.

So we're trying to use ifstream::frame_tellg, but it seems to return the same value as num_frames.

Here's example code that evaluates the number of frames in three ways: using num_frames, using frame_seekg followed by frame_tellg, and by reading the data.

  // Build an ifstream
  const auto filePath = makeTestFilePath("BB3_100_drum_break_paprika.wav");
  auto stream = audio::ifstream{filePath.string()};

  // Get the number of frames reported
  const auto reportedFrameNum = stream.info().num_frames();

  // Position the stream at the end
  stream.frame_seekg(0, std::ios_base::end);
  const auto seekedNumFrame = size_t(stream.frame_tellg());

  // Read the data until exhaustion to get the actual data contained in the file
  const auto dataFrameNum = [&]() {

    stream.frame_seekg(0, std::ios_base::beg);
    size_t frameCount{0};

    constexpr std::size_t kNiMediaReadChunkSize = 4096 / sizeof(float);
    const auto samplesPerChunk =
      std::min<std::size_t>(kNiMediaReadChunkSize, reportedFrameNum);
    std::vector<float> data(samplesPerChunk * stream.info().num_channels(), 0.0f);

    while (stream.read((char*)data.data(), std::streamsize(samplesPerChunk)))
    {
      frameCount += stream.frame_gcount();
    }

    return frameCount;
  }();

  std::cout << "Number of frames returning by num_frames " << reportedFrameNum
            << std::endl;
  std::cout << "Number of frames in the file " << dataFrameNum << std::endl;
  std::cout << "Number of frames from seeking " << seekedNumFrame << std::endl;

  CHECK(seekedNumFrame == dataFrameNum);

The output is:

Number of frames returning by num_frames 423360
Number of frames in the file 98090
Number of frames from seeking 423360

Our expectation was that seeking would also return 98090.

Is this the API we're supposed to use? Is there another way?

Cheers.

PS: Here's the file this test was run with: BB3_100_drum_break_paprika.zip

wro-ableton commented 9 months ago

I looked into this. It seems that using seekg and tellg indeed doesn't provide any different information than num_frames.

The reason is that the subview_device, which wav_source and other sources use to create a stream view on the data portion of the file, trusts that the provided view end position is accurate. It doesn't check whether that position lies past the end of the file. To my understanding, at least on Windows, the internal call to SetFilePointer also allows moving the pointer past the end of the file, and the subsequent tellg simply returns the pointer offset.
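To illustrate the idea of not trusting the declared view end, here is a minimal sketch of clamping a byte range to the real file size. ByteRange, clampToFile and wholeFrames are hypothetical names for illustration only; this is not how subview_device is implemented.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper type, not part of ni-media:
// a half-open byte range [begin, end) inside a file.
struct ByteRange
{
    std::uint64_t begin;
    std::uint64_t end;
};

// Clamp a declared range so that it never extends past the real file size.
inline ByteRange clampToFile(ByteRange declared, std::uint64_t fileSize)
{
    const auto end = std::min(declared.end, fileSize);
    const auto begin = std::min(declared.begin, end);
    return {begin, end};
}

// Number of whole frames that fit in a range, given the frame size in bytes.
inline std::uint64_t wholeFrames(ByteRange range, std::uint64_t bytesPerFrame)
{
    return bytesPerFrame == 0 ? 0 : (range.end - range.begin) / bytesPerFrame;
}
```

As a purely illustrative calculation, assume a 44-byte header and 4 bytes per frame (the real layout of the attached file may differ). A declared range of {44, 1693484} clamped to a file size of 392404 bytes then yields 98090 whole frames instead of 423360.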

I think it's interesting that ni-media allows loading wav files which proclaim they are longer than they actually are. So I would expect some file size check to ensure the RIFF and data chunk lengths are valid and an exception if not.