buzz / mediainfo.js

Extract media file metadata in the browser using WebAssembly.
https://mediainfo.js.org
BSD 2-Clause "Simplified" License

Speed difference MediaInfo and mediainfo.js #75

Closed kernchen closed 2 years ago

kernchen commented 3 years ago

Hi,

I've been testing media file analysis, specifically with larger files, and compared timings between the macOS CLI version and mediainfo.js in the browser on the same machine (Safari on macOS 10.15.7; a colleague also tested in Chrome).

It seems to me that execution times for mediainfo.js are significantly longer: a ~763MB ProRes file takes ~3.5s to process in mediainfo.js vs. 0.036s with the CLI version.

Maybe I am not understanding something, but what would be the reason for this, and is there a way to speed this up in mediainfo.js?

We are basically trying to build a web tool to read metadata for multiple files in a browser, and our use case might not be viable if we do not manage to reduce the time it takes to analyse files for metadata.

Thank you for any thoughts or help in advance.

buzz commented 3 years ago

I never did extensive performance measurements on mediainfo.js. I would expect it to run slower than the C program, though. When you took the measurements, what exactly did you measure? The pure processing time, or did you include the "warm-up", namely loading and instantiating the WASM file in the browser? I suspect the warm-up phase takes quite long.

kernchen commented 3 years ago

Thank you for the prompt reply. I haven't checked the loading time for the WASM file. The test was probably not very scientific: it just measured from the point the file is opened until the mediainfo result was returned.

Happy to do some more measurements. Would you have a suggestion on how to measure just the "non warm-up" phase? We are not yet very familiar with the library, so apologies if I am not fully making sense yet.

BTW, we've compared times on the mediainfo.js page, with our Angular implementation. In the worst case we might be loading the WASM file every single time when analysing files sequentially?

buzz commented 3 years ago

The MediaInfo function loads the WASM file asynchronously in the background and then passes a ready-to-use mediainfo instance to your callback.

MediaInfo({ format: 'text' }, (mediainfo) => {
    // Here the mediainfo WASM file is loaded and ready to process data. 
})
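
To measure the warm-up and the processing phase separately, each one can be timed on its own. The following is only a minimal sketch: `timeAsync` and `loadMediaInfo` are hypothetical helpers, not part of mediainfo.js, and it assumes `performance.now()` is available and that `MediaInfo(options, callback)` and `analyzeData(getSize, readChunk)` behave as in the snippets in this thread.

```javascript
// Hypothetical helper (not part of mediainfo.js): times one async phase.
async function timeAsync(label, fn) {
  const start = performance.now()
  const value = await fn()
  const seconds = (performance.now() - start) / 1000
  console.log(`${label}: ${seconds.toFixed(3)} s`)
  return { value, seconds }
}

// Hypothetical helper: wraps the callback-style MediaInfo(options, callback)
// call in a Promise so the warm-up can be awaited and timed by itself.
function loadMediaInfo(MediaInfoFactory, options) {
  return new Promise((resolve) => MediaInfoFactory(options, resolve))
}

// Intended usage inside an async function (getSize and readChunk as in the
// repo examples):
//   const { value: mediainfo } = await timeAsync('warm-up', () =>
//     loadMediaInfo(MediaInfo, { format: 'text' }))
//   await timeAsync('analysis', () => mediainfo.analyzeData(getSize, readChunk))
```

Timing the two phases independently shows directly whether the WASM instantiation or the file analysis dominates.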

BTW, we've compared times on the mediainfo.js page, with our Angular implementation. In the worst case we might be loading the WASM file every single time when analysing files sequentially?

Yes, you should probably only instantiate the WASM once and then re-use it from that point on.
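
One way to make sure the WASM is only instantiated once is to cache the instance behind a Promise. This is a sketch under assumptions, not an API of mediainfo.js: `createMediaInfoCache` is a hypothetical helper, and it relies only on the callback-style `MediaInfo(options, callback)` call shown above.

```javascript
// Hypothetical helper: creates the mediainfo instance on first use and hands
// the same Promise to every later caller, so the WASM module is instantiated
// only once no matter how many files are analysed.
function createMediaInfoCache(MediaInfoFactory, options) {
  let instancePromise = null
  return function getMediaInfo() {
    if (instancePromise === null) {
      instancePromise = new Promise((resolve) => MediaInfoFactory(options, resolve))
    }
    return instancePromise
  }
}

// Intended usage:
//   const getMediaInfo = createMediaInfoCache(MediaInfo, { format: 'JSON' })
//   const mediainfo = await getMediaInfo() // first call loads the WASM
//   const again = await getMediaInfo()     // later calls reuse the instance
```

In an Angular app, a function like this could live in a singleton service so that all components share one instance.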

kernchen commented 3 years ago

Thanks @buzz. We'll double check this and try to make sure that we load the WASM file only once and report on any results we get after that. That might take a day or two.

JeromeMartinez commented 3 years ago

I (the main developer of the MediaInfo library, which mediainfo.js uses) tried with my own JS version, and JS-based analysis of a MOV/ProRes file is visually instantaneous.

Generally speaking, with up-to-date compilers and modern browsers we now expect a 2x-4x slowdown for JS vs. native, not 100x :-p, so this does not look normal to us.

kernchen commented 3 years ago

Thanks @JeromeMartinez and @buzz. We have done some more testing now. To that end, we used the existing 'browser-multiple' example from the repo and added some timing and log output. We used about 24GB of media files of different sizes from:

https://arriwebgate.com/en/directlink/e631dcb5b6ac8eb5

The updated code using momentjs for duration calculation is as follows:

const fileinput = document.getElementById('fileinput')
const output = document.getElementById('output')

// getting reference start time
let lastDate = new moment()

function get_file_info(mediainfo, file) {

  let getSize = () => file.size
  let readChunk = (chunkSize, offset) =>
    new Promise((resolve, reject) => {
      let reader = new FileReader()
      reader.onload = (event) => {
        if (event.target.error) {
          reject(event.target.error)
          return // do not fall through to resolve() after an error
        }
        resolve(new Uint8Array(event.target.result))
      }
      reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize))
    })

  // reset reference time
  lastDate = new moment()

  return mediainfo
    .analyzeData(getSize, readChunk)
    .then((result) => {

      // calculate duration to analyse file and print out
      let duration = moment.duration((new moment()).diff(lastDate)).asSeconds()
      console.log(duration + " for fileSize: " + JSON.parse(result).media.track[0].FileSize)

      //Display outcome in HTML
      output.value = `${output.value}${result}`
    })
    .catch((error) => {
      output.value = `${output.value}\n\nAn error occurred:\n${error.stack}`
    })
}

async function onChangeFile(mediainfo) {

  // reset reference time and print out timestamp
  lastDate = new moment()
  console.log(lastDate.format())

  output.value = null
  if (fileinput.files.length >= 2) {
    for (let i = 0; i < fileinput.files.length; i++) {
      const file = fileinput.files[i]
      if (file) {
        await get_file_info(mediainfo, file)
        if (i + 1 == fileinput.files.length) {
          return
        }
      }
    }
  } else {
    const file = fileinput.files[0]
    if (file) {
      await get_file_info(mediainfo, file)
    }
  }
}

MediaInfo({ format: 'JSON' }, (mediainfo) => {

  // log time it takes for MediaInfo to initialise
  console.log(moment.duration((new moment()).diff(lastDate)).asSeconds())

  fileinput.addEventListener('change', () => onChangeFile(mediainfo))
})

The log output with timings for different file sizes in Chrome is as follows:

example.js:71 0.205 <--- MediaInfo loading duration, from Script execution
example.js:47 2021-06-10T10:08:10+01:00 <--- Test reference start time
example.js:33 3.221 for fileSize: 736916512
example.js:33 3.476 for fileSize: 800478024
example.js:33 11.794 for fileSize: 2693398212
example.js:33 12.384 for fileSize: 2985889981
example.js:33 12.861 for fileSize: 3186999999
example.js:33 12.835 for fileSize: 3044725381
example.js:33 13.318 for fileSize: 2891182248
example.js:33 22.501 for fileSize: 5113988900
example.js:33 13.408 for fileSize: 3124161135

We noticed that the speed depends a bit on how much is running on the CPU in the background. But this was a fairly clean run and looks somewhat consistent with several timed runs. This was run on a 2019 MacBook Pro.

@JeromeMartinez, we did also use MediaInfo Website for testing the 5GB file and we had about the same results in terms of times.

@buzz, the WASM loading time seems negligible, staying consistently under 0.5s

Would you have any more thoughts on this? It must have something to do with reading the file, as the time seems to grow linearly with the file size.
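
The linear relationship can be checked by dividing each logged file size by its logged duration. A small sketch using only the numbers from the log output above (`throughputMBps` is a name made up for this illustration):

```javascript
// [seconds, bytes] pairs copied from the log output above.
const runs = [
  [3.221, 736916512],
  [3.476, 800478024],
  [11.794, 2693398212],
  [12.384, 2985889981],
  [12.861, 3186999999],
  [12.835, 3044725381],
  [13.318, 2891182248],
  [22.501, 5113988900],
  [13.408, 3124161135],
]

// Effective throughput in megabytes (10^6 bytes) per second.
function throughputMBps(seconds, bytes) {
  return bytes / seconds / 1e6
}

for (const [seconds, bytes] of runs) {
  console.log(`${bytes} bytes in ${seconds} s: ${throughputMBps(seconds, bytes).toFixed(1)} MB/s`)
}
```

Every run lands in roughly the same range of about 215 to 250 MB/s, which is what one would expect if the whole file is being read from disk rather than just a small part of it.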

JeromeMartinez commented 3 years ago

I think I have spotted the issue: it looks like a seek request (for going to the end of the file and reading the "header" of the file) isn't handled (it is in the desktop version), so the whole file is read (and the data discarded) before the expected file position is reached. I can reproduce the issue, so it is in the MediaInfo library, not the mediainfo.js binding. I will try to fix that, but no ETA (I am late with tons of other issues).
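
To illustrate why an unhandled seek makes the run time grow with the file size, here is a toy model. It is not MediaInfo's actual parser: `countChunkReads` and the 256 KiB chunk size are assumptions made for this sketch, which only counts how many chunk reads are needed to reach a header stored at the end of the file.

```javascript
// Toy model, not MediaInfo's actual parser: the "parser" looks at the first
// chunk and then needs the chunk that holds a header at the end of the file.
function countChunkReads(fileSize, chunkSize, honorsSeek) {
  const lastChunkOffset = Math.floor((fileSize - 1) / chunkSize) * chunkSize
  let offset = 0
  let reads = 0
  while (offset < fileSize) {
    reads += 1 // stands for one readChunk(chunkSize, offset) call
    if (offset >= lastChunkOffset) break // reached the chunk with the header
    // After the first chunk the parser asks to jump to the end of the file.
    // If the seek is ignored, reading simply continues with the next chunk.
    offset = honorsSeek ? lastChunkOffset : offset + chunkSize
  }
  return reads
}

const fileSize = 736916512 // smallest file from the log above
const chunkSize = 256 * 1024 // assumed chunk size, for illustration only
console.log('reads with seek handled:', countChunkReads(fileSize, chunkSize, true))
console.log('reads with seek ignored:', countChunkReads(fileSize, chunkSize, false))
```

With the seek handled, the number of reads stays constant regardless of file size; with the seek ignored, it grows in proportion to the file size, matching the timings reported above.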

kernchen commented 3 years ago

Thanks for your prompt reply @JeromeMartinez, once again. That is very helpful.

Just out of interest, would that be more of a general issue or just in the latest version? I.e., would a different version help temporarily resolve the issue?

Certainly interested to hear more once you have had time to look into a fix.

JeromeMartinez commented 3 years ago

would that be more of a general issue or just in the latest version? I.e., would a different version help temporarily resolve the issue?

I cannot be sure, but I doubt it; this is not something I modified recently.

kernchen commented 3 years ago

Thanks once again @JeromeMartinez for now.

kernchen commented 3 years ago

Hi @JeromeMartinez. I just wanted to check in to see whether there has been any progress on this. I certainly understand from your previous statement that you have your hands full.

For planning purposes (we are a bit stuck with this one at the moment), is there any indication of when this could be looked at? Are there any alternatives that could help to expedite this issue? Happy to discuss offline, if there is anything we could do or help with. Please just bear in mind that we are not familiar with the inner workings of the library at this point.

Any reply appreciated.

JeromeMartinez commented 3 years ago

Happy to discuss offline

Please contact us at info@mediaarea.net.

JeromeMartinez commented 3 years ago

Fixed. Latest snapshots have the fix.

kernchen commented 3 years ago

Hi @JeromeMartinez. I can now report that we built mediainfo.js from the latest snapshot and tested the fix. In a quick test, scanning is now near-instant compared to the previous version.

We'll do some further performance testing as part of our project.

I guess the next step for the mediainfo.js library will be to integrate the next MediaInfo release that includes this fix.

Thanks a lot for your uncomplicated help on this one!

behind2 commented 3 years ago

Could I ask which tag I should download to get the fixed version?

behind2 commented 3 years ago

I downloaded the latest version (v0.1.6). It doesn't seem to have changed much.

buzz commented 3 years ago

I downloaded the latest version (v0.1.6). It doesn't seem to have changed much.

This is expected.

mediainfo.js uses the latest release version of libmediainfo (version 21.03, March 26), but the fix is from July 14. The fix is not yet included in any release version.

JeromeMartinez commented 3 years ago

The fix is not yet included in any release version.

Expected in a few days :-p.

behind2 commented 3 years ago

Very much looking forward to it, thanks

jimmymic commented 3 years ago

MediaInfo version 21.09 (2021-09-17) looks to have been released. @buzz can this be incorporated?

buzz commented 3 years ago

MediaInfo version 21.09 (2021-09-17) looks to have been released. @buzz can this be incorporated?

mediainfo.js v0.1.7 released.