MediaArea / MediaInfoLib

Convenient unified display of the most relevant technical and tag data for video and audio files.
https://mediaarea.net/MediaInfo
BSD 2-Clause "Simplified" License

Publish Javascript Module to NPM registry #769

Open mooyoul opened 6 years ago

mooyoul commented 6 years ago

The "Javascript module" version of Mediainfo seems superior. It works perfectly in both client-side (browser) and server-side (e.g. Node.js) environments. (It can be called isomorphic.)

I thought it would be nice if we published the JavaScript module to NPM, which is the most popular JavaScript package repository.

But there are some problems. The currently exposed API is quite complex and not JavaScript-friendly. In a Node.js environment, we could provide a powerful Stream interface (maybe?)

So I'm interested in making a new JavaScript package which provides a more JavaScript-friendly interface.

Anyway, what do you think about publishing the JavaScript module to a public package repository (e.g. NPM)?

JeromeMartinez commented 6 years ago

The "Javascript module" version of Mediainfo seems superior.

Hum... Compared to?

It works perfectly in both client-side (browser) and server-side (e.g. Node.js) environments. (It can be called isomorphic.)

:). We may need to add a Node.js example to our examples directory to better show what is possible outside the browser.

I thought it would be nice if we published the JavaScript module to NPM, which is the most popular JavaScript package repository.

I agree! The issue on our side is that we are still discovering this part; some help would definitely be appreciated!

But there are some problems. The currently exposed API is quite complex and not JavaScript-friendly. In a Node.js environment, we could provide a powerful Stream interface (maybe?)

Hum... What do you mean? We just adapted the classic C API of MediaInfo to JavaScript, using the "by buffer" interface since we cannot directly access the file API (so opening referenced files is deactivated). There are only four commands: init, buffer feeding, seek request check, and finalize. How could it be simpler? As for a "Stream interface", I don't see how it would be usable; I'm curious to see a proof of concept.
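As a rough idea of what such a proof of concept might look like, the sketch below wraps the by-buffer calls in a Node.js `Writable` stream. It assumes an `mi` object exposing the by-buffer methods named in this thread; `MediaInfoSink` is a made-up name and not part of MediaInfoLib. It also shows the main limitation: a forward-only stream cannot honour a seek request, so the sketch only records that one was made and stops feeding.

```javascript
const { Writable } = require('stream');

// Hypothetical adapter, not part of MediaInfoLib. `mi` is assumed to expose
// Open_Buffer_Init, Open_Buffer_Continue, Open_Buffer_Continue_Goto_Get,
// Open_Buffer_Finalize and Inform, as used elsewhere in this thread.
class MediaInfoSink extends Writable {
  constructor(mi, totalSize) {
    super();
    this.mi = mi;
    this.done = false;          // true once no more data should be fed
    this.seekRequested = false; // true if the parser asked to jump elsewhere
    this.report = '';
    mi.Open_Buffer_Init(totalSize, 0);
  }

  _write(chunk, _encoding, callback) {
    if (!this.done) {
      const state = this.mi.Open_Buffer_Continue(chunk);
      if (this.mi.Open_Buffer_Continue_Goto_Get() !== -1) {
        // A plain stream cannot seek, so stop feeding and flag the request.
        this.seekRequested = true;
        this.done = true;
      }
      // Bit 3 set means the parser is finalized.
      if (state & 0x08) this.done = true;
    }
    callback(); // remaining chunks are accepted and discarded
  }

  _final(callback) {
    this.mi.Open_Buffer_Finalize();
    this.report = this.mi.Inform();
    callback();
  }
}
```

A caller would pipe a readable stream into the sink, for example `fs.createReadStream(file).pipe(new MediaInfoSink(mi, size))`, and read `report` once the `finish` event fires. Formats that need seeking would still require the file-based loop.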

So I'm interested in making a new JavaScript package which provides a more JavaScript-friendly interface.

Please do, it would be great. But there are some constraints for being in our (upstream) repo: we try to have a similar API on all bindings for easier maintenance, so we don't expect to accept big changes in the API for one specific binding without strong arguments about the need to do so. I'm not sure what you mean by "more Javascript-friendly"; if it is a lot of work, it may be better to explain the proposed update before implementing it, so that you don't end up frustrated with us.

Anyway, what do you think about publishing the JavaScript module to a public package repository (e.g. NPM)?

It is on our todo-list, especially so that we can control the releases with a MediaArea account. We have received complaints about unofficial bindings that were published a bit everywhere and then either abandoned a few years later (and we can't access the account, as it is not ours) or broken by an update due to some incompatibilities (as we don't control such bindings, people have sometimes used something wrongly in their bindings, then users complain to us that we broke something we did not develop :( ). So having an external point of view from someone who knows NPM and what "more Javascript-friendly" means would be a great help before we publish something on NPM.

Please provide more details/code about the updates you have in mind.

mooyoul commented 6 years ago

Jerome, Thanks for your kind reply :)

Here is my example snippet for Node.js: asciinema demo

const BbPromise = require('bluebird');
const debug = require('debug');
const fs = require('fs');
const path = require('path');
const MediaInfoModule = require('./MediaInfo');

const CHUNK_SIZE = 1024 * 1024;

const LOG_TAG = 'mediainfo-node-example';
debug.enable(LOG_TAG);
const log = debug(LOG_TAG);

const { MediaInfo } = MediaInfoModule({
  async postRun() {
    log('ready');
    console.log(await parseFile('/YOUR_FILE_PATH'));
  }
});

async function parseFile(filePath) {
  // Get file stat
  const stat = await BbPromise.fromCallback((cb) => fs.stat(filePath, cb));
  log('stat: ', stat);

  // Open up file
  const fd = await BbPromise.fromCallback((cb) => fs.open(filePath, 'r', cb));
  const buf = Buffer.alloc(CHUNK_SIZE);

  // Initialise MediaInfo
  const MI = new MediaInfo();

  MI.Option('File_FileName', path.basename(filePath));
  MI.Open_Buffer_Init(stat.size, 0);

  let offset = 0;
  let shouldStop = false;
  let report = "";

  try {
    do {
      log('reading %d bytes from %s byte offset', CHUNK_SIZE, offset);

      const [ bytesRead ] = await BbPromise.fromCallback(
        (cb) => fs.read(fd, buf, 0, CHUNK_SIZE, offset, cb),
        { multiArgs: true },
      );

      log('read %d bytes (allocated: %d bytes)', bytesRead, buf.length);

      // Send the buffer to MediaInfo
      const state = MI.Open_Buffer_Continue(buf.slice(0, bytesRead));
      log('state: %d (binary: %s)', state, `00000000${state.toString(2)}`.slice(-8));

      // Test if there is a MediaInfo request to go elsewhere
      const seekTo = MI.Open_Buffer_Continue_Goto_Get();
      log('seekTo: ', seekTo);
      if (seekTo === -1) {
        offset += bytesRead;
      } else {
        offset = seekTo;
        MI.Open_Buffer_Init(stat.size, seekTo); // Inform MediaInfo we have seek
      }

      // Bit 3 set means finalized
      shouldStop = state & 0x08 || bytesRead < 1;
    } while (!shouldStop);

    MI.Open_Buffer_Finalize();
    MI.Option('Complete');
    report = MI.Inform();
  } finally {
    log('cleaning up');
    MI.Close();
    MI.delete();
    // Close the descriptor here so it is released even when parsing throws
    await BbPromise.fromCallback((cb) => fs.close(fd, cb));
  }

  return report;
}

And sorry about the confusing "javascript-friendly" description.

I mean, it would be nice if we provided a wrapper function which handles the fs work for users.

If I want to get the mediainfo results for the file foo/bar.baz, I just require the mediainfo module and call the read method with the file path, without handling any fs work. Like this:

const mediainfo = require('mediainfo');
const report = mediainfo.read('foo/bar.baz'); // yay!

const xml = mediainfo.read('foo/bar.baz', { format: 'xml' }); 

It's simpler, isn't it?
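A wrapper of this kind could look roughly like the sketch below. It assumes an `mi` object exposing the by-buffer methods from the earlier snippet; `readReport` is a made-up name and not part of MediaInfoLib. Unlike the `mediainfo.read` call above, it returns a promise, since the file is read asynchronously.

```javascript
const fs = require('fs/promises');
const path = require('path');

const CHUNK_SIZE = 1024 * 1024;

// Hypothetical helper, not part of MediaInfoLib. `mi` is assumed to expose
// the by-buffer methods used in the earlier snippet (Option,
// Open_Buffer_Init, Open_Buffer_Continue, Open_Buffer_Continue_Goto_Get,
// Open_Buffer_Finalize, Inform).
async function readReport(mi, filePath) {
  const handle = await fs.open(filePath, 'r');
  try {
    const { size } = await handle.stat();
    const buf = Buffer.alloc(CHUNK_SIZE);

    mi.Option('File_FileName', path.basename(filePath));
    mi.Open_Buffer_Init(size, 0);

    let offset = 0;
    for (;;) {
      const { bytesRead } = await handle.read(buf, 0, CHUNK_SIZE, offset);
      if (bytesRead < 1) break; // end of file

      const state = mi.Open_Buffer_Continue(buf.subarray(0, bytesRead));

      // Follow a seek request if there is one, otherwise keep reading forward.
      const seekTo = mi.Open_Buffer_Continue_Goto_Get();
      if (seekTo === -1) {
        offset += bytesRead;
      } else {
        offset = seekTo;
        mi.Open_Buffer_Init(size, seekTo);
      }

      if (state & 0x08) break; // bit 3 set means finalized
    }

    mi.Open_Buffer_Finalize();
    return mi.Inform();
  } finally {
    // Always release the file handle, even if parsing throws.
    await handle.close();
  }
}
```

Creating and disposing of the `mi` instance is left to the caller here, so the same helper could be reused with whatever lifecycle the binding requires.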

JeromeMartinez commented 6 years ago

Here is my example snippet for Node.js

Thanks. Some comments:

@g-maxime, please add a Node.js example based on the code from @mooyoul + my comments.

I mean, it would be nice if we provided a wrapper function which handles the fs work for users.

As I understand it, this is more or less some wrapping around the current "raw" (directly on the C++ methods) interface, similar to the "File" interface we have with other bindings. So it would mean moving the "parseFile" example to a dedicated member of the already created MediaInfo class (it is created by emscripten, so I'm not sure it is possible).

@g-maxime, please check if we can add an "Open(FileName)" member to the MediaInfo class which would be a wrapper for the other Open_Buffer_xxx calls.

For the "{ format: 'xml' }" part of the example, I get your point about a fancier interface, but I prefer to keep the API similar to the other bindings. So the result of "read()" ("Open()" in our C++ API) in your example should be information about whether the file was found or not, and the format should be set by an option. Changes in the API could happen when we decide to break compatibility for all bindings; in the meantime I prefer to keep a similar API for new bindings too, in order to have fewer problems when we do support (e.g. people complaining that what they saw on the Internet is not possible with the binding they use; a lot of people don't really care that the binding language is not the same when they look for a solution).

dmooney65 commented 6 years ago

Hi, I was looking for a few code examples and came across this issue. I've written a small N-API/C++ wrapper for MediaInfoLib and would appreciate some help/advice.

The idea is to create something flexible, so the input is a filename (local only) and a JavaScript Object of the form: { General: [ 'Parameter',...], Audio: [ 'Parameter'...], Video: [ 'Parameter',...],...}

The output is of the form: { General: [ Parameter: 'value', ...], Audio: [ Parameter: 'value'], ...}

I've only implemented General, Audio and Video, but adding others won't be difficult (although I haven't looked at "Menu" yet). All input parameters have special characters stripped for output. Not ideal I suppose, but it will make the output easier to deal with on the JS side.

Since input/output are both just plain Objects I'm hoping it shouldn't be too sensitive to changes to MediaInfoLib - is this naive?

Can there be more than one "General" stream?

Is there any way to know whether a parameter is deprecated in code?

Is there any way to know if a parameter is invalid rather than just not populated?

The main purpose initially was to get around the ~ 3Gb file size limitation on 32bit systems (especially Raspberry Pi) and it works perfectly for this. It will read a 4.2Gb flac file in ~0.2 of a second (c++ exec time) on a Pi zero. It uses the "By buffer example" from "HowToUse_Dll.cpp" with "ftell", "fseek" etc. replaced by their offset counterparts ("ftello", "fseeko" etc.).

The repo is https://github.com/dmooney65/node-mediainfolib Contributions welcome ;)

JeromeMartinez commented 6 years ago

I've written a small N-API/c++ wrapper

Great!

The idea is to create something flexible, so the input is a filename (local only) and a JavaScript Object

On our side, we have our own JSON output with MediaInfo::Option("Output", "JSON"). This is what we advise now, with the caveat that the output may change in the future: there is a discussion about JSON output changes.

Since input/output are both just plain Objects I'm hoping it shouldn't be too sensitive to changes to MediaInfoLib - is this naive?

I don't know about your design, but on our side we try to have a versatile output. This is why we don't, for example, say that the General section will always appear exactly once: all sections are handled the same way, with n possible instances.
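One way a wrapper could follow that rule is to return an array per section, with one entry per stream instance. The sketch below is only an illustration: `collect`, `countStreams` and `getValue` are hypothetical names standing in for the native lookups, and the exact set of characters stripped from parameter names is an assumption.

```javascript
// Hypothetical helper, not part of MediaInfoLib or node-mediainfolib.
// request:      { General: ['Format', ...], Audio: [...], ... }
// countStreams: (streamKind) => number of instances of that section
// getValue:     (streamKind, streamIndex, parameter) => string
function collect(request, countStreams, getValue) {
  const result = {};
  for (const [kind, parameters] of Object.entries(request)) {
    // Every section is treated the same way: zero to n instances.
    result[kind] = [];
    for (let i = 0; i < countStreams(kind); i += 1) {
      const entry = {};
      for (const parameter of parameters) {
        // Assumption: keep only letters, digits and underscores in keys.
        const key = parameter.replace(/[^A-Za-z0-9_]/g, '');
        entry[key] = getValue(kind, i, parameter);
      }
      result[kind].push(entry);
    }
  }
  return result;
}
```

With this shape, a file with a single General section simply yields a one-element array, so callers do not need a special case if that ever changes.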

Is there any way to know whether a parameter is deprecated in code?

At compilation? The corresponding line in the files at https://github.com/MediaArea/MediaInfoLib/tree/master/Source/Resource/Text/Stream contains "Deprecated". Dynamically? MediaInfo::Get(x, x, x, Info_Info) contains "Deprecated" if the parameter is deprecated.
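The dynamic check could be wrapped as in the sketch below. Since the exact call shape differs between bindings, the lookup is passed in as `getInfoText`, a hypothetical callback expected to return the same text as MediaInfo::Get(x, x, x, Info_Info); `dropDeprecated` is likewise a made-up name.

```javascript
// Hypothetical helper. `getInfoText(streamKind, parameter)` is assumed to
// return the Info_Info text for the parameter, as described above.
function dropDeprecated(streamKind, parameters, getInfoText) {
  return parameters.filter(
    (parameter) => !getInfoText(streamKind, parameter).includes('Deprecated'),
  );
}
```

A wrapper could run this once per section before querying values, so deprecated parameters never reach the output object.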

Is there any way to know if a parameter is invalid rather than just not populated?

What is "invalid" from your point of view?

The main purpose initially was to get around the ~ 3Gb file size limitation on 32bit systems (especially Raspberry Pi) and it works perfectly for this.

Which limitation? MediaInfo runs well with 4+ GB files on Raspberry Pi. There was a regression in a few versions preventing 4+ GB file support, but I think it is fixed now.

The repo is https://github.com/dmooney65/node-mediainfolib Contributions welcome ;)

Actually it would be good to have the opposite: N-API support upstream, with a patch ;-). We already have a lot of bindings and another one would be good, but we try to have a consistent API between bindings, so the one upstream may be a bit different.

Noob question: is node-mediainfolib.cpp something we could have in our binaries? e.g. like JNI we add an access point for JNI and Java needs nothing else on the C++ part.

dmooney65 commented 6 years ago

Regarding the large file limitation - it is still there on the latest Raspbian and also I think on Debian in general as the same bug is evident on Linaro Stretch for the Asus Tinkerboard. Presumably the problem is with libzen0v5 0.4.34-1 as that is the version installed on both distros.

For the output, I read through the JSON output discussion - not a problem to implement but is this finalised? It wouldn't make sense to attempt to directly use the JSON output logic, but it would be simple to mirror the structure.

Noob question: is node-mediainfolib.cpp something we could have in our binaries? e.g. like JNI we add an access point for JNI and Java needs nothing else on the C++ part.

I doubt it - publishing to npm needs a git repo with a valid structure. The module could potentially exist as a separate new repo under MediaArea though.

JeromeMartinez commented 6 years ago

Regarding the large file limitation - it is still there on the latest Raspbian and also I think on Debian in general

Sad that this was not backported. Please report the issue to your distro maintainers (the fix is easy, and questions were already answered on the Ubuntu ticket).

I read through the JSON output discussion - not a problem to implement but is this finalised?

Not yet, so currently there is only the "old" JSON. Just that I want to debate on JSON output only in this discussion, as we'll focus on this version. No ETA, this is currently not our priority, I just hope I can work on it next month.

publishing to npm needs a git repo with a valid structure.

My question is more about the binary part: do you have a binary in addition to libmediainfo when the npm package is installed? I am wondering if the extra binary could be "merged" with libmediainfo, if such a binary exists. But maybe that is not the way it usually works with npm.

dmooney65 commented 6 years ago

Please report the issue to your distro maintainers

A new one for me but I'll give it a try.

But maybe that is not the way it usually works with npm.

Correct - but prebuilt binaries can be supplied via npm. Users would then only need access to the lib(s) and not the header files.