More useful to return unknown data when parsing frames, rather than "undefined".

aadsm / JavaScript-ID3-Reader

ID3 tags reader in JavaScript (ID3v1, ID3v2 and AAC)

http://www.aadsm.net/libraries/id3/

Other

555 stars 145 forks source link

More useful to return unknown data when parsing frames, rather than "undefined". #52

Open gyachuk opened 9 years ago

gyachuk commented 9 years ago

More useful to return unknown data when parsing frames, rather than "undefined".

gyachuk commented 9 years ago

Also, have you considered changing this over to using promises, rather than strung-together callbacks?

aadsm commented 9 years ago

Not really sure about this, why is it more useful to try to read the frame data as a string? Seems kind of a stretch, can you talk about your use case?

Regarding promises, I much prefer them to be honest, I've used Q quite extensively in the past years. I didn't use it because it would add an extra weight to the library that I'm not convinced most people would benefit from. Also, it's pretty easy to promisify the two functions with a thin wrapper. Q even provides that functionality with Q.nfcall: http://documentup.com/kriskowal/q/

gyachuk commented 9 years ago

I went back and forth between getStringAt() and getBytesAt(). It finally came down to being able to read the first few bytes of returned data in the debugger, for my use case. I'm parsing a "PRIV" tag. I'd be happy with getBytesAt(), too. Either one is better than not returning the data at all.

gyachuk commented 9 years ago

I've not used Q, but have been using promises in jQuery for quite a while. They clean up a lot of callback nesting. I hear what you're saying about extra weight.

One question I have for you is about ID3._files. It seems that ID3.loadTags() will always overwrite the tags for some URL, so it isn't for caching. I'm guessing it's so you can access the tags by URL from within the callback. Any reason for not just passing the tags into the callback?

aadsm commented 9 years ago

Instead of doing that how about passing an optional function to loadTags that is called whenever the parser doesn't know how to read a specific frame?

It is for caching, you only need to call loadTags once and then getAllTags at your heart's desire. You should only call load again if for some reason think the server file changed.

Load/Get is a common pattern when dealing with async operations. In theory you could use only a Load function that uses the cached value or goes to the server but then you'd have two problems:

You'd need an extra flag or something to force it to load again in case you need it (clearing the cache basically).
You're bottlenecked by the async nature of the Load function, even when the value is cached.

aadsm commented 9 years ago

Btw, here's how you could promisify this with jQuery:

function getTags(filename, options) {
  var deferred = $.Deferred();
  var tags = ID3.getAllTags(filename);

  if (tags) {
    deferred.resolve(tags);
  } else {
    var newOptions = Object.create(options);
    newOptions.onError = function(reason) {
      deferred.reject(reason);
    };

    ID3.loadTags(
      filename,
      function onTagsLoad() {
        deferred.resolve(ID3.getAllTags(filename));
      }
      newOptions
    );
  }

  return deferred.promise();
}

This is based on: http://stackoverflow.com/questions/22519784/how-do-i-convert-an-existing-callback-api-to-promises

Disclaimer: I didn't run this code so how knows if it works :)

gyachuk commented 9 years ago

The reason I'm asking about _files is that I've got a stream of video data coming in, with ID3 data every second or so. I've wired up a dataReader for a buffer as follows, with the call to ID3.clearTags so that we don't keep these ID3 objects hanging around. Without that, it could easily become quite a memory hog. This may not be a very common usage.

  id3Updated: function(ID3Data) {
      var data = atob(ID3Data);
      var id3Tags;
      ID3.loadTags(data, function() {
          id3Tags = ID3.getAllTags(data);
          ID3.clearTags(data);
      }, {
              dataReader: function(str, fncCallback, fncError) {
                  fncCallback(new BinaryFile(str));
              },
              // I'm using this to make sure all possible frames are returned.
              // I know this isn't very efficient. It only needs to be calculated once.
              tags: Object.keys(ID3v2.frames).map(function(key) { return key; })
          }
      );

      return id3Tags;
  }

And the reason I was asking about promises was that it seems this would be natural use case, where you wouldn't need to keep anything in backing storage.

  id3Updated: function(ID3Data) {
      var data = atob(ID3Data);
      var id3promise = ID3.loadTags(data, {
              dataReader: function(str, fncCallback, fncError) {
                  // Might be clearer here to just return new BinaryFile(str),
                  // but I'm keeping with the existing idiom.
                  fncCallback(new BinaryFile(str));
              }
      });
      id3promise.then(function(id3Obj) {
          var id3Tags = id3Obj.getAllTags();
          return id3Tags;
      });
      id3promise.error(function() {
          return {};
      });
      return id3promise;
  }

In other words, the promise is returned from the loadTags() call, not from the getAllTags() call. With your code above, you still have to provide a callback to loadTags that calls getAllTags. By that time, the data has already arrived, so I'm not sure that a promise provides any real improvement.

I hope you're not taking this as any kind of criticism. I think this library is great. I'm just asking about possible paths for future development.

aadsm commented 9 years ago

Oh, I see your use case yeah. However, if the loadTags returned a promise I would still maintain a cache system because I would want to have a way to get the tags of a specific file without having to "force" the consumer to store the promise they got from loadTags.

On the jQuery example, it's true that I still need to provide a callback, but that's the way to promisify a function. But from the consumer point of view it's irrelevant, this is just an abstraction. You don't ever need to deal with how it's implemented. You just need to add a jquery-id3.js file to your project with:

ID3.jLoadTags = function(filename, options) {
  var deferred = $.Deferred();
  var tags = ID3.getAllTags(filename);

  if (tags) {
    deferred.resolve(tags);
  } else {
    var newOptions = Object.create(options);
    newOptions.onError = function(reason) {
      deferred.reject(reason);
    };

    ID3.loadTags(
      filename,
      function onTagsLoad() {
        var tags = ID3.getAllTags(filename);
        ID3.clearTags(data);
        deferred.resolve(tags);
      }
      newOptions
    );
  }

  return deferred.promise();
}

and then use it like this in your logic:

id3Updated: function(ID3Data) {
  var deferred = $.Deferred();
  var data = atob(ID3Data);

  ID3.jLoadTags(data, {
    dataReader: function(str, fncCallback, fncError) {
      // Might be clearer here to just return new BinaryFile(str),
      // but I'm keeping with the existing idiom.
      fncCallback(new BinaryFile(str));
    }
  }).then(deferred.resolve, function() { deferred.resolve({}) });

  return deferred.promise();
}

I'm not really sure what you mean with "By that time, the data has already arrived, so I'm not sure that a promise provides any real improvement." though.

Oh no, I don't take it as criticism, I take it as valuable feedback :) I don't get feedback on the library as often as I would like so this is really helpful! There's a bunch of things I don't like about the current API (and internal architecture to be honest), this was based on a demo to show off loading raw bytes from an XHR, I should have thought about it better at the time but it is what it is now. I've been working on a new version on my "free" time, with unit tests and stuff. I'm glad it's helpful to you!

How about my suggestion to add an optional function to read unknown frames?

Btw, on your example, the tags can be just Object.keys(ID3v2.frames) which already returns an array with all keys.

gyachuk commented 9 years ago

Thanks for the note about Object.keys(). That's what I get for taking code from stackoverflow and blindly adapting it. :)

As for a function to read unknown frames, it might not be a bad idea to be able to override it. I'm not sure why you wouldn't want to make that the default, though. It might be a philosophical difference. I think I'd go for returning as much as possible, and returning less when explicitly asked.

In other words, I'd set it up so that by default, you'd return all tags, not just the ones from the default shortcuts (I'm talking about default behavior here. I know this can be overridden). I'd also return all available data for each tag, unless explicitly requested otherwise. That could be the inverse function of what I think you're talking about, which is one that skips over the data and returns "undefined", like your current implementation does.

And yes, your jLoadTags function would do exactly what I'd like. I guess it's another philosophical difference, but I still don't care much for keeping user data around in the ID3._files object. To me, it is pretty much keeping global data hidden behind the caller's back. BTW, the caller wouldn't be keeping the promise around, but instead the tags themselves. I guess I'd set it up so that the caller gets the tags and if they want to keep them, they put them somewhere. If they don't want them kept, they do nothing. The way the code is currently set up, it is the opposite. If the caller wants to keep the data around, they do nothing. If they just want to process it once and forget it, they have to explicitly clear the tags.

Finally, sorry for the confusion about my "By that time, the data has already arrived..." comment. It is directed to the situation where the getAllTags was returning a promise, rather than consuming one. It doesn't apply when the promise is being returned by loadData().

And thanks again for the great code.

aadsm commented 9 years ago

As for a function to read unknown frames, it might not be a bad idea to be able to override it. I'm not sure why you wouldn't want to make that the default, though. It might be a philosophical difference. I think I'd go for returning as much as possible, and returning less when explicitly asked.

I understand the idea but in this case you're proposing to parse the data as a string even when it's not a string. This is the bit I don't agree with. I wouldn't mind passing the raw data though.

In other words, I'd set it up so that by default, you'd return all tags, not just the ones from the default shortcuts (I'm talking about default behavior here. I know this can be overridden). I'd also return all available data for each tag, unless explicitly requested otherwise. That could be the inverse function of what I think you're talking about, which is one that skips over the data and returns "undefined", like your current implementation does.

Yeah, I agree with this, this should have been the default behaviour from the beginning.

And yes, your jLoadTags function would do exactly what I'd like. I guess it's another philosophical difference, but I still don't care much for keeping user data around in the ID3._files object. To me, it is pretty much keeping global data hidden behind the caller's back. BTW, the caller wouldn't be keeping the promise around, but instead the tags themselves. I guess I'd set it up so that the caller gets the tags and if they want to keep them, they put them somewhere. If they don't want them kept, they do nothing. The way the code is currently set up, it is the opposite. If the caller wants to keep the data around, they do nothing. If they just want to process it once and forget it, they have to explicitly clear the tags.

As a consumer you shouldn't care how the library manages its own internal data. Implementing cache systems on I/O (or other type) bound functions is a really common pattern, there's even a design pattern addressing this use case, the Memoization.

I agree that the current implementation has a big issue, it doesn't have a limit on the cache size, or a way to configure this limit. If the library doesn't implement their own cache system then the consumer will have to. I prefer to have this kind of boilerplate code in the library instead of asking all consumers to implement the same thing over and over. This is the rationale behind this design decision.

I really appreciate your comments!

gyachuk commented 9 years ago

Glad you're receptive to the comments.

I'm happy as a client to call clearTags() as soon as I'm done with the tags.

Sounds like the simple change to my PR is to return frameData.getBytesAt() rather than frameData.getStringAt().

I'd be OK with that. I think I said that way up on one of the first comments. :)

Problem though with that is that I'd really like to use the BinaryReader to get at shorts, ints and longs. But that doesn't work completely well, since strData comes in with typeof(strData) === "object", at least on Firefox. So getByte() never gets defined. And it is pretty widely used by BinaryReader. Any thoughts on how to address that?

Also, was wondering why you chose the Opera license, instead of the MIT license? And did you know that the Opera links in your LICENSE.md get a 404 these days. I can't even find the terms of the Opera license.

aadsm commented 9 years ago

Cool, I missed that comment though! To solve the getBytesAt() problem, how about returning a new BinaryFile that only handles the data returned by getBytesAt()? Add a .getBinaryFileWithRange function and use it.

There is no Opera license, it's BSD. The copyright of a big bulk of the code is owned by Opera because I was working there at the time and they were using the BSD license. Yeah, they completely changed dev.opera.com, they've dumped it on github https://github.com/operasoftware/devopera, but you can check it using the wayback machine: https://web.archive.org/web/20100426105813/http://dev.opera.com/licenses/bsd/.