aramds / php-reader

code.google.com/p/php-reader
0 stars 0 forks source link

ID3v2 Improvements #11

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, 

I've needed to use PHP-reader for a live site and needed it to be reliable, so 
I sat down for a few 
weeks and fixed every bug in the ID3v2 code I encountered (there were a lot). 
This includes 
synch-safe frame detection to handle malformed ID3v2.4 frames (as written by 
iTunes). I've also 
added base MPEG reading/writing support as I needed to know the playtime of a 
MP3 file.

All changes should match your coding style/standard as closely as possible.

Thanks for a great base library, its much cleaner than getID3(), hence why I 
decided to improve 
your codebase and not theirs. :-)

Keep up the great work!

Ryan Butterfield
buttza@gmail.com

CHANGE LOG:

- NEW: Implemented synch safe frame detection to detect whether a frame is 
using a synch safe 
integer or not and parse accordingly. This allows parsing of ID3 frames that 
have been written as 
non-synch safe when they should be synch safe (eg. ID3v2.4). iTunes is 
notorious for doing this 
and its a known bug/limitation.
- CHANGE: Made AbstractLink and AbstractText non-abstract for unit testing 
purposes.
- FIX: Changed all preg_split's, which separated unicode strings using the 
double null 
terminators to use a new function ID3_Object::explodeString16. This new 
function fixes a bug 
(which shows itself more often on little endian machines) in which a double 
null terminator is 
found using a null byte from a previous character and the first null byte of 
the null terminating 
character. This causes the split to be 1 character out and on an odd boundary. 
This function fixes 
the problem by only allow 2-byte aligned splits.
- CHANGE: Changed all non-unicode to use ID3_Object::explodeString8 for 
consistency with 
above.
- CHANGE: Removed UTF16LE constant from ID3_Encoding because it doesn't make 
any sense if 
the machine running the PHP code isn't little endian.
- CHANGE: Changed many frame functions from adding another set of getData, 
setData, getSize, 
setSize functions which would override the base functions defined in ID3_Frame.
- FIX: Fixed bug in ID3_Frame_AENC::__construct which didn't convert 
_previewStart and 
_previewLength to integers.
- FIX: Fixed a few bugs in ID3_Frame_APIC::__construct where it parsed the 
frame data 
incorrectly.
- NEW: Added getImageSize and setImageSize to ID3_Frame_APIC and ID3_Frame_COMR 
to return 
the size of the image data.
- TIDY: Ensured ID3_Frame_COMM and other ID3_Language based frames only save 
language 
codes of length 3 bytes.
- FIX: Fixed bug in ID3_Frame_COMR::__construct where the image data would be 
the source of 
a split before it was assigned a value.
- FIX: Fixed bug in ID3_Frame_COMR::getDate which returned the price instead of 
the date.
- FIX: Fixed bug in ID3_Frame_COMR::__toString which left out the currency when 
constructing 
the data string.
- FIX: Fixed bug in ID3_Frame_COMR::__toString which saved the description 
during UTF16BE 
mode as machine endian UTF16 data.
- FIX: Fixed bug in ID3_Frame_EQU2::__construct which incremented by 8 instead 
of 4.
- FIX: Fixed bug in numerous frames that used sort instead of ksort. This would 
cause all key 
data to be lost.
- FIX: Made ID3_Frame_EQU2 storage of frequency and adjustment values 
consistent between all 
functions.
- TIDY: ID3_Frame_EQUA::__construct now throws an exception if attempting to 
parse data with 
an unsupported adjustment bit size.
- FIX: Fixed bug in ID3_Frame_EQUA::__construct and ID3_Frame_EQUA::__toString 
which was 
testing bit 13 (0x2000) instead of bit 15 (0x8000).
- FIX: Fixed bug in ID3_Frame_EQUA::__construct which used $j instead of $i.
- TIDY: Replaced format members in ID3_Timing children frames which were 
defined with 1 with 
the constant ID3_Timing::MPEG_FRAMES.
- FIX: Fixed typo in ID3_Frame_ETCO::setEvents.
- FIX: Fixed bug in ID3_Frame_GEOB::__construct which substr'd from a wrong 
offset.
- FIX: Fixed bug in ID3_Frame_GEOB::__construct which assigned a split result 
to _data instead of 
the intended _objectData.
- FIX: Fixed bug in ID3_Frame_IPLS::__construct which split the base data 
instead of the 
remaining data.
- CHANGE: ID3_Frame_IPLS::__construct now only iterates over pairs of split 
data, guarding 
against an odd number of splits.
- FIX: Ensured each individual string in ID3_Frame_IPLS is stored as a string 
with a BOM if UTF16. 
Also support frames that were written with the BOM defined only in the first 
string, applying the 
selected endian for all remaining strings.
- FIX: Fixed numerous from/to Int8/16/32's making them UInt as defined by the 
ID3v2 
specification.
- FIX: Fixed bug in ID3_Frame_POSS::__toString which saved the position as 
machine endian, not 
big endian as defined by the ID3v2 specification.
- CHANGE: Renamed _size and _flags in ID3_Frame_RBUF to avoid conflicts with 
the same 
member variables defined in ID3_Frame.
- FIX: ID3_Frame_RBUF::__construct now checks the data size to determine 
whether an offset is 
defined, as defined in the ID3v2 specification.
- TIDY: Made "channelType", "volumeAdjustment" and "peakVolume" constants in 
ID3_Frame_RVA2 to avoid typos and to facilitate outside use of these values. 
This makes typos 
run-time syntax errors.
- FIX: Fixed compile error in ID3_Frame_RVA2::__construct. Also fixed logic 
relating to this 
compile error.
- TIDY: Made numerous string keys constants in ID3_Frame_RVAD for the reasons 
described 
earlier.
- FIX: Fixed largely wrong parsing and writing of RVAD frame data in 
ID3_Frame_RVAD::__construct and ID3_Frame_RVAD::__toString.
- FIX: Fixed bug in ID3_Frame_SIGN::__construct that read the group as a string 
instead of an 
integer.
- FIX: Fixed offset bugs during reading in ID3_Frame_SYLT::__construct.
- NEW: Fully parse ID3_Frame_SYLT as described in the ID3v2.4 specification.
- NEW: Fully parse ID3_Frame_SYTC as described in the ID3v2.4 specification.
- FIX: Fixed __construct and __toString in ID3_Frame_TXXX and ID3_Frame_WXXX to 
skip their 
abstract parents and call through to the corresponding ID3_Frame functions, as 
they had to fully 
override their abstract parents.
- FIX: Fixed incomplete function arguments in ID3_Frame_USER::__construct and 
ID3_Frame_USLT::__construct.
- FIX: Fixed bug in ID3_Header and ID3_ExtendedHeader that didn't read/write 
the tag size as a 
synch-safe integer.
- CHANGE: Pulled non-data functionality of ID3_Frame out to ID3_FrameHeader to 
facilitate and 
speed up synch-safe frame detection. This allows creating a frame without 
reading all its data, 
only the header.
- CHANGE: Changed encodeSynchsafe32 and decodeSynchsafe32 to always perform 
what their 
name entails, not according to the ID3 version.
- CHANGE: Faster implementations of encodeSynchsafe32 and decodeSynchsafe32.
- NEW: ID3v1::__construct now supports $filename being an instance of an 
existing Reader.
- CHANGE: ID3v1::__construct now throws exceptions like ID3v2 on failure.
- CHANGE: ID3v2::write now copies the existing MP3 file, before applying the 
updated tag, if the 
destination is different to the existing file.
- NEW: Reader::__construct now supports an instance of a file resource or 
stream resource as 
$filename, allowing for unit testing using in-memory streams.
- TIDY: Made all machine order endian tests use a new function 
Transform::isBigEndian.
- FIX: Fixed incorrect implementations of fromUInt32LE, fromUInt32BE, toInt8, 
fromInt8, 
toString16LE and toString16BE in Transform.
- TIDY: Replaced use of hard-coded integers instead of the endian constants 
defined in 
Transform.
- NEW: Added UInt8 and Float functions to Transform.
- FIX: Made Transform::toString16 and Transform::fromString16 handle BOM's 
correctly.
- NEW: MPEG, MPEG_Frame, MPEG_FrameHeader, MPEG_VBRIHeader and MPEG_XINGHeader 
to 
support basic reading of MPEG files.
- NEW: Twiddling class added for efficient and tidy implementations of bit 
twiddling functions.
- NEW: CRC class added for an efficient look-up table implementation of the 
CRC-16 function 
used by MPEG files.
- NEW: Unit tests for every ID3 frame! Enabled finding of many bugs.
- NEW: Added an ID3v2 Image extraction demo, ID3v2 Read-Write demo and an MPEG 
bitrate/playtime read demo.
- FIX: Fixed bugs in some Transform unit tests.
- NEW: Unit test for CRC::crc16.

Original issue reported on code.google.com by but...@gmail.com on 27 Jul 2008 at 3:48

Attachments:

GoogleCodeExporter commented 9 years ago
- FIX: Removed debug code from MPEG::__contruct which output the number of 
invalid frames.
- NEW: ID3v2::addFrame now accepts a string for $frame which specifies the ID 
of the frame to add. If the frame 
exists, its imported, constructed and used.

Original comment by but...@gmail.com on 29 Jul 2008 at 12:59

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by svollbehr on 29 Jul 2008 at 7:04

GoogleCodeExporter commented 9 years ago
Regarding to some of your fixes to Transform class, I think the PHP 
documentation
also suggests to use the method you have changed the code to. However, 
internally PHP
uses signed integers, which means using unpack("V*") with a big integer would 
result
to buffer overflow and produce -1 or similar. Hence my solution unpacks high 
and low
bytes separately and then uses aritmetic operation that implicitly converts the
resulting integer into float where it can fit the result (either that or using
sprintf, and of which I thought the former was neater).

I'm not sure if this causes some problems with 64-bit machines (I'm developing 
on a
32-bit machine), though, if that is the reason you needed to change the code 
there.
The reason it might work with you is that you are probably working with x64 
where the
integer size is 8 bytes. 
In 64-bit environment the PHP integer size might (or might not) be double that 
of
32-bit environments and would result the expected value with unpack("V*") for 
example.

One could perhaps add a check whether the integer size is 4 bytes and use the 
current
way only then, and otherwise just plain unpack(..). I was not aware of a 
PHP_INT_SIZE
before you introduced it in your patch.

Original comment by svollbehr on 29 Jul 2008 at 9:06

GoogleCodeExporter commented 9 years ago
Hey Sven,

Glad your finding use with my changes. Yes its okay to include my name and 
email in the class documentation.

Regarding the Transform changes. I develop on a 32bit machine as well, so I'm 
in the same boat as you, hence why I 
commented out the 64bit tests if run on a 32bit machine. Why PHP doesn't 
support > machine size integers I don't know. eg. 
Automatically fall back on a Big-Int library if integer goes over the machine 
limit.

The reason I moved away from your floating point idea is precision and so PHP 
treats them as integers, allowing for bit 
masking/manipulation. It also ensures what is read is exactly what is written 
back (important for unit tests).

The signed integer problem isn't really a problem at all. Unless you try and 
perform arithmetic on the values returned. For 
example, an unsigned 0xffffffff is exactly the same in memory as a signed 
-0x7fffffff.  You can always perform a sprintf("%u") 
when displaying the results to the user if the value should be unsigned.

I think the idea of falling back to floating point for 64bit integers is okay 
if the PHP_INT_SIZE < 8, but I think you should try and 
keep them as integers as much as possible (if supported). You will have to 
check all code that uses the 64bit values and ensure 
the operations it performs on them is sane for floating point numbers and that 
you don't lose precision during these operations. 

Another idea could be to fall back on a bit int library for php. I'm sure there 
are a couple out there though I haven't looked into 
this yet.

All the best,

Ryan

Original comment by but...@gmail.com on 30 Jul 2008 at 1:47

GoogleCodeExporter commented 9 years ago
Another thing, instead of removing UTF16LE altogether from ID3_Encoding, it 
could
perhaps be usefull to make it produce UTF16 with LE BOM. That is kind of the 
idea
behind the current code although the current way of being an alias to UTF16 is
errorneus. That way one could write texts in all possible ways: UTF-16 LE/BE w/ 
and
w/o BOM. Does it make sense?

Original comment by svollbehr on 30 Jul 2008 at 10:44

GoogleCodeExporter commented 9 years ago
Hey Sven,

The reason for the removal of the UTF-16LE was so that the code documented the 
spec, which is defined as:

$00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
$01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
           strings in the same frame SHALL have the same byteorder.
           Terminated with $00 00.
$02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
           Terminated with $00 00.
$03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

This means the possible combinations of UTF-16 are: UTF-16BE w/o BOM, UTF-16BE 
w/ BOM and UTF-16LE w/ BOM. When 
writing UTF-16 with a BOM, it wouldn't make sense to write in anything other 
than the current machines architecture as 
you'd get a performance hit due to byte-swapping. Hence why I didn't think it'd 
be worth keeping the LE concept.

To have UTF-16LE you'll have to introduce a new value $04 and translate it to 
UTF-16 ($01) with a forced BOM of LE in every 
frame that using encoding. Possible but only makes sense during writing, not 
reading. (eg. You'll never read a $04).

The decision is totally up to you as its your library, though I'm enjoying 
discussing the options with you. :o)

Thanks and all the best,

Ryan

Original comment by but...@gmail.com on 30 Jul 2008 at 1:17

GoogleCodeExporter commented 9 years ago

Original comment by svollbehr on 30 Jul 2008 at 7:59

GoogleCodeExporter commented 9 years ago
Thank you so much Ryan for all the effort you have taken to make this library 
better!
I have now made the changes to the library code. I appreciate the new test 
classes
and the implementations of some of the remaining frames, not to mention the 
MPEG code
you have provided. Here is a list of changes made to the library:

  o Code samples have now been added as part of the on-line documentation.
  o Transform class now uses PHP_INT_SIZE to use the workarounds only when actually
needed (ie only on 32-bit machines)
  o Unicode (UTF-16) strings are now handled correctly and all tests have been
revised accordingly.
  o All fixed defects have been applied.
  o Almost all new features or changes have been applied too.

Some changes I have left aside as I thought they were inappropriate or couldn't 
see
the reason to add the code. Among these were the following changes.

  o The removal of UTF-16LE, which has been kept and instead made it work as expected
  o Removal of the abstract keyword from AbstractText and AbstractLink as these
classes should be tested with mock objects instead.
  o Addition of all the new constants such as channelType and volumeAdjustment. The
idea was accepted, just that these were added as real constants instead of 
static
strings.
  o The idea of adding a frame with a string name of the frame. One could simply call
$id3-><frame> (eg $id3->tit2->text = ...) to add a new frame. When a frame is
accessed with a shorthand call like that, a new frame will be created and added 
to
the tag should the tag not contain one already. However, there was a bug in this
functionality in the 1.4 version that has now been fixed.

The support for determining the MPEG file play duration has been separated to 
another
issue. I need to think more of what would be the best name for the class and 
what
kind of an interface should it have (the problem covers two standards, and the 
source
you refer to is informal). Determining the play duration of an mpeg video would 
be
usefull too. Docs must be written too before it can be added to the library. I 
just
loved the Twiddling class, I will add it together with the MPEG related files.

Original comment by svollbehr on 9 Aug 2008 at 5:17

GoogleCodeExporter commented 9 years ago

Original comment by svollbehr on 20 Feb 2009 at 9:09

GoogleCodeExporter commented 9 years ago
Issue 9 has been merged into this issue.

Original comment by svollbehr on 20 Feb 2009 at 9:18

GoogleCodeExporter commented 9 years ago
Issue 10 has been merged into this issue.

Original comment by svollbehr on 20 Feb 2009 at 9:18