Hey tdebaets,
Thanks for the patch--looks very nice ;) And I'm sure other people could
definitely use this. I'll review and patch this weekend, assuming all looks
good.
Original comment by kid...@gmail.com
on 5 Aug 2011 at 6:49
Great, thanks :) I see that the patch hasn't been applied yet though - did you
find any issues?
Original comment by tdebaets
on 8 Aug 2011 at 2:58
No, unfortunately I didn't have any free time this weekend, so...I didn't get
around to it. I browsed the patch briefly. One thing I'd like to see is
documentation for any public APIs, just so it's clear how someone would use it.
Original comment by kid...@gmail.com
on 8 Aug 2011 at 7:11
OK, no problem. I will write some documentation for the new public APIs and
will submit a new patch sometime this week.
Original comment by tdebaets
on 8 Aug 2011 at 7:41
So, sorry for the delay. Been looking at this yesterday and today. The patch
didn't include any VS solution updates, which broke the VS build since the
patch modifies existing files. Also, you didn't add the new command line
project to the VS solution. I'll fix both of those when I check this in.
Also, why is this stuff commented out in the public interface? Do you intend
to add these at some future point, or is this stuff that can be zapped for now?
//MP4V2_EXPORT const MP4XtraTags* MP4XtraTagsAlloc();
//MP4V2_EXPORT void MP4XtraTagsFetch( const MP4XtraTags* tags, MP4FileHandle hFile );
//MP4V2_EXPORT void MP4XtraTagsStore( const MP4XtraTags* tags, MP4FileHandle hFile );
//MP4V2_EXPORT void MP4XtraTagsFree ( const MP4XtraTags* tags );
Also, help me understand how this fits in with the existing itmf stuff; do they
overlap at all? Do they complement one another?
Thanks for the patch--it'll get in some time today.
Original comment by kid...@gmail.com
on 12 Aug 2011 at 6:24
So, initially I checked this in as r484, but on closer inspection I decided to
revert the changes in r485. There's some stuff that needs to be fleshed out
before this can go in.
There are two major issues:
1. First, it's not clear to me what the string format is for metadata items; my
guess is UTF8, but this should be tested and made clear in the API. So it
needs to be tested with some metadata name containing non-ASCII characters.
2. The second, much bigger issue is how strings are persisted. I see the
application is serializing wchar_t, which is a major problem for cross platform
support. Windows treats wchar_t as UTF-16 (so two bytes per character), while
Linux and OSX treat wchar_t as UTF-32 (4 bytes per char). So working on the
same platform would be fine, but if I attempted to write string data on Linux
and then read it on windows (or vice-versa), it would fail because the
character encoding differs. The iTmf stuff avoids this by using UTF8
everywhere.
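To make the difference concrete (just a throwaway example, nothing to do with
the patch):

#include <stdio.h>

int main()
{
    /* Prints 2 on Windows (wchar_t holds UTF-16 code units) but 4 on
     * Linux/OSX (wchar_t holds UTF-32), so raw wchar_t data written on
     * one platform can't be read back on the other. */
    printf( "sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t) );
    return 0;
}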
Lastly, this is an undocumented, reverse engineered MSFT atom...and I have to
wonder what sort of example we'd be setting by supporting such an atom in the
first place. I'm still open to accepting the patch, assuming #2 is resolved
(and #1 clarified) and you can successfully use the code across platforms. And
let me know if I'm wrong about any of this (certainly wouldn't be the first
time). I do really appreciate the patch, but I can't accept code that isn't
going to work cross platform.
I'll leave this open until you respond.
Original comment by kid...@gmail.com
on 12 Aug 2011 at 7:52
> Also, why is this stuff commented out in the public interface? Do you
> intend to add these at some future point, or is this stuff that can be
> zapped for now?
These were already in ErikSka's modified version and provide (unfinished)
support for retrieving all metadata items in the Xtra atom. Not really needed
for now because one will usually know the exact name of the metadata item he
wants to get, so I'll leave these out.
> Also, help me understand how this fits in with the existing itmf stuff; do
> they overlap at all? Do they compliment one another?
As this is a totally new way of storing metadata, it's completely separate
from the iTMF stuff - there's no overlap.
> 1. First, it's not clear to me what the string format is for metadata
> items; my guess is UTF8, but this should be tested and made clear in the
> API. So it needs to be tested with some metadata name containing non-ASCII
> characters.
Do you mean the name of the metadata items (e.g. WM/EncodingTime,
WM/Publisher)? This is unfortunately impossible to test, because WMP only
writes its predefined attributes to file and there isn't any WMP attribute
whose name contains non-ASCII characters. I would just say in the API that
these names should be plain ASCII strings.
> 2. The second, much bigger issue is how strings are persisted. I see the
> application is serializing wchar_t, which is a major problem for cross
> platform support. Windows treats wchar_t as UTF-16 (so two bytes per
> character), while Linux and OSX treat wchar_t as UTF-32 (4 bytes per
> char). So working on the same platform would be fine, but if I attempted
> to write string data on Linux and then read it on windows (or vice-versa),
> it would fail because the character encoding differs. The iTmf stuff
> avoids this by using UTF8 everywhere.
I see what you mean. Using UTF8 just like the iTMF API is probably the way to
go. Because the metadata value strings are stored as UTF16 in the Xtra atom, a
conversion would then be required while reading/writing. I searched for some
cross-platform code that performs this conversion and found this:
https://trac.transmissionbt.com/browser/trunk/libtransmission/ConvertUTF.h . Is
it OK if I include this in MP4v2? It also seems to be used by a lot of other
open source projects.
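To give an idea, hooking it up would look roughly like this (untested sketch
using the ConvertUTF types; not code from the patch):

#include <string>
#include <vector>
#include "ConvertUTF.h"

// Untested sketch: UTF-8 std::string -> vector of UTF-16 code units.
static bool Utf8ToUtf16( const std::string& utf8, std::vector<UTF16>& utf16 )
{
    // A UTF-16 string never needs more code units than the UTF-8 string has bytes.
    utf16.resize( utf8.size() + 1 );

    const UTF8* src    = reinterpret_cast<const UTF8*>( utf8.c_str() );
    const UTF8* srcEnd = src + utf8.size();
    UTF16*      dst    = &utf16[0];
    UTF16*      dstEnd = dst + utf16.size();

    if( ConvertUTF8toUTF16( &src, srcEnd, &dst, dstEnd, strictConversion ) != conversionOK )
        return false;

    utf16.resize( dst - &utf16[0] );  // trim to the code units actually written
    return true;
}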
> Lastly, this is an undocumented, reverse engineered MSFT atom...and I have
> to wonder what sort of example we'd be setting by supporting such an atom
> in the first place.
Unlike the other issues, there's nothing that I can do about that - I can't
force MS to document the atom, sadly :p But I still think that this patch would
be a nice addition to MP4v2. Extra support is always good and it's not *that*
much code, so the increase in library size is relatively small.
Original comment by tdebaets
on 13 Aug 2011 at 2:44
As it stands, mp4v2 already has some mild license problems (see issue 81), so
I'm not sure how I feel about adding yet another license into the fray (the
Unicode, Inc. license). Is there no public domain option for doing UTF* conversion?
Original comment by kid...@gmail.com
on 16 Aug 2011 at 8:10
Is Utf8ToFilename in libplatform/platform_win32_impl.h not sufficient? I
suppose not the entire class is relevant if we're not talking about filenames,
but the ConvertToUTF16 method might be. I'm happy to contribute UTF16 --> UTF8
conversion routines as well.
Original comment by dbyr...@gmail.com
on 16 Aug 2011 at 8:22
@kidjan: I might be wrong, but aren't ConvertUTF.h and ConvertUTF.c already
public domain? I certainly don't see/recognize any license in their notice at
the beginning. I also found this but the usage seems more awkward:
http://utfcpp.sourceforge.net .
@dbyron: That ConvertToUTF16 method looks alright. Note that the conversion is
also necessary on Linux and OSX, so that means that the method would have to be
moved outside of the Win32-specific platform code.
Original comment by tdebaets
on 16 Aug 2011 at 9:29
They use a Unicode license:
/*
 * Copyright 2001-2004 Unicode, Inc.
 *
 * Disclaimer
 *
 * This source code is provided as is by Unicode, Inc. No claims are
 * made as to fitness for any particular purpose. No warranties of any
 * kind are expressed or implied. The recipient agrees to determine
 * applicability of information provided. If this file has been
 * purchased on magnetic or optical media from Unicode, Inc., the
 * sole remedy for any claim will be exchange of defective media
 * within 90 days of receipt.
 *
 * Limitations on Rights to Redistribute This Code
 *
 * Unicode, Inc. hereby grants the right to freely use the information
 * supplied in this file in the creation of products supporting the
 * Unicode Standard, and to make copies of this file in any form
 * for internal or external distribution as long as this notice
 * remains attached.
 */
...definitely a very permissive license, but not public domain. And it's
another license we'd have to adhere to, and that end users would have to
comply with when distributing MP4v2. That sourceforge project is also not in
the public domain; it uses some unnamed derivative license.
@dbyron0,
Big issue is running on *nix and OSX, so you'd need conversion routines. This
actually is somewhat of a problem, given:
wchar_t *
Utf8ToFilename::ConvertToUTF16 ( const string &utf8string )
{
...
ASSERT(sizeof(wchar_t) == 2);
It's true on Windows, but not true on Linux or OSX.
Original comment by kid...@gmail.com
on 23 Aug 2011 at 4:44
@kidjan: I hear you about the license problems. I have found other code that's
public domain and that doesn't look too bad:
http://altdevblogaday.com/2011/03/30/putting-the-you-back-into-unicode . The
only problem is the complete lack of docs, but if I understand correctly,
conversion first requires calling one of the "Measure" functions to determine
the size of the destination buffer, allocating this buffer, and then calling
the right "Transcode" function. Does this look OK for including into MP4v2?
Original comment by tdebaets
on 24 Aug 2011 at 2:42
You're right that ConvertToUTF16 only works as it is on windows. Converting
from UTF8 to UTF16 just before saving seems reasonable. Making a version that
uses an array of four bytes instead of an array of two wchar_t's seems a simple
enough change...though maybe using an array of four bytes makes more sense on
all platforms at this level. The filename stuff could stick with wchar_t on
Windows and know how to build those from (up to) four bytes representing UTF-16.
Am I right that all of this Xtra stuff uses (or can use) UTF-8 everywhere in
memory and it's only on disk that it needs UTF-16?
Original comment by dbyr...@gmail.com
on 26 Aug 2011 at 2:53
> Am I right that all of this Xtra stuff uses (or can use) UTF-8 everywhere
> in memory and it's only on disk that it needs UTF-16?
Correct. It currently doesn't use UTF-8 yet, but that's the plan.
Original comment by tdebaets
on 26 Aug 2011 at 6:08
dbyron,
I'm no Unicode guru, but I'm not sure using an array of two wchar_t would work;
on OSX and Linux, it's UTF-32, not UTF-16. UTF-16 is variable length; UTF-32
is not. It's two completely different character encodings, so basically the
code would have to be able to handle UTF-32.
Original comment by kid...@gmail.com
on 6 Sep 2011 at 1:54
I don't think UTF-32 would ever come into play. I figure we've got UTF-8 in
memory, and only when we need to use persistent storage do we ever convert
to/from UTF-16. That conversion may not even need to be platform specific if
we keep wchar_t out of it and use an array of four bytes to hold the UTF-16
encoding instead.
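Roughly what I have in mind for the on-disk side (sketch only; the UTF-8 to
UTF-16 conversion itself is left out, and I'm assuming the Xtra atom stores
UTF-16 little-endian, as Windows does):

#include <stdint.h>
#include <vector>

// Sketch: write UTF-16 code units as explicit little-endian byte pairs, so
// the bytes that hit the disk never depend on sizeof(wchar_t).
static void AppendUtf16LE( const std::vector<uint16_t>& units,
                           std::vector<uint8_t>& out )
{
    for( size_t i = 0; i < units.size(); i++ ) {
        out.push_back( static_cast<uint8_t>( units[i] & 0xFF ) );          // low byte first
        out.push_back( static_cast<uint8_t>( ( units[i] >> 8 ) & 0xFF ) ); // then high byte
    }
}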
I should add that I'm super busy at the moment and don't have time to spend on
this likely til October...
Original comment by da...@onmenetwork.com
on 6 Sep 2011 at 2:15
> I should add that I'm super busy at the moment and don't have time to spend
> on this likely til October...
Not sure if the UTF-8 to/from UTF-16 conversions really need to be written from
scratch. What about the public domain code at
http://altdevblogaday.com/2011/03/30/putting-the-you-back-into-unicode ?
Original comment by tdebaets
on 6 Sep 2011 at 7:11
It'd be great to get it from elsewhere, though it might be a little strange to
have two different hunks of code for encoding conversion in mp4v2.
More of an issue is that the download link I found there
(http://studiotekne.com/downloads/Unicode.zip) has the test routines, but I
couldn't find the conversion routines themselves. I could easily be looking in
the wrong place.
Also, I'd like to make sure whatever code we use handles all the weird
invalid cases here: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
Original comment by da...@onmenetwork.com
on 6 Sep 2011 at 7:25
In that archive, the conversion routines are in
Unicode\tests\Unicode.Test\Unicode.h.
But no problem if you really want to write it from scratch, there's no hurry :)
Original comment by tdebaets
on 6 Sep 2011 at 7:39
Are there any updates on newly written conversion routines or on third-party
code that could be used?
Original comment by tdebaets
on 31 Oct 2011 at 7:18
I have removed the attached patch from my original post because it still
contained a bug. When writing a string tag to the Xtra atom, the code failed
to include the null terminator, which resulted in a corrupted Xtra atom. I
hope that no one was already relying on my patch, but I apologize if this was
the case.
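To be concrete, the fix boils down to counting the terminating NUL when
sizing and writing the value, roughly like this (helper name made up, not the
patch's actual code):

#include <stdint.h>

// Illustrative only: the serialized UTF-16 string value has to include the
// terminating NUL code unit; leaving it out is what corrupted the Xtra atom
// in the earlier patch.
static uint32_t Utf16ValueByteSize( const uint16_t* value )
{
    uint32_t units = 0;
    while( value[units] != 0 )
        units++;
    return ( units + 1 ) * 2;  // 2 bytes per code unit, +1 for the NUL
}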
A fixed version of my patch is attached to this comment. Other changes are:
- Removed superfluous commented code.
- Added some error checking to the tag writing functions (return type changed
from void to bool).
- Included changes to the Visual Studio solution and projects.
Some comments/pointers on which public domain Unicode conversion routines to
use would still be welcome.
Original comment by tdebaets
on 9 Sep 2012 at 4:43
I meant to get back to this...sorry. Been stuck in a crazy work project for a
few months. I do want to get this code in because I've had several requests
for Xtra atom support, so when I get a chance I'll look into what to do about
the Unicode issues.
Original comment by kid...@gmail.com
on 1 Oct 2012 at 2:38
Thought I'd check in and see if anyone had gotten a chance to look at this.
Users continue to want this functionality as it appears WMP and Xbox 360 use it.
Original comment by danahins...@gmail.com
on 19 Dec 2012 at 5:16
This might be sort of a sloppy solution, but would there be any way to make
this patch Windows-only for the time being? I'd like to commit this, because
there's definitely a desire to have this feature, but without dealing with the
character encoding issues it's pretty hard to handle.
Might we also mark the interface as "beta" or "subject to change" if possible,
since fixing it later would entail dropping wchar_t from the public interfaces
and moving to UTF8 instead?
Original comment by kid...@gmail.com
on 28 Dec 2012 at 5:02
From my perspective, Windows-only would be great.
Original comment by danahins...@gmail.com
on 28 Dec 2012 at 5:08
I'm fine with that (temporary) solution as well.
Original comment by tdebaets
on 28 Dec 2012 at 5:13
Attached to this comment is a new version of the patch. As requested by Dan,
all added code now only gets compiled on Windows. I have put the Xtra tags
API/implementation behind ifdefs for MP4V2_XTRA_TAGS, and in platform.h, I only
define MP4V2_XTRA_TAGS if _WIN32 is defined.
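In other words, the guard looks roughly like this (simplified):

// platform.h: the Xtra tags support is only switched on for Windows builds.
#if defined( _WIN32 )
#   define MP4V2_XTRA_TAGS
#endif

// Public header and implementation: everything Xtra-related is wrapped, e.g.
#ifdef MP4V2_XTRA_TAGS
    /* MP4V2_EXPORT declarations and the Xtra atom code go here */
#endif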
Also, GNUmakefile.am doesn't contain the new source files anymore since I
assume that the makefile won't be used under Windows.
The other small change is that the mp4xtratags util now also checks for a
WM/SharedUserRating tag in the Xtra atom. WMP uses this tag for saving star
ratings to MPEG-4 files.
Original comment by tdebaets
on 14 Dec 2013 at 9:17
Original issue reported on code.google.com by
tdebaets
on 5 Aug 2011 at 6:16