OlivierGrenoble / mp4v2

Automatically exported from code.google.com/p/mp4v2
Other
0 stars 0 forks source link

MP4Read returns 0 on filename with a umlaut #98

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a file with a non-ascii character (a umlaut for example)
2. Call MP4Read to open the file

What is the expected output? What do you see instead?
The call to MP4Read fails (returns 0 for the handle)

Please use labels and text to provide additional information.
I suspect this is something to do with the way I'm calling it.  I see the 
project is set to build Unicode, so that is the way I was passing the string in 
(I also tried converting to UTF-8 but no joy).  My app is written in VB.Net and 
I'm using a standard string.  Works fine as long as there aren't any non-ascii 
characters.

Any help would be appreciated,

Dan

Original issue reported on code.google.com by danahins...@gmail.com on 1 May 2011 at 5:32

GoogleCodeExporter commented 8 years ago
On windows, I expect the filename argument to be UTF-8 encoded.  If that's not 
working, are there any log messages that help indicate what's going on?

Original comment by dbyr...@gmail.com on 2 May 2011 at 2:32

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Looks like regression is in r453; let me see if I can get a fix in...

to reproduce, use mp4info with any file containing an umlaut on windows.  not 
sure what the state of things is on *nix, but windows definitely does not work.

Original comment by kid...@gmail.com on 2 May 2011 at 8:13

GoogleCodeExporter commented 8 years ago
I see in r410 of the branch where the code was changed:

- replace MultiByteToWideChar with custom code to remove
differences across versions of windows and for improved
error detection and illegal character handling

...could you explain what differences someone may see?  At the moment, I'm in 
favor of reverting to the previous code, which is much more straight-forward 
(and works--I tested it).

Original comment by kid...@gmail.com on 2 May 2011 at 8:18

GoogleCodeExporter commented 8 years ago
I'll write up more explanation soon...I'm a little squeezed for the next couple 
of days.  As I look at this particular issue, I think the problem may be in the 
way mp4info processes its command line arguments.  I'll look more closely 
fairly soon.  I'd really like to keep the code that's there, but I'll justify 
why in more detail, also soon.

Original comment by dbyr...@gmail.com on 2 May 2011 at 8:43

GoogleCodeExporter commented 8 years ago
From the Remarks section of:
http://msdn.microsoft.com/en-us/library/dd319072%28v=vs.85%29.aspx
--
Starting with Windows Vista, this function fully conforms with the Unicode 4.1 
specification for UTF-8 and UTF-16. The function used on earlier operating 
systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. 
Code written in earlier versions of Windows that rely on this behavior to 
encode random non-text binary data might run into problems. However, code that 
uses this function on valid UTF-8 strings will behave the same way as on 
earlier Windows operating systems.
--
In other words, on pre-Vista error handling of malformaed surrogate pairs isn't 
right.  And since we didn't set MB_ERR_INVALID_CHARS, we don't learn about 
other kinds of invalid byte sequences.  Reverting to the code that uses 
MultiByteToWideChar, but using MB_ERR_INVALID_CHARS is tempting, but even that 
has different behavior on different versions of windows.  From the notes about 
that flag:
--
Starting with Windows Vista, the function does not drop illegal code points if 
the application does not set this flag.

    Windows 2000 with SP4 and later, Windows XP:  Fail if an invalid input character is encountered. If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.
--

So, the point of the code I wrote is to detect and log invalid byte sequences 
in a consistent way across all versions of Windows.

Original comment by dbyr...@gmail.com on 2 May 2011 at 10:16

GoogleCodeExporter commented 8 years ago
But since neither the utility code in the project (mp4file, mp4info, ...) 
doesn't work with the new code, nor does the code I wrote in VB.Net (which used 
to work) there's been some change in the calling convention.  I can easily 
debug into the MP4VB2 dll and see that the string that I'm passing in is single 
byte for all characters (see the attached image).

However, the a umlaut ([13]) shows as -28, since it's a signed character.  
Perhaps this is the problem?  Seems to me UTF-8 shouldn't interpret the signed 
bit.  Should these be declared as unsigned char, or is it something else.

I'll be glad to help debug this if someone can give me an idea of what to look 
for.

Dan

Original comment by danahins...@gmail.com on 2 May 2011 at 10:43

Attachments:

GoogleCodeExporter commented 8 years ago
I'm not sure what version of File_win32.cpp you're using, but it's different 
than mine.  r453 calls CreateFileW in StandardFileProvider::open like this:

 _handle = CreateFileW( filename, access, share, NULL, crdisp, flags, NULL );

but the screenshot you attached shows:

_handle = CreateFileW( win32::Utf8ToWideChar( name ), access, share, NULL, 
crdisp, NULL );

What version of File_win32.cpp have you got?

Original comment by dbyr...@gmail.com on 3 May 2011 at 2:30

GoogleCodeExporter commented 8 years ago
I guess I hadn't sync'ed in a while.  So I just sync'ed up and now when I build 
I get unresolved externals for mp4v2::impl::log.* (errorf, setVerbosity, ...)  
and mp4v2::platform::win32::Utf8ToWideChar.  I see the definition of 
Utf8ToWideChar in platform_win32_impl.h, but I can't see where log is defined.  
Do I need some new define or something to get this to link?

Original comment by danahins...@gmail.com on 3 May 2011 at 4:49

GoogleCodeExporter commented 8 years ago
I don't see any mention of Utf8ToWideChar in platform_win32_impl.h 
(http://code.google.com/p/mp4v2/source/browse/trunk/libplatform/platform_win32_i
mpl.h).  Are there other files to sync as well?

Original comment by dbyr...@gmail.com on 3 May 2011 at 1:13

GoogleCodeExporter commented 8 years ago
I'm totally confused now.  The file I had after sync'ing didn't match the one 
at the link.  I see there's a subdir p in the link above.  What URL should I be 
using to sync?  I was using http://mp4v2.googlecode.com/svn/trunk, is that not 
the correct one?  I've attached the platform_win32_impl.h I got when I sync'ed.

Confused,

Dan

Original comment by danahins...@gmail.com on 3 May 2011 at 4:40

Attachments:

GoogleCodeExporter commented 8 years ago
I think the extra p in the URL is for browsing.  What does "svn info" say when 
you run it in the root of your local repo?

For what it's worth, I tested mp4info with a file with an umlaut in the name 
(U+00E4) and it can't read the file.  I invoked mp4info with CreateProcess so 
I'm confident I passed a properly UTF-8 encoded version of U+00E4 (which is 
0xC3A4) as part of the filename.  When I looked at argv inside mp4info, I see 
something else (0xE4, which in my case is followed by 0x2E -- the . separating 
name from extension).  So, when E4 gets intepreted as the first byte of a 3 
byte UTF-8 encoding, 2E should be a continuation character but isn't since only 
[0x80,0xBF] is valid.

I didn't revert to an older version of the tree to see what happens there, but 
the solution as I see it is to use (on windows only) the slightly painful 
combination of GetCommandLineW, CommandLineToArgvW, and then convert each 
argument from UTF-16 to UTF-8.  When I did that, the proper UTF-8 makes it 
lower down and the file is opened fine.

My plan is to figure out a good way/place to put the code for retrieving the 
the UTF-8 encoding of the command line, and then call it from mp4info and the 
other tools.  I'd love to hear confirmation from someone else that this is a 
good idea, and possibly even suggestions for how to structure the code.

Original comment by dbyr...@gmail.com on 3 May 2011 at 7:09

GoogleCodeExporter commented 8 years ago
First let me say that I'm a novice about SVN.  I have only used it with MP4V2, 
and then only to sync the changes to the source tree.  I use RapidSVN and the 
result from the info display is below.

FWIW, I don't really care about the utilities, and fixing them isn't the main 
problem.  I suspect most people use the library, not the utilities.  I need to 
figure out what the correct way is to call the program from my VB.Net code.  
I'll see if converting from UTF-16 to UTF-8 makes a difference.

But I'm really confused about the correct way to get the files from SVN.  
Before when I sync'ed, I got all the correct files, so I'm confused as to what 
has changed such that my files now don't match what's in the tree.  Any hints 
on what I should try here would be most appreciated.

The results from RapidSVN info:

Name: 
URL: http://mp4v2.googlecode.com/svn/trunk/util
Repository: http://mp4v2.googlecode.com/svn
Repository UUID: 6e6572fa-98a6-11dd-ad9f-f77439c74b79
Revision: 461
Node Kind: directory
Last Changed Author: kidjan
Last Changed Rev: 461
Last Changed Date: 2011-04-26 19:42 GMT
Text Last Updated: 1970-01-01 00:00 GMT
Properties Last Updated: 1970-01-01 00:00 GMT
Checksum: <None>

Original comment by danahins...@gmail.com on 4 May 2011 at 12:03

GoogleCodeExporter commented 8 years ago
If it helps, after finally getting the latest rev to build, I get an set of log 
statements when trying to open a file with an a umlaut:

mp4v2::platform::win32::Utf8ToFilename::Utf8DecodeChar: 0x74 is not a valid 
continuation character in a UTF-8 encoding
mp4v2::platform::io::StandardFileProvider::open: 
CreateFileW(\\?\C:\Downloads\ätest.m4v) failed (2)
mp4v2::impl::MP4File::Open: open(C:\Downloads\ätest.m4v) failed 
(..\..\src\mp4file.cpp,395)

Original comment by danahins...@gmail.com on 4 May 2011 at 10:58

GoogleCodeExporter commented 8 years ago
This is basically the same error I get when I pass a UTF-8 encoded string to 
mp4info.  I'd still like some feedback on my proposal to fix it 
(http://code.google.com/p/mp4v2/issues/detail?id=98#c12).

-DB

Original comment by dbyr...@gmail.com on 5 May 2011 at 2:20

GoogleCodeExporter commented 8 years ago
OK, I've figured out how to make my code work, but I'm not sure what it means 
with respect to the correct solution.  The problem appears to be that it's not 
easy (or at least I don't understand how to) turn a unicode string into a utf-8 
string.  When I look at my string, it has 228 for the a umlaut, which is fine 
for unicode.  But if I try to encode that string into utf-8 using 
encoding.utf8.getbytes and encoding.utf8.getstring, it turns the a umlaut into 
a question mark.  What I did was use encoding.utf8.getbytes, then built the 
char array using asc(byte(i)) then turning that into a string.  Using this, it 
works!

Maybe this is similar to what you were talking about dbyron?  Now that I 
understand how to make this work, I'll do some more research about the 
"correct" way to accomplish this in .Net.

dbyron, you seem to be a windows guy, I'd like to talk to you about a 
performance issue on windows, that I have a fix for.  Should I just open up a 
new issue (which I did some time back, I think before you got involved) or 
would you rather discuss it via email?

Dan

Original comment by danahins...@gmail.com on 5 May 2011 at 5:16

GoogleCodeExporter commented 8 years ago
I don't know any VB or .Net so I'm not sure I can be much help there.  I'm not 
sure talking about something as a "unicode string" makes sense though.  Any 
string is in a particular encoding, but as I understand it, there is no 
encoding called "unicode."  Natively windows uses (at least at the Win32 API 
level), UTF-16, so maybe that's what you're starting with?

228 is 0xE4 and a with an umlaut is U+00E4 so it makes sense up to there.

When you say the umlaut turns into a question mark, what do you mean?  What are 
the actual values of the bytes that come back?  It sounds like you've got a 
solution, and it may be the correct one.

Sure, send me an email with other questions.  I'll see what I can do.

Original comment by dbyr...@gmail.com on 5 May 2011 at 5:29

GoogleCodeExporter commented 8 years ago
I found the more correct solution, I was using type string in calling 
MP4Read/Modify, so I always had to turn it back into a string, which turned it 
back into UTF-16, which didn't work.  The solution was to change the calling 
signature to byte() for the filename and use encoding.utf8.getbytes.  It now 
works like a charm.

I'll send you (dbyron) and email later today or tomorrow to talk about the 
other issue, but I think (from my perspective at least) you can close this.  Of 
course you'll still need to fix the utilities, but it sounds like you've got a 
solution for that.

Dan

Original comment by danahins...@gmail.com on 5 May 2011 at 5:43

GoogleCodeExporter commented 8 years ago
dbyron, the link to your userid is broken.  And what's your email address?

Dan

Original comment by danahins...@gmail.com on 5 May 2011 at 5:45

GoogleCodeExporter commented 8 years ago
http://code.google.com/u/dbyron0/ works for me, but only when I'm logged in.  
I'd rather not publish my email address here to protect from spam.  That link 
has a captcha to at least make it a little harder to get to.

If that doesn't work, start another issue.

Thanks.

-DB

Original comment by dbyr...@gmail.com on 5 May 2011 at 5:52

GoogleCodeExporter commented 8 years ago
I'm logged in and when I click on that link I get a 404.  I can click on mine 
or kidjan and they work.

You can just email me at temp@danhinsley.com and I'll get back to you with my 
regular email address.

Dan

Original comment by danahins...@gmail.com on 5 May 2011 at 6:11

GoogleCodeExporter commented 8 years ago
I got your email, but when I tried to reply to it, it wasn't valid.  Can you 
include your email address in the body of the text?

Original comment by danahins...@gmail.com on 5 May 2011 at 7:07

GoogleCodeExporter commented 8 years ago
Ah sorry for the false alert, dbyron.  I forgot that the command line utils 
don't handle unicode correctly; I tested within our application, that does call 
the API directly, and all is good.  Thanks for following through though.

Dana, do you have C# interop definitions defined?  if so, would you be willing 
to donate those to the library?  and yes, when passing strings as utf8 in .NET, 
you have to take a few extra steps, but it sounds like you figured that out ;)  

Original comment by kid...@gmail.com on 14 May 2011 at 11:46

GoogleCodeExporter commented 8 years ago
I use VB.Net, not C#.  I've included the API Definitions I use in VB.  Due to 
the way that arrays of strings and structures with strings are passed, I had to 
create a C++ DLL that sits between some of the API (like MP4GetItems, reading 
and setting chapters, ...) and my VB App.  I could provide that as well if 
people wanted to access the library from VB.

Here are the definitions of the API's I call directly from my VB App.

    Friend Declare Function MP4Read Lib "libMP4V2.dll" (ByVal filename As Byte(), ByVal verbosity As Integer) As IntPtr
    Friend Declare Function MP4Modify Lib "libMP4V2.dll" (ByVal filename As Byte(), ByVal verbosity As Integer, ByVal flags As Integer) As IntPtr
    Friend Declare Sub MP4TagsSetName Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetArtist Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetAlbum Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetReleaseDate Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetGenre Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetDescription Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetLongDescription Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetTVShow Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetTVNetwork Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetTVEpisodeID Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetTVSeason Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
    Friend Declare Sub MP4TagsSetTVEpisode Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
    Friend Declare Sub MP4TagsSetSortName Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetSortArtist Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetSortTVShow Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetCopyright Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetEncodingTool Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetKeywords Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetCategory Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetGrouping Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetComments Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
    Friend Declare Sub MP4TagsSetContentRating Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
    Friend Declare Sub MP4TagsSetPodcast Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
    Friend Declare Sub MP4TagsSetGapless Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
    Friend Declare Sub MP4TagsSetMediaType Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
    Friend Declare Sub MP4TagsSetCompilation Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
    Friend Declare Sub MP4TagsSetTrack Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As trackDisk)
    Friend Declare Sub MP4TagsSetDisk Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As trackDisk)
    Friend Declare Sub MP4TagsSetHDVideo Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
    Friend Declare Sub MP4TagsSetContentID Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
    Friend Declare Sub MP4TagsRemoveArtwork Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal flags As Integer)
    Friend Declare Sub MP4TagsStore Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal file As IntPtr)
    Friend Declare Sub MP4Close Lib "libMP4V2.dll" (ByVal file As IntPtr)

Original comment by danahins...@gmail.com on 15 May 2011 at 4:23

GoogleCodeExporter commented 8 years ago
Going to close this issue now--I think the remaining "need unicode support in 
command line tools" aspects of it are probably better represented in issue 108.

Original comment by kid...@gmail.com on 26 Jun 2011 at 6:58

GoogleCodeExporter commented 8 years ago
Sounds good to me.  I fixed my app and I'm not sure the command line tools are 
all that important.

Original comment by danahins...@gmail.com on 26 Jun 2011 at 3:51

GoogleCodeExporter commented 8 years ago
@dana,

I am interested in your API definitions to use libmp4v2 through 
vb.net...specifically how to save artwork and save information for the "rating" 
tag (PG, PG-13 etc).

I have made a post on the discussion group regarding this, but it's moderated 
so it might appear a little later.

Thanks

Original comment by kari...@gmail.com on 18 Jan 2012 at 10:34

GoogleCodeExporter commented 8 years ago
I've attached two files, one with the routines I use and one with the declares. 
 Let me know if you have any questions.

Dan

Original comment by danahins...@gmail.com on 18 Jan 2012 at 10:49

Attachments:

GoogleCodeExporter commented 8 years ago
@dana,

Thanks for the attachments, but I'm still a little confused.

I see you have declarations for "MP4V2VBWrapper.dll", but what/where is this 
dll?

I guess the functions i'm looking for are:

    Friend Declare Sub VBMP4SetRating Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
    Friend Declare Sub VBMP4SetMOVI Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
    Friend Declare Sub VBMP4SetKind Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
    Friend Declare Sub VBMP4SetCoverArt Lib "MP4V2VBWrapper.dll" (ByVal buff() As Byte, ByVal len As Integer)

Right now in my code I only have code to write the coverArt (using .net 4.0 so 
I have to use CallingConvention.Cdecl or it throws an error):

    <DllImport("libMP4V2.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.Cdecl)>
    Friend Shared Sub MP4TagsAddArtwork(ByVal tags As IntPtr, ByVal art As CoverArtBox)
    End Sub

And then using the following to try to write the art, but it doesn't seem to 
work, I see the thumbnail in windows explorer change but to a generic "media" 
type icon in windows, so i'm guessing what i'm saving in the data is not 
correct.

            If Images.Count > 0 Then
                Dim art As New libmp4v2.CoverArtBox
                Using ms As IO.MemoryStream = New IO.MemoryStream
                    Images(0).Image.Save(ms, Images(0).Image.RawFormat)
                    art.size = ms.Length
                    art.data = ms.ToArray
                    ms.Close()
                End Using
                art.type = libmp4v2.MP4TagArtworkType.MP4_ART_UNDEFINED
                libmp4v2.MP4TagsAddArtwork(tags, art)
            End If

And how would I go about writing the rating?

FYI, I only want to write tags/atoms not read.

Thanks for your help.

Original comment by kari...@gmail.com on 19 Jan 2012 at 4:25

GoogleCodeExporter commented 8 years ago
How about you email me directly.  I can either give you the DLL or give you the 
DLL and the sources, whichever you prefer.

Dn

Original comment by danahins...@gmail.com on 19 Jan 2012 at 5:21

GoogleCodeExporter commented 8 years ago
Dan, Are you willing to share the DLL and VB files with me as well?

Original comment by rebelzo...@gmail.com on 12 May 2013 at 4:49