Closed gtiwari333 closed 8 months ago
This PR will fix the extraction of picture control name that's coming up with trailing 0s.
Thanks for following up here. I intentionally changed your last PR before merging to avoid the approach here though. Let's see if there's a different approach that'll work for you.
When pulling strings from bytes, we generally store StringValue
rather than String
. This allows the consumer to reinterpret the underlying data in different encodings. If we convert to string, that data is lost.
The problem you're seeing is that StringValue
is not trimming trailing null bytes. I wonder if there's a better way to deal with that.
I ported the fix to .NET as well here: https://github.com/drewnoakes/metadata-extractor-dotnet/pull/349
And CI is fixed again. Sorry about that.
Instead of adding
getNullTerminatedStringAndSkipToNextPosition
, we can callgetNullTerminatedStringValue
There's a problem with getNullTerminatedStringValue
. After reading the requested number of bytes from input, it doesn't move the cursor to the end. My test testGetNullTerminatedStringCursorPositionTest
explains what's wrong with getNullTerminatedStringValue
method. Only way to get around it is to check the size of text returned and manually skip
the remaining bytes.
For input AB\0\0\0CD\0ABCDEF
If you ask for getNullTerminatedString(5)
it will return AB
but keep the cursor position at index 3.
The new method getNullTerminatedStringAndSkipToNextPosition
reads the AB
and move the cursor at C
.
If you compare testGetNullTerminatedStringCursorPositionTest
and testGetNullTerminatedStringAndSkipToNextPosition
you will see the improvement the new method makes.
Did few more testing and found an easy
way to deal with it. Let me know what would you prefer?
The getStringValue
reads the all 20 bytes and moves the cursor to the end. If we trim the value using String.trim(), it will remove the trailing null characters.
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getStringValue(20, Charsets.UTF_8).toString())
Change this to:
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getStringValue(20, Charsets.UTF_8).toString().trim())
I ended up fixing this in #647 with an approach that mirrors what we did in the .NET version, so that they're as close as possible (which helps porting other changes in future). It also avoids adding toString
calls to reduce allocations and allow consumers to reinterpret the StringValue
in other encodings if needed.
Oops. Thanks!
@drewnoakes I deleted my comment because I assumed you would edit yours - but ended up making it make no sense at all 😊
A change from https://github.com/drewnoakes/metadata-extractor/pull/634/files#diff-7df436f0258b5156fd15792db94b4cab6119180ad7d605533ffb53213f048ba2R400 got missed in earlier merge.
https://github.com/drewnoakes/metadata-extractor/pull/635