Closed iamcarbon closed 5 months ago
@drewnoakes Ready for feedback.
Did my comments come through? GitHub mobile issues again? Will be at a desktop shortly.
I just ran your PR branch (merged with main) across the regression test suite and hit this assert:
---------------------------
Assertion Failed: Abort=Quit, Retry=Debug, Ignore=Continue
---------------------------
attempted to read past end of data
at MetadataExtractor.IO.BufferReader.GetByte() in D:\repos\metadata-extractor-dotnet\MetadataExtractor\IO\BufferReader.cs:line 17
at MetadataExtractor.Formats.Jpeg.JpegReader.Extract(JpegSegment segment) in D:\repos\metadata-extractor-dotnet\MetadataExtractor\Formats\Jpeg\JpegReader.cs:line 52
at MetadataExtractor.Formats.Jpeg.JpegReader.<ReadJpegSegments>b__3_0(JpegSegment segment) in D:\repos\metadata-extractor-dotnet\MetadataExtractor\Formats\Jpeg\JpegReader.cs:line 21
at System.Linq.Enumerable.WhereSelectListIterator`2.MoveNext()
at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
at MetadataExtractor.Formats.Jpeg.JpegMetadataReader.ProcessJpegSegments(IEnumerable`1 readers, ICollection`1 segments) in D:\repos\metadata-extractor-dotnet\MetadataExtractor\Formats\Jpeg\JpegMetadataReader.cs:line 88
at MetadataExtractor.Formats.Jpeg.JpegMetadataReader.Process(Stream stream, ICollection`1 readers) in D:\repos\metadata-extractor-dotnet\MetadataExtractor\Formats\Jpeg\JpegMetadataReader.cs:line 77
at MetadataExtractor.Formats.Jpeg.JpegMetadataReader.ReadMetadata(Stream stream, ICollection`1 readers) in D:\repos\metadata-extractor-dotnet\MetadataExtractor\Formats\Jpeg\JpegMetadataReader.cs:line 47
at MetadataExtractor.ImageMetadataReader.ReadMetadata(Stream stream) in D:\repos\metadata-extractor-dotnet\MetadataExtractor\ImageMetadataReader.cs:line 81
at MetadataExtractor.MediaLibraryProcessor.DotNetRunner.ProcessDirectory(String path, IReadOnlyList`1 handlers, String relativePath, TextWriter log) in D:\repos\metadata-extractor-images\src\dotnet\MetadataExtractor.MediaLibraryProcessor\DotNet\DotNetRunner.cs:line 79
at MetadataExtractor.MediaLibraryProcessor.DotNetRunner.ProcessDirectory(String path, IReadOnlyList`1 handlers, String relativePath, TextWriter log) in D:\repos\metadata-extractor-images\src\dotnet\MetadataExtractor.MediaLibraryProcessor\DotNet\DotNetRunner.cs:line 62
at MetadataExtractor.MediaLibraryProcessor.DotNetRunner.ProcessDirectory(String path, IReadOnlyList`1 handlers, String relativePath, TextWriter log) in D:\repos\metadata-extractor-images\src\dotnet\MetadataExtractor.MediaLibraryProcessor\DotNet\DotNetRunner.cs:line 62
at MetadataExtractor.MediaLibraryProcessor.DotNetRunner.<RunAsync>d__0.MoveNext() in D:\repos\metadata-extractor-images\src\dotnet\MetadataExtractor.MediaLibraryProcessor\DotNet\DotNetRunner.cs:line 38
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
---------------------------
Abort Retry Ignore
---------------------------
The failing file is metadata-extractor-images\jpg\ImageTestSuite\c8bc97335529d069a753c67475b8c82c.jpg
. It's failing here:
Looking at that code, we don't validate that the segment span is long enough for the read operations we want to perform. We need to either add that validation, or reintroduce the precondition and throw an IOException
. I think the former is reasonable.
Another hit in metadata-extractor-images\png\ImageTestSuite\e76546768d4a8f2f4c39339345c7614c.png
here: https://github.com/iamcarbon/metadata-extractor-dotnet/blob/db65fa5c81af218adfe9e6b2645860ff2b58f76d/MetadataExtractor/Formats/Png/PngMetadataReader.cs#L322
I was able to fix all asserts in the regression suite with the following diff:
diff --git a/MetadataExtractor/Formats/Jpeg/JpegReader.cs b/MetadataExtractor/Formats/Jpeg/JpegReader.cs
index 78499c0a..60a02fd2 100644
--- a/MetadataExtractor/Formats/Jpeg/JpegReader.cs
+++ b/MetadataExtractor/Formats/Jpeg/JpegReader.cs
@@ -31,6 +31,14 @@ namespace MetadataExtractor.Formats.Jpeg
var reader = new BufferReader(segment.Span, isBigEndian: true);
+ const int JpegHeaderSize = 1 + 2 + 2 + 1;
+
+ if (segment.Span.Length < JpegHeaderSize)
+ {
+ directory.AddError("Insufficient bytes for JPEG segment header.");
+ return directory;
+ }
+
try
{
directory.Set(JpegDirectory.TagDataPrecision, reader.GetByte());
@@ -41,6 +49,14 @@ namespace MetadataExtractor.Formats.Jpeg
directory.Set(JpegDirectory.TagNumberOfComponents, componentCount);
+ const int JpegComponentSize = 1 + 1 + 1;
+
+ if (segment.Span.Length < JpegHeaderSize + componentCount * JpegComponentSize)
+ {
+ directory.AddError("Insufficient bytes for JPEG the requested number of JPEG components.");
+ return directory;
+ }
+
// For each component, there are three bytes of data:
// 1 - Component ID: 1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q
// 2 - Sampling factors: bit 0-3 vertical, 4-7 horizontal
diff --git a/MetadataExtractor/Formats/Png/PngMetadataReader.cs b/MetadataExtractor/Formats/Png/PngMetadataReader.cs
index 21a0742a..40e7d8bf 100644
--- a/MetadataExtractor/Formats/Png/PngMetadataReader.cs
+++ b/MetadataExtractor/Formats/Png/PngMetadataReader.cs
@@ -295,35 +295,52 @@ namespace MetadataExtractor.Formats.Png
}
else if (chunkType == PngChunkType.tIME)
{
- var reader = new BufferReader(bytes, isBigEndian: true);
- var year = reader.GetUInt16();
- var month = reader.GetByte();
- int day = reader.GetByte();
- int hour = reader.GetByte();
- int minute = reader.GetByte();
- int second = reader.GetByte();
var directory = new PngDirectory(PngChunkType.tIME);
- if (DateUtil.IsValidDate(year, month, day) && DateUtil.IsValidTime(hour, minute, second))
+
+ if (bytes.Length < 2 + 1 + 1 + 1 + 1 + 1)
{
- var time = new DateTime(year, month, day, hour, minute, second, DateTimeKind.Unspecified);
- directory.Set(PngDirectory.TagLastModificationTime, time);
+ directory.AddError("Insufficient bytes for PNG tIME chunk.");
}
else
{
- directory.AddError($"PNG tIME data describes an invalid date/time: year={year} month={month} day={day} hour={hour} minute={minute} second={second}");
+ var reader = new BufferReader(bytes, isBigEndian: true);
+ var year = reader.GetUInt16();
+ var month = reader.GetByte();
+ int day = reader.GetByte();
+ int hour = reader.GetByte();
+ int minute = reader.GetByte();
+ int second = reader.GetByte();
+ if (DateUtil.IsValidDate(year, month, day) && DateUtil.IsValidTime(hour, minute, second))
+ {
+ var time = new DateTime(year, month, day, hour, minute, second, DateTimeKind.Unspecified);
+ directory.Set(PngDirectory.TagLastModificationTime, time);
+ }
+ else
+ {
+ directory.AddError($"PNG tIME data describes an invalid date/time: year={year} month={month} day={day} hour={hour} minute={minute} second={second}");
+ }
}
yield return directory;
}
else if (chunkType == PngChunkType.pHYs)
{
- var reader = new BufferReader(bytes, isBigEndian: true);
- var pixelsPerUnitX = reader.GetInt32();
- var pixelsPerUnitY = reader.GetInt32();
- var unitSpecifier = reader.GetSByte();
var directory = new PngDirectory(PngChunkType.pHYs);
- directory.Set(PngDirectory.TagPixelsPerUnitX, pixelsPerUnitX);
- directory.Set(PngDirectory.TagPixelsPerUnitY, pixelsPerUnitY);
- directory.Set(PngDirectory.TagUnitSpecifier, unitSpecifier);
+
+ if (bytes.Length < 4 + 4 + 1)
+ {
+ directory.AddError("Insufficient bytes for PNG pHYs chunk.");
+ }
+ else
+ {
+ var reader = new BufferReader(bytes, isBigEndian: true);
+ var pixelsPerUnitX = reader.GetInt32();
+ var pixelsPerUnitY = reader.GetInt32();
+ var unitSpecifier = reader.GetSByte();
+ directory.Set(PngDirectory.TagPixelsPerUnitX, pixelsPerUnitX);
+ directory.Set(PngDirectory.TagPixelsPerUnitY, pixelsPerUnitY);
+ directory.Set(PngDirectory.TagUnitSpecifier, unitSpecifier);
+ }
+
yield return directory;
}
else if (chunkType.Equals(PngChunkType.sBIT))
That said, there are other usages of the reader that aren't validating byte length, so maybe a debug assertion is not enough.
Also, there are a few diffs created by this PR that need to be investigated before we can merge this.
The error message diffs would be ok. We could add the same error messages to the Java library too. But the tag value diff is weird and needs fixing.
Thanks again! Great stuff here. I pushed a few changes as I worked with the regression test suite to preserve output parity.
This PR introduces a ref struct based BufferReader that can operate directly over a span -- and applies it to eliminate a bunch of random allocations.
This same approach can also be used to optimize various Extract functions with a public API change.