drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0
2.55k stars 479 forks source link

Unnecessary FilterInputStream requirement in FileTypeDetector. #512

Closed garretwilson closed 3 years ago

garretwilson commented 3 years ago

I'm starting to try out your Metadata Extractor library. The first thing I want to do is detect the file type using FileTypeDetector:

public static FileType detectFileType(@NotNull final FilterInputStream inputStream) throws IOException

But why must I pass it a FilterInputStream? The standard approach is just to pass a plain old InputStream.

A FilterInputStream is simply a tool for creating a new input stream by "wrapping" an existing stream. (This is also referred to as the Decorator Design Pattern.) But that's an implementation detail. Why should detectFileType() care if I've wrapped an input stream or not?

Now if I have a perfectly good InputStream, in order to call detectFileType() I have to wrap it in a FilterInputStream, which serves no purpose because it's just a wrapper. You can look at the source code for FilterInputStream. It doesn't do anything more than the underlying InputStream does. Your detectFileType() gains absolutely no benefit—zero; no added functionality, and no guarantees—by requiring a FilterInputStream.

If we look at the documentation for detectFileType(), it says this:

    /**
     * …
     * Requires a {@link FilterInputStream} in order to mark and reset the stream to the position
     * at which it was provided to this method once completed.
     * …
     */

But this is not true. As we've seen FilterInputStream has nothing to do with whether mark and reset are supported; in fact whether a FilterInputStream supports mark/reset depends on the wrapped input stream.

The mark(int readlimit) and reset() methods are defined on the plain old InputStream class, so you could have (and should have) allowed any InputStream. But the only way to know whether the an input stream actually supports mark/reset is to call InputStream.markSupported() and see whether it returns true.

Your code does this check correctly. So why does it artificially require a completely irrelevant FilterInputStream requiring the caller to needlessly wrap a perfectly good InputStream?

garretwilson commented 3 years ago

I think what you maybe did is looked at the types of input streams Java already provides, and noticed coincidentally that some of them that extend from FilterInputStream (e.g. BuferedInputStream) happen to implement mark/reset as well. But that's just a coincidental implementation detail. This approach bypasses the semantics of the API:

So this approach 1) doesn't even guarantee what it attempts to guarantee, and 2) prevents the direct use of completely valid other types.

kwhopper commented 3 years ago

If you're unaware of this library's development history -- I believe @drewnoakes inherited this project from other developers. It's entirely possible many of these things were implemented years and years ago. Usually, it's dangerous to make fundamental changes which could effect many who have used the library for a long time as it is - unless those changes are curated into some new release and tested thoroughly.

A pull request that implements your thoughts on this matter would be helpful and could allow others to test it on the versions of Java supported by the library. Thanks

Nadahar commented 3 years ago

In this case I don't think it could have any adverse effects to do as @garretwilson suggests. I think the assumption that whoever picked FilterInputStream did it because they thought it guaranteed mark support is very likely - and this is a safeguard that has never actually been working. In addition it excludes InputStream implementations that would work perfectly well.

The check for mark support is even implemented in addition, so it could be replaced with InputStream without breaking any existing code:

https://github.com/drewnoakes/metadata-extractor/blob/bb78cb01ba32065e9bfc5a51d91df1d3f981459a/Source/com/drew/imaging/FileTypeDetector.java#L129-L133

drewnoakes commented 3 years ago

Widening the type to InputStream is reasonable.

It's a binary break, not a source break. I'm usually reluctant to make any kind of break, especially to a commonly used API.

In this case I believe it would acceptable.


@kwhopper actually the history is slightly different. I wrote the original version of the library in Java in 2001. A few years back some folks converted it to C#, and I started maintaining that .NET port of the library as well. Nowadays we try to keep the two in sync as much as possible.

kwhopper commented 3 years ago

@drewnoakes Heh - yeah, I went by bits and pieces on the history. Sorry

drewnoakes commented 3 years ago

No worries at all. I can't put the blame for any bad design choices on anyone else 🙂