Open rdvdijk opened 3 years ago
I think this should be handled using the long fabled MetadataContext. Getting this implemented would also open other possibilities, like an updated take on #225.
Even though I've said it before in other places, I'd like to say once again that setting the default Locale
isn't a realistic option in most cases. The default Locale
is derived from the OS settings and will usually be what you need when presenting numbers, dates, currency and more to the user, and it is also used for case-conversion and setting the UI language. It can't be modified for any application with a user interface that needs this to be set correctly for both formatting and localization.
I edited the comment because I mixed this up with the similar issue of relying on default character encoding, which also causes issues.
Generally, if presenting the information directly to the user, using the default Locale
can be what you want. But, as soon as you intend to parse the output further, or that the default Locale
is involved in parsing of metadata, it will come down to "luck". I don't know if it is, but remember that the default Locale
is used in many places, like every call to String.format()
, Sting.toLowerCase()
and String.toUpperCase()
, not to mention NumberFormatter
, DecimalFormatter
etc.
Thanks for your thoughts, and giving some more context, much appreciated.
I had actually considered creating a Settings
class that would contain the Locale
, instead of just passing the Locale
directly. I decided against that to keep this pull request "to the point". However, such a Settings
class is of course very similar to the MetadataContext
that was discussed in #225. As was discussed in that pull request, I also made the Locale
optional (@Nullable
), not to break the current API.
I agree with you that the Locale
should be applied in many, many more places throughout the code base. We would have to track down every String
operation and Formatter
, and pass the Locale
to each of those.
What could be a way forward? I am thinking it would be relatively easy to introduce a small MetadataContext
interface that supplies the Locale
, but nothing more for now. E.g.:
interface MetadataContext {
Locale getLocale();
}
Future concerns like filtering (#225) could be added to the interface in a separate issue.
However, actually applying the Locale
in all places would turn into a huge pull request. Maybe create separate issues to apply the Locale
for the separate file types?
@rdvdijk I think your plan sounds like a good one. I don't know how much it matters how "big" the PR is when measured in code changes or lines, as I see it, making "one logical change"/feature is as small as it can be. However, it doesn't really matter that much what I think, @drewnoakes is the one that will have to make the decisions.
I don't know if it's preferable to make the "context class", the overloaded API calls and a partial application of Locale
or to make it all in one go. The important thing is that it all gets there in the end. The risk is of course, that if you make a PR now with only partial implementation of Locale
, "the rest" will never follow.
As I remember it, @drewnoakes wanted to minimize the number of places where such an object would have to be passed on. This is where I didn't see clearly what would be the best solution, as storing it as any kind of "global" introduces potential concurrency traps very fast. I guess that if it could be attached to a "reader" that is used for only a single extraction for example, it would be safe though. I think agreeing on exactly how the "context object" should be made accessible all the places it's needed is essential before a lot of work is invested.
This is very interesting. Thanks for picking this up.
I agree with @Nadahar that, given you're going to the effort of threading this through, it'd be worth adding a broader type into which other kinds of state/settings/etc can be added.
I'm not worried about a large PR, so long as the change is clear and consistent and doesn't change the world in some fundamental way. Your idea of breaking it down by file type, for example, makes sense too.
I'll introduce the MetadataContext
interface in this branch, and create a separate branch to apply it in the JPEG code. That might give us a rough indication on how much work it will be to apply through the entire code base.
@Nadahar wrote above:
As I remember it, @drewnoakes wanted to minimize the number of places where such an object would have to be passed on.
Related to that, I have a concrete implementation question: Shall I add MetadataContext
to the TagDescriptor
class? Or would the Locale
be sufficient there?
Option 1: MetadataContext
public class TagDescriptor<T extends Directory>
{
@NotNull
protected final T _directory;
@NotNull
protected final MetadataContext _context;
public TagDescriptor(@NotNull T directory, @NotNull MetadataContext context)
{
_directory = directory;
_context = context;
}
Option 2: Locale
public class TagDescriptor<T extends Directory>
{
@NotNull
protected final T _directory;
@NotNull
protected final Locale _locale;
public TagDescriptor(@NotNull T directory, @NotNull Locale locale)
{
_directory = directory;
_locale = locale;
}
I am leaning towards Option 2 in this case, but I'm not sure if there is a need for other state/settings/etc here.
I'll introduce the MetadataContext interface
Please make it a class. It's easier to version a class over time than an interface. If we add to an interface, it breaks all existing implementations. I don't think this would be an extension point for external code either.
and create a separate branch to apply it in the JPEG code
I'm happy for you to add it in the same branch. It'll be easier to see how it works that way, and change it before it is merged if it needs to.
Shall I add MetadataContext to the TagDescriptor class? Or would the Locale be sufficient there?
I'd suggest adding the whole context. There may be other bits on it we want in future.
Thanks for the swift reply. I'll work on this and report back when I think the branch is at a point to discuss some more details.
Did some work on this. My approach was to start at JpegMetadataReader
, and work my way down to all the Reader
s that were called from there. I passed down the MetadataContext
to those Reader
s, and in turn to their Directory
and Descriptor
classes. I've tried to find all of the DecimalFormat
, String#format
and String#toLowerCase/toUppercase
calls, and configure those using the locale in the new MetadataContext
class.
It's not all done yet, but it's a start. Here is the progress as far as I can tell:
@drewnoakes Can you see if this is going in the right direction?
Things of note:
Reader
s now have (even more) overloaded extract
/readMetadata
methods, to keep backwards compatibility, but allow the passing of the MetadataContext
. You'll find comments there that say TODO document this default context?
, I'm interested in your view on that.TagDescriptor
without a MetadataContext
, so plenty of work to be done still@Nadahar Also curious what you think of the changes so far.
This is quite a lot to look through. Once things I've noticed already is the chosen approach to preserve the existing API. This is only my opinion, and others might not share it, but I would have done it slightly differently.
Instead of requiring the MetadataContext
instance in method calls, I would make it @Nullable
everywhere, and make the code that use it be prepared that it might be null
instead. I would then make sure that as long as it is null
no behavior would change at all, the existing code should run as it is when that's the case. I would make overloads of methods (similar to what you've done) where all the existing methods would simply call the corresponding new method with null
for MetadataContext
.
I don't know if it makes much practical difference, but I prefer not to create objects when they're not needed, and I think using null
as "preserve current behavior" would make it easy to make sure that nothing actually changes for those that use it without using MetadataContext
.
Here's an example to illustrate what I mean, here is the "pre context version":
public SomeType foo(SomeOtherType bar) {
...
}
This is the corresponding "post context version":
public SomeType foo(SomeOtherType bar) {
return foo(bar, null);
}
public SomeType foo(SomeOtherType bar, @Nullable MetadataContext context) {
...
}
Don't make any changes based on my opinion alone though.
@Nadahar Thanks for the notes, much appreciated.
Instead of requiring the
MetadataContext
instance in method calls, I would make it@Nullable
everywhere, and make the code that use it be prepared that it might benull
instead.
I personally disagree with that. I really dislike passing along null
everywhere, and would rather avoid that at all costs. (The inventor of "null", Tony Hoare, even called it his "billion-dollar mistake" :wink: )
Allowing null
would mean that we'd have to add a null
-check in many places throughout the code, and make decisions on what to do when we do not have a context. Having a "sensible default context" made more sense to me.
but I prefer not to create objects when they're not needed
My approach was to create one MetadataContext
at the most, just as there is one Metadata
instance being passed around. The overhead of such an object instance would be minimal.
@rdvdijk I hope this PR will eventually reach maturity and get merged. Is there any blocker?
@rdvdijk I hope this PR will eventually reach maturity and get merged. Is there any blocker?
I do not think there are any blockers, but discussion here kind of just stalled two years ago.
What do you think, @Nadahar and @drewnoakes ? Should I spend some time to bring this PR back to life?
Ping @Nadahar and @drewnoakes .
Today we ran into something similar: a difference in system timezone caused unexpected timestamps. This is something that could also be fixed by the changes in this branch, or a subsequent pull request that adds configurable TimeZone
support.
I would like to make progress with this PR, but I really need some more feedback if this PR is heading in the right direction. I'd be happy to take another approach if that is preferred.
As initially discussed in https://github.com/drewnoakes/metadata-extractor-images/issues/35, the current formatting of certain tag descriptions are locale-dependent. The only way to influence the formatting is to set the locale globally, using
Locale#setLocale
. This can have undesirable effects when integratingmetadata-extractor
in a larger application.This pull request explores the idea of making the
Locale
configurable. A specificLocale
is passed toJpegMetadataReader#readMetadata
, and all the way down to theDescriptor
s that need it to format specific values. Two descriptors have been altered to show two specific examples:ExifDescriptorBase
: to render the F-number (f/6.0
vsf/6,0
)GpsDescriptor
: to render the longitude and latitude (54° 59' 22.8"
vs54° 59' 22,8"
)See the two added tests in
JpegMetadataReaderTest
(testConfigurableLocaleEnglish
andtestConfigurableLocaleDutch
).There is still much left to be done:
TODO
s with further ideas, I did not want to make the pull request even largerI am curious what you think of this, any feedback is welcome.