datascience / c3po

Clever, Crafty Content Profiling of Objects
http://ifs.tuwien.ac.at/imp/c3po
Apache License 2.0

Strange conflicts #44

Closed by stiefel40k 8 years ago

stiefel40k commented 8 years ago

Hi,

In some cases I can't understand why c3po doesn't identify the format or the mimetype correctly. For example, I have the following output for an object:

format : Plain text [Jhove:1.11], Plain text [file utility:5.20],
mimetype : text/plain [Jhove:1.11], text/x-makefile [file utility:5.20],

where both the mimetype AND the format are marked red, meaning they are conflicted. For the mimetype it is clearly a conflict, but why the format? Both tools give the same output, so why is it a conflict?

I also had the opposite case, where Droid identified a file's format as Hypertext Markup Language and Jhove identified it as HTML Transitional. OK, that is a clear conflict, but the same tools gave the same result for the mimetype (text/html), and the mimetype was still marked as a conflict.

Am I being stupid and missing something, or is this a problem in c3po? Or are these two attributes calculated in a way that makes them depend on each other, so that if one is conflicted, the other is too?

artourkin commented 8 years ago

Hi, thank you for the question. As you might know, c3po processes the FITS outputs. For format identification, FITS binds together the outputs that come from one tool source. If there is a conflict in any of the format identification properties (format, format_version or mimetype), the whole triplet is marked as CONFLICT. In c3po, we want to keep this information (which might be useful for conflict resolution), so we mark all three properties as CONFLICTed, even those on which the tools agree.
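
To make the triplet rule concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual FITS or c3po code: the names `Identification`, `disagreeing_properties` and `mark_triplet`, and the "OK" status label, are made up for illustration. The example values come from the output quoted in the question, with format_version assumed to be empty because that output does not show one.

```python
from typing import NamedTuple, Optional

# The three format identification properties that are treated as one unit.
PROPERTIES = ("format", "format_version", "mimetype")


class Identification(NamedTuple):
    """One tool's identification triplet (hypothetical structure)."""
    tool: str
    format: str
    format_version: Optional[str]
    mimetype: str


def disagreeing_properties(idents):
    """Return the properties on which the tools report different values."""
    return [p for p in PROPERTIES if len({getattr(i, p) for i in idents}) > 1]


def mark_triplet(idents):
    """Mark every property CONFLICT if any single property disagrees.

    "OK" is a placeholder label for the non-conflicted case.
    """
    status = "CONFLICT" if disagreeing_properties(idents) else "OK"
    return {p: status for p in PROPERTIES}


# Values from the question; format_version is assumed to be missing.
example = [
    Identification("Jhove:1.11", "Plain text", None, "text/plain"),
    Identification("file utility:5.20", "Plain text", None, "text/x-makefile"),
]

print(disagreeing_properties(example))  # ['mimetype']
print(mark_triplet(example))
```

Only the mimetype actually differs between the two tools, yet all three properties end up marked as CONFLICT, which matches the red format field in the question. The same logic explains the Droid/Jhove case, where the format differs and the matching mimetype is marked as well.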