Closed axxel closed 7 years ago
Well, it turns out: this could be improved even further: simply remove the CFA tag for most of the cameras?!?
I just synced my raws to start a new PR cycle and not miss a few files again. I noticed that a couple Panasonic files failed with the error Unexpected CFA color GREEN2
and I was thinking: ups, my suggestion from above does not allow to specify a second green channel. Wondering why there is no single CFA in the xml mentioning this color I found the origin in the source code. A few decodeMetaDataInternal
functions start with this line: setCFA(iPoint2D(2,2), CFA_RED, CFA_GREEN, CFA_GREEN2, CFA_BLUE);
So they basically set a 'default' value for the specified file type. Fortunately the RW2 decoder later fails if this GREEN2
shows up, even though is was default initialized explicitly to this very value (seems a lot like a bug to me). So every Panasonic camera without an explicit CFA tag automatically fails.
This led to two observations:
GREEN2
in the code to GREEN
, those raws can successfully be decoded nowRGGB
and quite a few already use this default feature by not specifying any CFA info.Amended suggestion to go forward:
This led to two observations:
- if I replace this GREEN2 in the code to GREEN, those raws can successfully be decoded now
Amended suggestion to go forward:
- check the defaults in the source and make sure they are actually meaningful for the respective brand
Yes.
I suspect that it is a copy-paste issue, it was added in one place, and then spread to other places, without much thinking.
And since darktable and all the software that uses our code sets d->failOnUnknown = true;
, it was never really encountered.
Amended suggestion to go forward:
- remove the GREEN2 constant from the code base
I fail to find anything about non-RGB CFA's in dng spec. I think it may have been an attempt to have unique color names for all the 4 CFA colors. But it is clearly not happening in X-Trans case. More details are needed.
This led to two observations:
- we don't need to specify the CFA tag at all if the camera uses the 'default' pattern for this particular file type/make. I checked for Canon and indeed, most of the models use
RGGB
and quite a few already use this default feature by not specifying any CFA info. Amended suggestion to go forward:- explicitly specify the default pattern for each brand in the xml as documentation and suggest to not introduce the CFA tag if the particular model uses the default
Time for some gory detail: this is how the majority of these entries are created: darktable/tools/dngmeta.sh
I believe it is best to keep the <CFA>
tag.
More details are needed.
Which details and where would they come from? Your thinking suggest to me there has been an attempt at fixing an imaginary problem no one has, thereby introducing real problems.
Time for some gory detail: this is how the majority of these entries are created: darktable/tools/dngmeta.sh I believe it is best to keep the
tag.
Would you please explain to me how the presence of this shell script makes you come to your conclusion that it is bet to keep it? Please note: I did not suggest to completely remove the CFA tag but to improve it. And where it is not necessary it is an improvement to not specify it.
In fact: the presence of this shell script or better yet the presence of the CFA info inside the raw makes it even more obvious that probably most CFA-info from the xml should disappear: for every model that stores the info in the file, this is exactly where it should be taken from. I can assume the decision to provide it in the xml comes from a time before the white balance info was parsed inside RS and has never been questioned since.
I assume it is non controversial that in an ideal world the whole presence of the xml-file would be unnecessary because every info you need to successfully develop a raw comes together with the raw. Agreed? So every info we need and can extract from the raw we should get from the raw. Agreed?
Which details and where would they come from?
Dunno yet. Reading git log
, reading more specs.
Your thinking suggest to me there has been an attempt at fixing an imaginary problem no one has, thereby introducing real problems.
Yes, unless there is an actual GREEN2
CFA color out there in the world, and in these cases we have it is just erroneously used.
Would you please explain to me how the presence of this shell script makes you come to your conclusion that it is bet to keep it? Please note: I did not suggest to completely remove the CFA tag but to improve it. And where it is not necessary it is an improvement to not specify it. In fact: the presence of this shell script or better yet the presence of the CFA info inside the raw makes it even more obvious that probably most CFA-info from the xml should disappear: for every model that stores the info in the file, this is exactly where it should be taken from.
Did you notice that script is run on DNG, not the raw files? Also, did you read https://github.com/darktable-org/rawspeed/tree/develop/RawSpeed#updating-camera-support ? If you drop <CFA>
tags, you will knowingly fuck that up. Because earlier versions of rawspeed need them.
I can assume the decision to provide it in the xml comes from a time before the white balance info was parsed inside RS and has never been questioned since.
I assume it is non controversial that in an ideal world the whole presence of the xml-file would be unnecessary because every info you need to successfully develop a raw comes together with the raw. Agreed?
No. You need <ID>
, <crop>
, <alias>
, probably <hints>
. Rest maybe is within exif/makernotes.
So every info we need and can extract from the raw we should get from the raw. Agreed?
Yes probably. Especially black/white levels.
Yes, unless there is an actual GREEN2 CFA color out there in the world, and in these cases we have it is just erroneously used.
Well, there is one fact that should help in determining the relevance of this potentially important extra green: the existing practice, you just mentioned (all the software that uses our code sets d->failOnUnknown = true;
), plus the fact that no CFA tag in the xml mentions this extra green mean there is currently no software dealing with/depending on the GREEN2
.
The only place where GREEN2
is 'used' inside the RS code base is in the dcraw filter conversion functions. Looking at what dcraw has to say about this, it seems the idea was to allow the interpolation of the two green channels as separate colors. Quote from dcraw website:
What does the "-f" (four color RGB) option do? If you see patterns like this or this in your output images, first try "dcraw -a". If these patterns persist, use "dcraw -f" to get rid of them.
So it has nothing to do with an actual second type of green but with an alternative on how to interpolate the CFA data. This, however, is not the concern of RS.
Also I had a look at all the CFA info that can be found inside our raw sample set:
exiv2 -pv *.??? 2>/dev/null | grep CFAPat | tr -s ' ' | cut -d' ' -f 7- | sort -u
. Here is the output:
0 1 1 2
0 2 0 2 0 1 1 2
0 2 0 2 1 0 2 1
0 2 0 2 1 2 0 1
0 2 0 2 2 1 1 0
-> 0 2 0 2 5 3 1 4
1 0 2 1
1 2 0 1
2 0 2 0 0 1 1 2
2 0 2 0 1 0 2 1
2 0 2 0 1 2 0 1
2 0 2 0 2 1 1 0
2 1 1 0
-> 5 3 1 4
Only the two lines I marked with ->
contain anything other than 0/1/2. And the only two files that have those are RAW_NIKON_E5400.NEF and RAW_NIKON_E5700_SRGB.NEF. The second one is mentioned in the xml and the colors are FUJI_GREEN, MAGENTA, YELLOW, CYAN
. By the way: FUJI_GREEN
is another green.
Did you notice that script is run on DNG, not the raw files?
I noticed the variable name hinted at that. But why do you mention it? What relevance has this fact?
If you drop
tags, you will knowingly fuck that up. Because earlier versions of rawspeed need them.
This is exactly why I think it is better to have as little information in there as possible and why I am convinced that in an ideal world we would not need this xml file. Then all the problems related to updating/maintaining this file would be gone. Regarding the upgrade path: there is the decoder version mechanism that could be used to make sure nothing breaks if keeping this kind of forward compatibility was really that important. Who is really dependent on old rawspeed installments but then actively copies new xml-files out of some source repository? And to get individual new cameras working, the interested user can still manually patch his xml according to the doc. I don't think this use case should generally prohibit any improvements to the xml data structure.
No. You need ID, crop, alias, probably hints. Rest maybe is within exif/makernotes.
I was talking about the ideal situation, not the one we have right now. To address your mentioned tags:
<ID>
: needed only to identify all the rest -> required<crop>
: I bet most of the formats have this information but they typically crop more than required. So this is definitively -> useful. Still: one could use the exif-info as a fallback if no crop was given. This way, this missing info is not flat out prohibiting the use of RS for yet unknown cameras.alias
: I have no knowledge about the reason why this is therehints
: obviously required where it is required. could be that a few of them can be determined from exif data. but generally -> required.Well, there is one fact that should help in determining the relevance of this potentially important extra green: the existing practice, you just mentioned (
all the software that uses our code sets d->failOnUnknown = true;
), plus the fact that no CFA tag in the xml mentions this extra green mean there is currently no software dealing with/depending on theGREEN2
.
Yes, i agree. Well, not unless some software has their changes to the cameras.xml
, which make use of the GREEN2
.
The only place where
GREEN2
is 'used' inside the RS code base is in the dcraw filter conversion functions. Looking at what dcraw has to say about this, it seems the idea was to allow the interpolation of the two green channels as separate colors. ... This, however, is not the concern of RS.
Thanks for looking in dcraw, my eyes start bleeding each time i read that.
Yes, i do think that GREEN2
is bogus, and most likely needs to go away. https://github.com/darktable-org/rawspeed/issues/34
I noticed the variable name hinted at that. But why do you mention it? What relevance has this fact?
It shows that the cameras.xml
is not created with the data that was simply extracted from the raw file, but from the DNG that was created from the raw. I'm not saying that most data can't be found in the non-DNG raws.
This is exactly why I think it is better to have as little information in there as possible and why I am convinced that in an ideal world we would not need this xml file.
In ideal world there would only be DNG, but we are not in an ideal world :/
Please do note that not all the info cameras.xml
contains is simply extracted from the raw.
<ID>
is not extracted from the raw. <crop>
must be manually computed, and is not equal to the crop specified in the raw file. <alias>
is created manually only.
There are even thoughts about putting adobe color matrixes in there. Surely, they can't be extracted from the raw :)
Regarding the upgrade path: there is the decoder version mechanism that could be used to make sure nothing breaks if keeping this kind of forward compatibility was really that important.
Hmm. One day it is working, and images from camera X are loading just fine. Then the cameras.xml
is updated, and they no longer load. I'd call that a breakage.
Who is really dependent on old rawspeed installments but then actively copies new xml-files out of some source repository?
I'm not sure anyone can answer that question.
And to get individual new cameras working, the interested user can still manually patch his xml according to the doc. I don't think this use case should generally prohibit any improvements to the xml data structure.
I'm not sure i'm parsing this all right. You are not talking about dropping failOnUnknown
option?
In ideal world there would only be DNG, but we are not in an ideal world :/
Of course, I was just trying to make the point that this is the 'right' direction going forward.
<ID>
is not extracted from the raw.<crop>
must be manually computed, and is not equal to the crop specified in the raw file.
Agreed (see my '-> required' and '-> useful' conclusion above).
<alias>
is created manually only.
Agreed (I only said, I have no idea what this is good for. I did not mean to say "this has no use".)
I'm not sure anyone can answer that question.
Well, those users could. And if there are any listening: please speak up! :-).
But it is not like 'we' as the developers putting our free time into improving stuff like RS owe someone who might be using this 'corner case feature' of some kind of forward compatibility that this will work forever. This whole discussion about removing unnecessary information from the xml will hopefully result in substantially improved forward compatibility (from now on), to the point where someone might be able to more or less usefully decode a file even we've never seen before.
I'm not sure i'm parsing this all right. You are not talking about dropping failOnUnknown option?
No, but I think this could potentially be changed to something like "warnOnUnknown". Let's assume a new Canon model shows up and they do their usual stuff: change the model name, the sensor size, the color matrix, maybe some iso-level dependent black/white levels and that is it (no new fancy encoding 'features' for now). Apart from the color matrix (which technically is not really a RS concern), I don't see why we could not get all that info, at least as some 'useful defaults or best guesses', from the CR2 file.
If a user of dt could then decide to be warned about unknown files but still get something useful out without needing to upgrade, would that not be an improvement? Maybe he could have a few more pixels out of his raw if the crop was set better or the color matrix would improve his results but I bet, if he'd be informed about the potential issues, he might still be really happy to be able to get at least something.
As for the color matrices: accessing them through the RS interface would make sens to me. As a user, I'd simply like to get everything I need to successfully decode my image from this one place. In my 'ideal' world those matrices would be stored inside the raws, of course ;).
But since this would be the only info that is really missing for this hypothetical new Canon: maybe dt could add a feature to automagically download those few numbers from somewhere 'on the internet'. No idea where there are from but I guess Adobe must make them public somewhere, otherwise they would not show up in dcraw.c? But that is obviously way off topic here.
No, but I think this could potentially be changed to something like "warnOnUnknown".
Yeah, no. darktable switched to rawspeed and dropped libraw because it does not produce useless garbage by default. This won't change.
Also, what a TL;DR
Yeah, no. darktable switched to rawspeed and dropped libraw because it does not produce useless garbage by default. This won't change.
Very well. It was just a suggestion.
But to come to some conclusion/agreement on how to go forward. Do you agree with the following?
This change will not allow old RS code to use the new xml file.
If you have objections, please suggest an alternative.
And now i'm starting to understand all the responses "this isn't broken, why change it" i am/was getting.
BTW, forgot one more extremely important reason why <camera>
and <id>
tags will stay as is - they are used for automatic generation of changelog of camera support between dt versions.
This change will not allow old RS code to use the new xml file.
I'm not ready to make such decisions. So i don't agree.
And now i'm starting to understand all the responses "this isn't broken, why change it" i am/was getting.
Certainly not because of me. I had nothing to do with what did or did not work until about a week ago :).
BTW, forgot one more extremely important reason why
and tags will stay as is - they are used for automatic generation of changelog of camera support between dt versions.
That is good but please note that I never suggested to remove the camera/id tags from the file.
This change will not allow old RS code to use the new xml file.
I'm not ready to make such decisions. So i don't agree.
Ok. So your alternative is to not change anything?
I have another alternative: the xml file stays just as it is but we still try to extract CFA/black/whitelevel from the raw and use that, just in case it is not present in the xml-file. So users (like me) who don't care for the failOnUnsupported flag get an improvement, while old users still can use the xml file as ever.
Entries in xml should always overwrite values read from raw files. I had seen minimum one camera where the cfa pattern was wrong in dng and I it could have been wrong in raw (if it was included). I don't remember which camera it was, so can not test it.
In the long term we should not add CFA in xml anymore if not needed, but we should give developers some time to adjust there software. We should announce it in the next release with a timetable.
CFA data removing should only be done with checks for every camera, I'd go the safe way and test it.
@axxel
... we still try to extract CFA/black/whitelevel from the raw and use that
We are going in circles. I have already answered that question a day ago, https://github.com/darktable-org/rawspeed/issues/26#issuecomment-272273937
Ok. So your alternative is to not change anything?
In cameras.xml
, in near future - yes.
@schenlap
Entries in xml should always overwrite values read from raw files.
No questions here, +1.
I had seen minimum one camera where the cfa pattern was wrong in dng and I it could have been wrong in raw (if it was included).
Fun... i wonder if it is related to #30 Thought it is not unheard of, see darktable/tools/dngmeta.sh
We should announce it in the next release with a timetable.
That's a tricky point, there isn't any rawspeed releases... Once dt-2.4 gets released, i will start branching rawspeed; but as for actual releases i'm not quite sure.
We are going in circles. I have already answered that question a day ago, #26 (comment)
Alright. I was just trying to summarize the made decisions we agreed on to prevent misunderstandings and explicitly asked you to provide constructive feedback if there is something in the summary you do not agree with. I listed 3 decisions/suggestions and all I got as feedback was So I don't agree.
.
Regarding my earlier claim that all EOS use the same RGGB pattern: that was founded on my own experience with a couple of EOS models and this statement: http://lclevy.free.fr/cr2/#interpol. It turns out that my claim was false. While most of the EOS models use RGGB, the following don't according to the xml: 50D, 60D, 550D, 600D, 5DM2, 1200D, 1300D, 1D, 1Ds, 1DsM2
. And I verified it for two of them. And since the CR2 code also mentions the GREEN2
as the default, we can not remove the CFA info from the other EOS models and keep the forward compatibility of old code.
I had seen minimum one camera where the cfa pattern was wrong in dng and I it could have been wrong in raw (if it was included).
Maybe it was this one?
RAW_NIKON_E5400.NEF 0x828e SubImage2 CFAPattern Byte 4 2 1 1 0
RAW_NIKON_E5400.NEF 0xa302 Photo CFAPattern Undefined 8 0 2 0 2 5 3 1 4
So you can see, it has two contradicting CFAPattern infos. Maybe the script picked the wrong one? FYI: I checked all the samples we have and only NEF files have two CFAPattern tags and only this single one has contradicting information.
Once dt-2.4 gets released, i will start branching rawspeed; but as for actual releases i'm not quite sure.
What is your reasoning here? Something like "stable" vs. "development" kind of branching?
Alright. I was just trying to summarize the made decisions we agreed on to prevent misunderstandings and explicitly asked you to provide constructive feedback if there is something in the summary you do not agree with. I listed 3 decisions/suggestions and all I got as feedback was So I don't agree..
Ah, sorry, i was quoting the part of the message my replies was about.
To summarize:
GREEN2
in most cases is probably wrong, and needs to be changed to GREEN
.cameras.xml
will always override the data from the raw.<CFA*>
entries:
<CFA*>
tags.That is the bare minimum.
Maybe it was this one?
> RAW_NIKON_E5400.NEF 0x828e SubImage2 CFAPattern Byte 4 2 1 1 0 > RAW_NIKON_E5400.NEF 0xa302 Photo CFAPattern Undefined 8 0 2 0 2 5 3 1 4
No, i'm pretty sure that was not it...
For RAW_NIKON_E5700_SRGB.NEF
, which is real CYMG
it says:
$ exiv2 -Pkt raw-camera-samples/raw.pixls.us/Nikon/E5700/RAW_NIKON_E5700_SRGB.NEF | grep -i cfa
raw-camera-samples/raw.pixls.us/Nikon/E5700/RAW_NIKON_E5700_SRGB.NEF: (No XMP data found in the file)
Exif.SubImage1.CFARepeatPatternDim 2 2
Exif.SubImage1.CFAPattern 5 3 1 4
Exif.SubImage2.CFARepeatPatternDim 2 2
Exif.SubImage2.CFAPattern 5 3 1 4
Exif.Photo.CFAPattern 0 2 0 2 5 3 1 4
And that ^ has the clue to the probably only reason GREEN2
exists.
BTW, for script, you'd need to convert them to DNG (adobe dng converter), and look for Exif.SubImage1.CFAPattern
in it's EXIF. And the script only looks at Exif.SubImage1.CFAPattern*
, not for any other sub-image.
What is your reasoning here? Something like "stable" vs. "development" kind of branching?
Yes. for dt git master, we can use rawspeed git develop/master branch, however it will be called.
But for stable releases, it is inappropriate. So rawspeed will be branched, and only the small fixes, mostly only the cameras.xml
changes, will be cherry-picked, and darktable stable branch will track that rawspeed stable branch.
[...] That is the bare minimum.
Works for me. As kind of a final remark: I personally don't care about this xml/CFA situation as I do not have to deal with it. I just thought it was messier than necessary and changing it might make other people's life easier (not harder!) as in not having to deal with it at all :).
And that ^ has the clue to the probably only reason GREEN2 exists.
You mean someone did not realize there already is a second green and introduced just another one?. I quote myself from somewhere above where I spoke about this very image:
By the way:
FUJI_GREEN
is another green.
This CMYK file says it has FUJI_GREEN. So we are still at 0 images that have anything to do with GREEN2.
You mean someone did not realize there already is a second green and introduced just another one?. I quote myself from somewhere above where I spoke about this very image: By the way: FUJI_GREEN is another green. This CMYK file says it has FUJI_GREEN. So we are still at 0 images that have anything to do with GREEN2.
No-no-no. I'm talking about a different thing. Let's see if i can not fail to convey this thought...
rawspeed supports 2 major type of sensors: Bayer and X-Trans.
(Let's forget that X-Trans exists for the rest of the comment.)
Now, there exists this macro - FC()
, which is used to compute what is the current color of Bayer mosaic pixel given dcraw-specific filter pattern (e.g. filters: -1802201964 (0x94949494)
), and a pixel x/y position.
In case of simple 3-colored RGB mosaic, some 2 cells share a color, as we see in CFAPattern
exif tag, e.g. 0 1 1 2
. And FC()
simply returns 0
for RED cell, 1
for both GREEN cells, and 2
for BLUE cell. And FC()
's output is simply used as an index for float wb_multiplier[3]
array, where wb_multiplier[0]
is the wb_multiplier for RED channel, wb_multiplier[1]
is the wb_multiplier for GREEN channels and wb_multiplier[2]
is the wb_multiplier for BLUE channel. Example.
But if there are 4 cells, and all have different color, it will be something like 5 3 1 4
. So do we do? We could forget about FC()
, use float wb_multiplier[6]
. But we know that for all the 2x2 Bayer sensors, there can't be more than 4 different colors in pattern. So we can simply map these colors into 0..3 range. That way, (did not check the details, but the idea is correct) FC()
returns 0
for CYAN cell, 1
for YELLOW cell, 2
for MAGENTA cell and 3
for FUJI_GREEN cell. And then every usage of FC()
simply continues to work, it still outputs 0..3.
So i think, the GREEN2
is there only for dcraw-based filter patten formatting to work, for this remapping of non-RGB colors into 0..3 range. I see no other use-cases.
See ColorFilterArray::toDcrawColor()
and ColorFilterArray::toRawspeedColor()
Thanks for the detailed feedback. I can follow your thought up the last paragraph:
So i think, the GREEN2 is there only for dcraw-based filter patten formatting to work, for this remapping of non-RGB colors into 0..3 range. I see no other use-cases. See ColorFilterArray::toDcrawColor() and ColorFilterArray::toRawspeedColor()
Everything you wrote before that still does not explain or need the presence of the GREEN2
enum in the source code. If you remove the unhelpful default settings who get virtually always overridden by the xml data anyway, there is simply no way for that GREEN2 constant to ever show up in that switch in toDcrawColor
.
More digging and I found this (dcraw) which finally explains to me where all this 'stuff' must have originally come from. (While still not being convinced that this was the best thing to do.)
This also made me wonder what the problem with this camera might be? Maybe it has something to do with this bug? (compare x/y with dcraw code)
Conclusion: Every manufacturer seemed to have agreed to assign standard numbers in their exif CFAPattern field as follows: 0=R, 1=G, 2=B
. Nikon/Canon needed more numbers for more colors and added 3,4,5 for C,M,Y
(I don't know the correct order). And for their G color, they simply used the already existing '1', makes total sense to me. What follows is that both GREEN2
and FUJI_GREEN
are just there to confuse, all meaning the same physical thing. Lets assume we have those colors parsed from xml or figured out from the exif data, does not matter. Then there is obviously a need to get the dcraw-magic-int from them so the helpful FC macro works in dt. This is a completely separate problem (an implementation detail) and should be considered separately without obfuscating/complicating the xml format or string naming. With both extra greens gone, things start to make sens for me. You said you don't want to change a thing with the xml content - all good. This is just another reason to do it at some point in time.
I have a few question for you as a user of the ColorFilterArray
class from a dt perspective:
getDcrawFilter
from this class? If so, please specify and what you use it for.@axxel
Everything you wrote before that still does not explain or need the presence of the GREEN2 enum in the source code. If you remove the unhelpful default settings who get virtually always overridden by the xml data anyway, there is simply no way for that GREEN2 constant to ever show up in that switch in toDcrawColor.
Okay then, feel free to s/GREEN2
/GREEN
/ and please do show me that all the tests still pass.
I have a few question for you as a user of the ColorFilterArray class from a dt perspective:
- do you need any constructor/method/member other than getDcrawFilter from this class? If so, please specify and what you use it for.
darktable/src/common/imageio_rawspeed.cc, dt_rawspeed_crop_dcraw_filters()
- would you care for those PowerShots mentioned in the above dcraw link to work?
One, 20-year-old camera. I'm not so sure i do care.
Okay then, feel free to s/GREEN2/GREEN/ and please do show me that all the tests still pass.
Will do.
darktable/src/common/imageio_rawspeed.cc, dt_rawspeed_crop_dcraw_filters()
Perfect. So what is really needed are 3 things:
One, 20-year-old camera. I'm not so sure i do care.
Even better. I am all for removing old/useless cruft. This means parts of the ColorFieldArray
API can go. They are not used and have never worked.
I'll come up with a PR that introduces the described simplified CFA specification as an alternative to the current one (not changing the xml content). Allowing to change the xml on a later date or on a case by case basis or never at all.
I am really biased against all these cleanups.
The fact that
CFA2
was introduced (probably because someone noticed that this verbosex, y, color
interface is inappropriate to specify a 6x6 array) shows that the CFA tag has an issue. But the solution to come up with a CFA2 like it is, is not a good solution to this problem.My issues with this are:
0x0
, or specifying a 2x3 pattern.Proposal by example:
becomes
Note: one could insert line breaks inside the string if desired to align the different lines below each other
Advantages:
The implementation would simply remove all whitespace from the string, then the number of characters has to be a square (well, actually only 4 and 36 are interesting right now). Then the string is parsed character by character and the internal array is filled. done.