Feedback from @joeroe #5

@nevrome invited me to take a look at this. I've been professionally overthinking radiocarbon data for https://xronos.ch for a few years, so please excuse the pedantry, I can't help it anymore!

[ ] Calling them radiocarbon 'dates' is a bit of a misnomer, since they are measurements of an elapsed period of time rather than a point on a calendar. You might consider replacing 'date' with 'measurement' or (my preference) 'age' in the variable names and descriptions.
[ ] "The uncalibrated date from the laboratory measurement" – the term of art here is 'conventional radiocarbon age' (CRA)
[ ] c14_date_lab_code – archaeologists are not consistent about this, but generally speaking the lab code is the alphanumeric label for the lab (e.g. OxA, Oxford AMS) and the lab identifier is the compound ID produced by combining this with an integer (e.g. OxA-1234). You might also want to specify whether you prefer the two parts to be separated by a hyphen or a space, as both are used in the literature.
[ ] c14_date_sd – the uncertainty associated with a date is an estimate of measurement error, not a standard deviation (see Scott et al. 2007). I usually call this something like 'c14_age_error'.
[ ] calib_age_median – since a calibrated radiocarbon distribution is non-parametric, the median is generally not meaningful (Michczyński 2007). The description ("Radiocarbon calibration age upper limit") also appears to be incorrect.
[ ] calib_age_oldest / calib_age_youngest – have you considered that some calibration software (e.g. OxCal) will report multiple ranges for a given threshold?
[ ] calib_age_sigma:
- [ ] sigma implies a confidence interval measured in standard deviations, but it doesn't have to be, and some statisticians warn against this. I'd call it calib_age_range or something.
- [ ] Perhaps it's worth (optionally) specifying what method was used to calculate the interval? End users are almost never aware of it, but different calibration software does use different methods, e.g. highest density interval, equal-tailed interval, shortest probability interval. As mentioned above, some will only report a true interval (two numbers), whilst others support a vector of regions.
- [ ] To be really explicit this should be a two-element vector. When we say "95.4% range" it's implied that it's the central 95.4%, but it doesn't have to be. Might be overkill though.
[ ] No error value for delta_13_c?
[ ] It's conventional to report the sample taxon as well as the sample material
[ ] I agree it would be a good idea to support F14C values as well as CRAs (#4)
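
The lab code / lab identifier distinction in the list above could be made explicit by storing the two parts separately. A minimal sketch in Python, assuming the simple case of a letters-only lab code followed by an integer, separated by a hyphen or whitespace. `LabId`, `parse_lab_id` and `format_lab_id` are hypothetical names, not part of the extension, and real-world identifiers with a more complex structure would need a looser pattern:

```python
import re
from typing import NamedTuple


class LabId(NamedTuple):
    lab_code: str  # label of the laboratory, e.g. "OxA"
    number: str    # running number, kept as text to preserve leading zeros


# Assumption: letters-only lab code, then hyphen(s) or whitespace, then digits.
_LAB_ID_PATTERN = re.compile(r"^\s*([A-Za-z]+)[\s-]+(\d+)\s*$")


def parse_lab_id(text: str) -> LabId:
    """Split a lab identifier such as 'OxA-1234' or 'OxA 1234' into its parts."""
    match = _LAB_ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised lab identifier: {text!r}")
    return LabId(lab_code=match.group(1), number=match.group(2))


def format_lab_id(lab_id: LabId, separator: str = "-") -> str:
    """Write the identifier back out with one agreed separator."""
    return f"{lab_id.lab_code}{separator}{lab_id.number}"
```

Normalising on input like this would let the standard accept both spellings while storing only one.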

FWIW, the difficulty of precisely describing the parameters of a calibrated date, and the existence of (many) different plausible calibrations of the same underlying date, led us away from incorporating this information alongside c14 dates in XRONOS (see https://github.com/xronos-ch/xronos.rails/issues/46 for some discussion). I tend to see calibrated dates as models, rather than data.

Thank you very much @joeroe ! No need for apologies, this is extremely helpful and I greatly appreciate your time!

I will definitely incorporate many of your suggestions! I need to think a few of them through first, because in this specific context I have to balance 'precision' (so to say) with 'user-friendliness': the people we are primarily designing this for are not specialists but palaeogenomicists, who will often only really care about e.g. the calibrated ages (though I completely get your points). For some fields I also need to think about which term names will be most recognisable to them.

For now (still have a month until I'm back full time), I have one outstanding question:

The calibration median was not something I had thought of myself (my understanding is indeed that the result is a distribution with 'most likely' date point(s)), but rather a suggestion from @nevrome... I just want to check whether he had any further motivation behind it?

Hey @joeroe - thanks for taking the time and adding these very valuable comments.

About the calib_age_median: My perception is that few colleagues in archaeogenetics bother to give age information proper treatment via temporal resampling (from age ranges or post-calibration probability distributions). If the data does not include a median, then many may just compute a mid-point between the start and end point of whatever range you'll provide for their modelling application, which is arguably worse than the median. So my reasoning was guided by this conflict between 'accuracy' and 'convenience' you mentioned, @jfy133.
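
To make the comparison concrete, here is a minimal Python sketch of how the median, the mode and a range mid-point can each be derived from a calibrated distribution stored on a grid of calendar years. The grid and density values are invented toy numbers, not a real calibration, and the function names are hypothetical:

```python
def median_year(years, density):
    """First grid year at which the cumulative probability reaches 0.5."""
    total = sum(density)
    cumulative = 0.0
    for year, d in zip(years, density):
        cumulative += d / total
        if cumulative >= 0.5:
            return year
    return years[-1]


def mode_year(years, density):
    """Grid year with the highest density (first one if there are ties)."""
    return max(zip(years, density), key=lambda pair: pair[1])[0]


def midpoint_year(oldest, youngest):
    """Mid-point of a reported range, ignoring the shape of the distribution."""
    return (oldest + youngest) / 2


# Toy bimodal distribution, calendar years as negative numbers (BC), oldest first.
years = [-1380, -1360, -1340, -1320, -1300, -1280, -1260, -1240, -1220, -1200]
density = [2, 6, 4, 0, 0, 3, 4, 9, 3, 1]

median = median_year(years, density)             # -1260
mode = mode_year(years, density)                 # -1240
midpoint = midpoint_year(years[0], years[-1])    # -1290.0
```

In this toy case the mid-point falls in the gap between the two peaks, where the density is lowest, while the median and mode both land on the younger peak.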

The same applies to the high density regions (HDRs), i.e. what Joe describes as "multiple ranges for a given threshold". My tool currycarbon computes them as well, both on a 1-sigma and a 2-sigma level (I'll have to rethink the term sigma here, I guess :smile:). I defined the "overall" 2-sigma range of a calibrated age as the start and end of the 2-sigma HDRs.

```
CalEXPR: [1] 1:3000±30BP
Calibrated: 1379BC >> 1364BC > 1238BC < 1131BC << 1124BC
1-sigma: 1364-1360BC, 1282-1197BC, 1169-1163BC, 1140-1131BC
2-sigma: 1379-1344BC, 1304-1124BC
                              ▁▁▁▁▁▁
                             ▁▒▒▒▒▒▒▁
                            ▁▒▒▒▒▒▒▒▒▁
               ▁          ▁▁▒▒▒▒▒▒▒▒▒▒▁    ▁   ▁▁
              ▁▒▁       ▁▁▒▒▒▒▒▒▒▒▒▒▒▒▒▁▁▁▁▒▁  ▒▒
            ▁▁▒▒▒▁▁    ▁▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▁▁▒▒▁
        ▁▁▁▁▒▒▒▒▒▒▒▁▁▁▁▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
 -1410 ┄──┬─────────────┬─────────────┬─────────────┬────────────┄ -1020
             > >                 ^              <<
               ─          ──────────────   ─   ──
             ──────    ───────────────────────────
```
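
For readers unfamiliar with HDRs, the following Python sketch shows one common grid-based way to derive them, and the "overall" range spanning them, from a discrete density. It is an illustration under that assumption, not currycarbon's actual implementation. The function names and the toy numbers are hypothetical:

```python
def high_density_regions(years, density, mass=0.954):
    """Return (oldest, youngest) ranges that together hold at least `mass`.

    Grid points are added from highest to lowest density until the requested
    probability mass is covered; neighbouring points are then merged.
    """
    total = sum(density)
    by_density = sorted(range(len(years)), key=lambda i: density[i], reverse=True)
    chosen, covered = set(), 0.0
    for i in by_density:
        if covered >= mass:
            break
        chosen.add(i)
        covered += density[i] / total

    # Merge runs of consecutive grid indices into contiguous regions.
    runs = []
    for i in sorted(chosen):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return [(years[start], years[end]) for start, end in runs]


def overall_range(regions):
    """Start of the oldest region and end of the youngest one."""
    return regions[0][0], regions[-1][1]


# Toy bimodal distribution, calendar years as negative numbers (BC), oldest first.
years = [-1380, -1360, -1340, -1320, -1300, -1280, -1260, -1240, -1220, -1200]
density = [2, 6, 4, 0, 0, 3, 4, 9, 3, 1]

regions_68 = high_density_regions(years, density, mass=0.68)
regions_95 = high_density_regions(years, density, mass=0.954)
```

Note that the overall range of the toy 95.4% regions also covers the empty gap between the two peaks, which is the information lost when only an oldest and a youngest value are reported.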

Very interesting to hear that different tools do this differently. Would be good to document this, but indeed difficult to figure this out as a user. Maybe calib_software and calib_version are sufficient to encode it implicitly. @MartinHinz and I once had a plan to write a benchmarking paper for different calibration tools to compare exactly these details.

Of course HDRs are more accurate, but also harder to work with. Colleagues with a desire to operate on this level of temporal accuracy will probably just re-calibrate themselves. And that is fine. As Joe says: calibrated dates are "models, rather than data", so whatever we provide here is essentially only for convenience.

There is the exception of sites where archaeologists created a chronological model and combined multiple lines of evidence to acquire more precise age estimates. In this case it would be better to use the carefully curated ages instead of re-calibrating with the sledge-hammer. Maybe it would make sense to add a field to MIxS-MInAS/extension-radiocarbon-dating for referencing one or multiple publications with more sophisticated age models. Our Varna paper may be an example for that.

So to summarise my suggestions:

1. Document their drawbacks but keep calib_age_median, calib_age_oldest, calib_age_youngest as they are. They are for convenience and good enough for many applications.
2. Introduce a field for referencing publications with more advanced age models.

Happy to get disabused about 1. :pray:

> If the data does not include a median, then many may just compute a mid-point between the start and end point of whatever range you'll provide for their modelling application, which is arguably worse than the median.

Well, according to Michczyński at least, it's no better. I admit I haven't looked into this myself, but it feels odd to contradict the (only?) published literature in a standard. I'm guilty of using point estimates in published work myself, but at least I knew I was simplifying. My worry here is that by including a field for a point estimate, you are communicating (as you say, to non-experts) that using a point estimate is unproblematic. If there is only the range, that at least gives a hint that there is more to calibrated radiocarbon than a single date.

> Very interesting to hear that different tools do this differently. Would be good to document this, but indeed difficult to figure this out as a user.

Yeah, I looked into it while working on https://github.com/joeroe/c14 and https://github.com/joeroe/ruby-radiocarbon. Of course there's the usual elephant in the room: that I have no idea what OxCal is doing.

A benchmark or wider technical comparison paper would be interesting. Let me know if you need any help!

> There is the exception of sites where archaeologists created a chronological model and combined multiple lines of evidence to acquire more precise age estimates. In this case it would be better to use the carefully curated ages instead of re-calibrating with the sledge-hammer.

I agree, but at this point are you still talking about a "radiocarbon date"? Is there scope in this standard to separate on the one hand the radiometric data, and on the other chronological models that use that data?

I finally got around to reading this little Michczyński paper. A beautiful idea - I wish it was more common to publish little experiments like this. My takeaway from the paper is a bit different from yours, though, and maybe even from the author's.

For me, Figure 3 shows that these point estimates are usually not that far off from the true age. That is good enough as a rough number to, for example, group aDNA samples by millennia. It is very interesting to see that the mode, i.e. the maximum of the distribution, performs so well. Michczyński himself (after arguing against any point estimate) concludes:

> If it is really essential to use a point estimate for the calendar age of the sample, then the mode (the value of the calendar age that corresponds to the maximum of the probability distribution of a calibrated 14C date) may be accepted as a point estimate, but we should remember that important differences between the mode and the true calendar age of the sample appear for some periods, which are characterized by a specific shape of calibration curve (see Figure 4).

I think I'll add this to currycarbon.

One more tiny comment about what you wrote:

> I'm guilty of using point estimates in published work myself, but at least I knew I was simplifying. My worry here is that by including a field for a point estimate, you are communicating (as you say, to non-experts) that using a point estimate is unproblematic.

My experience with (archaeo)geneticists has been that they are usually aware of the uncertainties of radiocarbon ages, although maybe not to the full extent (which I, by no means, could claim for myself :wink:).

I also had in my head that the mode was the way to go; probably also from this paper. For example it's the default of c14::cal_point(). It's been a while since I read it to be honest.

Thinking about it again, as Michczyński says, the probable deviation from the mode is purely a function of the measurement error and the calibration curve. So shouldn't it be possible to come up with a point estimate ± e.g. the 95% margin on either side? I can imagine that'd be more convenient than a range for a lot of applications.
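
One possible reading of that idea, as a rough Python sketch: report the mode together with its distance to the older and younger bounds of an interval. Using the equal-tailed 95% interval here is an assumption (an HDR-based bound would work the same way), the function names are hypothetical, and the density is the same kind of invented toy grid, not a real calibration:

```python
def quantile_year(years, density, q):
    """First grid year at which the cumulative probability reaches q."""
    total = sum(density)
    cumulative = 0.0
    for year, d in zip(years, density):
        cumulative += d / total
        if cumulative >= q:
            return year
    return years[-1]


def point_with_margins(years, density, level=0.95):
    """Return (mode, margin towards older ages, margin towards younger ages)."""
    tail = (1 - level) / 2
    mode = max(zip(years, density), key=lambda pair: pair[1])[0]
    oldest = quantile_year(years, density, tail)
    youngest = quantile_year(years, density, 1 - tail)
    return mode, mode - oldest, youngest - mode


# Toy bimodal distribution, calendar years as negative numbers (BC), oldest first.
years = [-1380, -1360, -1340, -1320, -1300, -1280, -1260, -1240, -1220, -1200]
density = [2, 6, 4, 0, 0, 3, 4, 9, 3, 1]

estimate = point_with_margins(years, density)  # (-1240, 140, 40)
```

The two margins differ strongly in this toy case (140 years older, 40 years younger), so a single symmetric ± value would not describe the distribution well.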
