andrewcparnell / Bchron

A Bayesian radiocarbon chronology model with R
http://andrewcparnell.github.io/Bchron/

Changes in Bchronology in past year or so? #25

Closed: cmbarton closed this issue 3 years ago

cmbarton commented 3 years ago

The reason for my recently posted issues is that I'm working with a doctoral student who is using Bchron to build chronologies for some really interesting old archaeological sites. He is revising his dissertation and now getting different results compared to a year ago. Some of this might be due to changed defaults in predictPositions, and maybe to how he is specifying it now, but there may be other changes in recent versions I am unaware of. I wanted to give him some of my code from a couple of years ago, along with plots you helped me beautify, but they did not run, prompting me to try to find out why. (For example, I found that I now need to include jitterPositions = TRUE or I get a strange error, and I have learned that I now need to use ggplot on the output.)

The main issue for the student is that the age-depth model curve was pulled closer to his dates last year than it is now. I will attach a couple of examples. Beyond predictPositions, does one of the other, more esoteric arguments to this function affect the 'smoothness' of the age-depth model curve? Any insights you might have would be helpful.

His current code is:

TYOut4_20_NoTop = with(TYAllUnitsforBchron4_NoTop,
  Bchronology(ages = ages, ageSds = ageSds, calCurves = calCurves20,
              positions = position, positionThicknesses = thicknesses,
              ids = id, predictPositions = seq(10, 85, by = 5),
              jitterPositions = TRUE))

Here is the plot he is getting now:

[image: current age-depth plot]

Here is the plot he got a year ago with the same data:

[image: age-depth plot from a year ago]

andrewcparnell commented 3 years ago

Hi Michael. Without seeing the data I'm afraid I can't reproduce those plots or see what the issue is. See e.g. here for how to write a good reproducible example: http://adv-r.had.co.nz/Reproducibility.html (the trick is using dput).

The main thing that's changed recently is the way Bchron computes starting values for calibrated dates in cores with large numbers of outliers (like yours). I actually don't like either of the above plots: any run that essentially suggests every single date is an outlier is unlikely to be correct; the algorithm might be getting stuck.

In these kinds of cores you need to think carefully about which dates are more or less likely to be outliers and input that into the model, possibly also suggesting starting values. No amount of beautiful maths is going to solve messy data problems! It needs prior information.

Andrew

cmbarton commented 3 years ago

Andy,

I'm copying Tim (student I mentioned) in case he can send you the data. He did an outlier analysis and changed some of the values without result. Most of the dates do not look like outliers to me but rather indicate changes in deposition rates. These dates are from excavations in cave/rockshelter sediments, where deposition rates and processes can vary greatly over time. The age-depth model is useful in estimating such rates in order to calibrate artifact densities per unit time.

Michael


cmbarton commented 3 years ago

Hi Michael,

Thanks for putting me in touch with Andy. I took your advice and used a smaller predictPositions interval (1cm from 1 to 100 cmbs) but it still doesn't resemble what I got last year.

Andy, I really appreciate any help you can provide. I'll paste my data here (at least, what I got by using dput on my dataset - apologies if it's messier than it needs to be, I haven't used dput before), followed by the code I ran on it, and finally the resulting plot.

Before I get to that, though, I wanted to address what you said about outliers. To be honest, the only way I've been identifying outliers is Bchron's summary(x, 'outliers') command, but since that relies on the very Bchronology run I seem to be having trouble with, I'm not sure it's a good way to do it. I was also under the impression that Bchronology ignored any dates it deemed outliers, but removing them manually yields pretty different results, so I may be misunderstanding something.
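For concreteness, this is the call I mean (a minimal sketch; run stands in for any object returned by Bchronology()):

# Print the posterior outlier probability of each dated sample
summary(run, type = 'outliers')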

Here's the code:

library(Bchron)
library(ggplot2)

TYAllUnitsforBchron4_NoTop <- structure(list(
  X1 = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
         20, 21, 22),
  id = c("TY-7-5", "TY-1-4", "TY-7-8", "TY-5/6-3", "TY-7-10", "TY-1-5",
         "TY-7-11", "TY-1-6", "TY-7-12a", "TY-7-12b", "TY-7-13", "TY-5/6-6a",
         "TY-5/6-11", "TY-5/6-6b", "TY-5/6-12", "TY-7-14", "TY-1-7",
         "TY-5/6-13", "TY-5/6-14", "TY-7-18a", "TY-7-18b"),
  ages = c(1885, 1745, 2020, 2010, 1120, 1170, 7910, 9560, 8960, 10060,
           8765, 10359, 10250, 8507, 10215, 10375, 10412, 10255, 10355,
           13660, 13845),
  ageSds = c(20, 15, 20, 20, 15, 20, 20, 35, 30, 25, 25, 41, 30, 35, 30,
             25, 42, 35, 35, 70, 35),
  `position (cmbd)` = c(85, 92, 96, 105, 107, 108, 111, 116, 119, 119, 120,
                        122, 123, 123, 128, 130, 133, 134, 143, 151, 151),
  position = c(14, 21, 25, 34, 36, 37, 40, 45, 48, 48, 49, 51, 52, 52, 57,
               59, 62, 63, 72, 80, 80),
  thicknesses = rep(0, 21),
  calCurves = rep("intcal13", 21),
  calCurves20 = rep("intcal20", 21),
  Date = c("TY-7-5", "TY-1-4", "TY-7-8", "TY-5/6-3", "TY-7-10", "TY-1-5",
           "TY-7-11", "TY-1-6", "TY-7-12a", "TY-7-12b", "TY-7-13", "TY-5/6-6a",
           "TY-5/6-11", "TY-5/6-6b", "TY-5/6-12", "TY-7-14", "TY-1-7",
           "TY-5/6-13", "TY-5/6-14", "TY-7-18a", "TY-7-18b"),
  OutlierProb2020 = c(1, 0.013, 0.031, 0.487, 1, 1, 0.008, 1, 0.871, 1,
                      0.154, 1, 0.677, 1, 0.007, 0.014, 0.009, 0.9, 0.945,
                      0.073, 0.273)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L))

TYOut4_20_NoTop_1cm_1to100 = with(TYAllUnitsforBchron4_NoTop,
  Bchronology(ages = ages, ageSds = ageSds, calCurves = calCurves20,
              positions = position, positionThicknesses = thicknesses,
              ids = id, predictPositions = seq(1, 100, by = 1),
              jitterPositions = TRUE))

plot(TYOut4_20_NoTop_1cm_1to100, dateHeight = 5, chronTransparency = 0.50) +
  labs(x = 'Age (years BP)', y = 'Depth below surface (cm)',
       title = 'TYOut4_20_NoTop_1cm_1to100 Chronology')

[image: resulting chronology plot]


cmbarton commented 3 years ago

Andy,

Here's the data in a more easily used CSV format.

Michael

TYAllUnitsforBchron4_NoTop.csv

cmbarton commented 3 years ago

I solved the mystery. The problem is a change in how Bchronology deals with unit thickness. Tim says that he always set positionThicknesses = 0, which is also the default value in v4.7.5 ("positionThicknesses = rep(0, length(ages))").

But the results are VERY different with thicknesses all 0 versus all 10. The all-10 run is almost identical to what he got previously, rather than to what he is getting now. See below.

Current default value: positionThicknesses = rep(0, length(ages))

[image: TC_shelter1_0thick]

Setting all thicknesses to a non-zero value: positionThicknesses = rep(10, length(ages))

[image: TC_shelter1_10thick]

I set his thicknesses to values equal to the vertical differences between dating samples. Here is the result:

[image: TC_shelter1]
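For the record, the only difference between these runs is the positionThicknesses argument. A sketch of the all-10 version (the object name is just illustrative; everything else is as in my script above):

library(Bchron)
# Identical to the earlier run except for positionThicknesses;
# swapping rep(0, length(ages)) for rep(10, length(ages)) is what
# changes the curve so dramatically.
TYOut_10thick <- with(TYAllUnitsforBchron4_NoTop,
                      Bchronology(ages = ages, ageSds = ageSds,
                                  calCurves = calCurves20,
                                  positions = position,
                                  positionThicknesses = rep(10, length(ages)),
                                  ids = id,
                                  predictPositions = seq(1, 100, by = 1)))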

cmbarton commented 3 years ago

A quick update. If all thicknesses are set to 1 or even 0.1, Bchronology produces the same curve as all thicknesses = 10. So the problem is specifically with thicknesses = 0.

andrewcparnell commented 3 years ago

Thanks for spotting this issue. I've run out of time to look at it this week, but I'll try and get back to it ASAP.

Andrew

cmbarton commented 3 years ago

Glad I could finally find out where the problem is. It was a weird mystery.

andrewcparnell commented 3 years ago

Right, I think I've finally got to the bottom of this. It was an easy bug to find but a devil to fix. The new version of Bchron on GitHub should hopefully fix these problems - please give it a try and let me know how you get on. (Let me know if you need help installing it.)

The problem, if you're interested, was that when you have repeated positions and zero thicknesses, the stochastic process behind Bchron will fail. For that reason Bchron 'jittered' the depths/positions in those situations to stop the algorithm crashing. However, there was a bug that set these values far too large, resulting in enormous thickness errors and very odd chronologies; if the extra artificial thickness was too small, the algorithm failed anyway due to numerical underflow. I think I've managed to find an automatic workaround, and there's a new argument, artificialThickness, that lets the user set this value by hand if they still run into problems.
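If it helps, trying it out looks roughly like this (a sketch: remotes is one way to install from GitHub, and the inputs stand in for whatever you normally pass):

# Install the development version from GitHub (assumes the remotes package)
# install.packages("remotes")
remotes::install_github("andrewcparnell/Bchron")
library(Bchron)

# artificialThickness only kicks in when positions repeat with zero
# thicknesses; the value here is just illustrative.
run <- Bchronology(ages = ages, ageSds = ageSds, calCurves = calCurves,
                   positions = position, positionThicknesses = thicknesses,
                   ids = id, artificialThickness = 0.01)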

Many, many thanks for finding this and I hope the new fix works for you. If not please get back to me and I'll keep going.

Andrew

cmbarton commented 3 years ago

Thanks very much. I'll test today or tomorrow and let you know.

cmbarton commented 3 years ago

Andy, just tested this. It runs fine and produces the expected results with positionThicknesses = rep(0, length(ages)) and artificialThickness left at its default of 0.01. I guess jitterPositions has been deprecated. If I push it by also setting artificialThickness = 0, it seems to go into an infinite loop. Interestingly, artificialThickness = .001 works but gives somewhat different output, and .001 works and gives output that looks identical to .01.

What is positionThickness anyway? You have a sample that has a vertical "position", an age, and an SD. Why is thickness needed?

It's true that in many archaeology projects we excavate in stratigraphic (or arbitrary) levels, and indeed a sediment core might be divided into stratigraphic units too. Is positionThickness supposed to refer to the thickness of such stratigraphic units? If so, what is position supposed to be: the depth of the sample, or the depth of the center of the unit? Depositional units accumulate over time, and samples at different positions within a unit could help provide a within-unit model. In that case, what does positionThickness mean?

What I am getting at is that there seem to be two different cases:

Case 1. Position is the centroid of a depositional unit (e.g., a layer in a site or in a core) and positionThickness is the thickness of each unit. There will be a data column of positionThicknesses to read from. Dates from within each unit should be combined into a single date (e.g., with BchronDensity), and the model will work with the mean/median age + SD of the unit.

Case 2. Position is the vertical depth of each sample (not of a depositional unit). "positionThickness" is then simply the distance between samples (or, more precisely, the span from the midpoint between a sample and the sample below it to the midpoint between the sample and the sample above it) and could be calculated automatically; see the sketch below. No thickness data column is needed. If two samples have identical positions (depths) and this is a problem for the model, then either 1) combine the sample ages or 2) vertically jitter them with what you are now calling artificialThickness (maybe better called "jitterPositions" as before, but taking a value rather than TRUE/FALSE).
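A rough sketch of the automatic calculation I have in mind (a hypothetical helper, not anything currently in Bchron):

# Span each sorted depth from the midpoint to its neighbour above down to
# the midpoint to its neighbour below; top and bottom samples get half-spans.
auto_thickness <- function(positions) {
  positions <- sort(positions)
  mids <- (head(positions, -1) + tail(positions, -1)) / 2
  upper <- c(positions[1], mids)
  lower <- c(mids, positions[length(positions)])
  lower - upper
}

auto_thickness(c(14, 21, 25, 34, 36))  # returns 3.5 5.5 6.5 5.5 1.0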

What do you think? I don't know if I explained this well, but I can try to clarify if you have questions.

andrewcparnell commented 3 years ago

OK thanks. In answer to your comments:

  • If you have some identical depths, zero thicknesses, and artificialThickness set to zero, then Bchron should crash (I've just updated it to do so).
  • I've also updated the documentation to be a bit clearer. The position or depth value should be a measurement of the middle of a slice, and the thickness should be the full height of that slice. Thickness is needed because you haven't got an age at an exact timepoint but rather an estimate of that age over the slice it covers. It's especially important, I would guess, if you're dating bulk sediment.
  • You can certainly have multiple ages from a slice with a positive thickness (I think this is Case 1 above?), where you shouldn't need to combine any dates. But Bchron will run into problems when there are multiple dates from an infinitesimally thin slice: you're essentially saying that there are two possible ages for that depth in the core, which can't be correct unless one of them is an outlier.
  • Case 2 is not correct as far as I can understand it. The thickness should always refer to the 'slice' that was sent for dating. It shouldn't be related to differences between depths.

Andrew

cmbarton commented 3 years ago

Thanks Andrew. Some responses.

> OK thanks. In answer to your comments:
>
>   • If you have some identical depths, zero thicknesses, and artificialThickness set to zero, then Bchron should crash (I've just updated it to do so).
>   • I've also updated the documentation to be a bit clearer. The position or depth value should be a measurement of the middle of a slice, and the thickness should be the full height of that slice. Thickness is needed because you haven't got an age at an exact timepoint but rather an estimate of that age over the slice it covers. It's especially important, I would guess, if you're dating bulk sediment.
>   • You can certainly have multiple ages from a slice with a positive thickness (I think this is Case 1 above?), where you shouldn't need to combine any dates. But Bchron will run into problems when there are multiple dates from an infinitesimally thin slice: you're essentially saying that there are two possible ages for that depth in the core, which can't be correct unless one of them is an outlier.

The situation can be more complicated, though. For example, consider 2m of visually undifferentiated sediment with 10 dates at different depths. The sediment might have been excavated in arbitrary 10cm vertical slices, but these are not depositional units (unlike, say, varves in a core); a date does not pertain to a 10cm arbitrary level, so a position thickness of 10cm is pretty meaningless. Another example is a deposit that does have distinct sedimentary units, some of them quite thick (e.g., 50cm). There might be three samples recovered from a 50cm thick layer: one near the top, one near the bottom, and one in the middle. Recording their depths below surface and modelling from those depths can help in estimating the length of time and sedimentation rate for that layer; assuming all three sit at the midpoint of a 50cm thick level is incorrect. Finally, a single sample can be subsampled and a date run on each subsample. The two subsamples have exactly the same depth but provide two different (hopefully similar) estimates of the age of the sample; here Bchron might perhaps need a tiny vertical 'jitter' to work. I recently reviewed a paper for PLOS ONE in which both of the last two examples occurred.

>   • Case 2 is not correct as far as I can understand it. The thickness should always refer to the 'slice' that was sent for dating. It shouldn't be related to differences between depths.

The first example above is relevant here. For 10 dates extracted at different depths from an undifferentiated 2m of sediment, what does thickness (artificial or otherwise) mean?

andrewcparnell commented 3 years ago

OK - I'm not sure how much more we can do over text comments - it might be worth having a video chat soon to go through this in detail? As always, if there's something that's not working for you in Bchron I'm happy to re-write it as appropriate. I'm also not at all expert in the practical matters of coring, so I'm happy to take your advice on this.

Just to clarify - by slice here I mean the slice that was sent for dating, rather than any arbitrary slicing up of the core. So if you sent some bulk sediment from a 10cm slice at a depth of 2m, Bchron assumes that the 14C age you get back is equally likely to have come from anywhere between 1.95m and 2.05m.

Conversely, if you took a 50cm layer and sent three parts of it for dating (say top, middle, and bottom), then you should have a depth and a thickness for each of the parts that were sent for dating; the 50cm is irrelevant.
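In code terms (illustrative numbers only):

# A 50cm layer spanning 100-150cm depth with three dated pieces:
# each date gets the position and thickness of the piece actually dated.
positions <- c(105, 125, 145)      # middles of the dated pieces (cm)
positionThicknesses <- c(2, 2, 2)  # each dated piece ~2cm thick
# The 50cm of the whole layer never enters the model.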

Again, if that's not clear, or doesn't match the way you think the dating should work, let me know and we'll arrange a proper chat.

Andrew

cmbarton commented 3 years ago

You are right. There is only so much that can be done through email/chat.

I'm not trying to be a pain, but to help if I can. You are also right about bulk soil dating. I am thinking more of cases where samples of charcoal or bone are dated, which are much more common than bulk soil dates in archaeology and in the paleoecology of archaeological contexts. Bchronology works fine and can be adapted to both types of case.

I'm just not sure what thickness means for the second type of case. For a given set of charcoal samples pulled from a stratigraphic profile at different depths, does it make any difference to the model whether thickness is 10cm or 0.1cm?

Michael


andrewcparnell commented 3 years ago

Still struggling to get my head around case 2:

> Position is the vertical depth of each sample (not of a depositional unit). "positionThickness" is then simply the distance between samples (or, more precisely, the span from the midpoint between a sample and the sample below it to the midpoint between the sample and the sample above it) and could be calculated automatically. No thickness data column is needed.

So you've dated at regular depths: date 1 at 10cm, date 2 at 20cm, etc., down the core. The positionThicknesses should then be the thickness of the layer that was sent for dating. Using a thickness of, e.g., 10cm would mean that date 1 could represent the age of any depth between 5cm and 15cm, which wouldn't make sense if only a 1cm thick layer was sent for dating.
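Or, in code (again illustrative):

positions <- seq(10, 100, by = 10)  # one date every 10cm down the core
positionThicknesses <- rep(1, 10)   # if each dated slice was 1cm thick
# Not rep(10, 10): that would say each age could come from anywhere
# within +/- 5cm of its stated depth.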

But perhaps I'm not understanding this at all?!

cmbarton commented 3 years ago

I'm not explaining this clearly, it seems. Maybe it would be a good idea to have a chat.

Can I send you a Zoom link? Where are you geographically, so we can work out a time? I'm in Arizona, which doesn't observe Daylight Saving Time, so we are currently on the same time as California.

Michael




andrewcparnell commented 3 years ago

OK - I'll contact you over email to arrange a time, so that the whole of GitHub doesn't join our meeting! Andrew