Improved regression coefficients - Githubissues

durbank / PAIPR

Functions to generate probabilistic estimates of annual accumulation from ice-penetrating radar without the need for manual layer selection or correction


Improved regression coefficients #26

Closed durbank closed 5 years ago

durbank commented 5 years ago

We currently have some issues with regression coefficients not matching well between the 3 validation sites. This was highlighted while investigating divergences in estimated means and trends in the greater WAIS interior between v1.0 and v1.1 of PAIPR. It's possible better regression coefficients in likelihood estimation will fix this.

durbank commented 5 years ago

We may want to switch to using individual pixel values rather than whole layer comparisons. This would give us a larger sample on which to base coefficient estimates. We would need to think about how best to do this.

One way would be to compare automated layer positions iteratively, starting with the brightest automated layer (found by summing all member peak values). Match this layer with its nearest manual layer (based on SSE?). Then determine true/false membership for each automated pixel based on a conservatively large vertical threshold (perhaps +/-10 cm?). Manual layer pixels corresponding to true automated pixels are then removed from future searches.
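A rough sketch of that matching loop, in Python rather than PAIPR's own code, and only to pin down the idea. The data layout (dicts keyed by trace index), the function name `match_layers`, and the use of mean squared error over shared traces instead of raw SSE (so that manual layers with few overlapping traces are not favored) are all assumptions made for illustration.

```python
def match_layers(auto_layers, manual_layers, threshold=0.10):
    """Label automated layer pixels as true/false against manual picks.

    auto_layers:   list of dicts {trace_index: (depth_m, peak_value)}  (hypothetical layout)
    manual_layers: list of dicts {trace_index: depth_m}                (hypothetical layout)
    threshold:     vertical tolerance in metres (0.10 m = +/-10 cm)
    Returns a list of (auto_layer_index, trace_index, is_true) tuples.
    """
    # Copy the manual picks so matched pixels can be removed without
    # modifying the caller's data.
    remaining = [dict(layer) for layer in manual_layers]

    # Process the brightest automated layers first (sum of member peak values).
    order = sorted(
        range(len(auto_layers)),
        key=lambda i: sum(peak for _, peak in auto_layers[i].values()),
        reverse=True,
    )

    labels = []
    for i in order:
        auto = auto_layers[i]

        # Find the nearest manual layer by mean squared depth error
        # over the traces both layers share.
        best, best_err = None, float("inf")
        for j, manual in enumerate(remaining):
            shared = [t for t in auto if t in manual]
            if not shared:
                continue
            err = sum((auto[t][0] - manual[t]) ** 2 for t in shared) / len(shared)
            if err < best_err:
                best, best_err = j, err

        # Assign true/false membership per pixel using the vertical threshold.
        for t, (depth, _) in auto.items():
            is_true = (
                best is not None
                and t in remaining[best]
                and abs(depth - remaining[best][t]) <= threshold
            )
            labels.append((i, t, is_true))
            if is_true:
                # Matched manual pixels are excluded from future searches.
                del remaining[best][t]
    return labels
```

The resulting per-pixel true/false labels would be the larger sample on which to base the coefficient estimates.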

durbank commented 5 years ago

It might also make sense to scale the integrated brightness-distance by the echogram's median peak prominence. This might minimize the spread in coefficients between validation sites.
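A minimal sketch of that scaling, assuming the brightness-distance values and the peak prominences for one echogram are available as plain lists. The function name and the example numbers are hypothetical.

```python
from statistics import median


def scale_by_median_prominence(db_values, peak_prominences):
    """Divide each brightness-distance value by the echogram's median peak prominence."""
    med = median(peak_prominences)
    if med <= 0:
        raise ValueError("median peak prominence must be positive")
    return [db / med for db in db_values]


# Made-up numbers: two echograms that differ only by a constant gain factor
# of 10 give identical scaled values, which is the intended effect.
site_a = scale_by_median_prominence([2.0, 4.0], [1.0, 2.0, 3.0])
site_b = scale_by_median_prominence([20.0, 40.0], [10.0, 20.0, 30.0])
```

This only removes differences between sites to the extent that the brightness-distance values scale linearly with prominence, which is itself an assumption.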

durbank commented 5 years ago

It might also make sense to perform a probabilistic analysis on the regression coefficients, where we have MC draws of the two regression parameters with means and standard deviations based on the spread in parameter values between validation sites. If this is incorporated, N will likely need to increase to at least 1,000.
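A sketch of what those draws could look like, assuming the two parameters (called r and k later in this thread) are drawn independently from normal distributions. Independence, normality, the function name, and the example site values are all assumptions; a joint draw that respects any correlation between r and k may be more appropriate.

```python
import random
from statistics import mean, stdev


def draw_parameters(site_r, site_k, n_draws=1000, seed=None):
    """Draw (r, k) pairs from normals fitted to the per-site optimized values."""
    rng = random.Random(seed)
    r_mu, r_sd = mean(site_r), stdev(site_r)  # sample st. dev. across sites
    k_mu, k_sd = mean(site_k), stdev(site_k)
    return [
        (rng.gauss(r_mu, r_sd), rng.gauss(k_mu, k_sd))
        for _ in range(n_draws)
    ]


# Hypothetical per-site values for the three validation sites.
draws = draw_parameters([1.0, 1.2, 1.4], [2.5, 3.0, 3.5], n_draws=1000, seed=0)
```

With only three sites the standard deviations are poorly constrained, so the spread of the draws should be read as a rough indication rather than a calibrated uncertainty.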

durbank commented 5 years ago

Before anything else, I should reprocess the manual layer picks (at least for the SEAT2010-4 and SEAT2010-5 sites). The main reason for this is the truncation of the results at 25 m depth, which results in artificially truncated layers, ultimately leading to poor regression performance.

durbank commented 5 years ago

I have now updated PAIPR to optimize logistic regression parameters based on residuals in age-depth profiles between manual and PAIPR layers, rather than based on the actual layer overlaps. This better adheres to the purpose of PAIPR calibration (generating accurate age-depth scales and eventually SMB estimates).

I'm currently testing the results, but so far all three validation sites seem to produce similar results (this is using the median prominence magnitude and echogram length to scale the DB values). Using these optimized parameters, PAIPR tends to undercount years compared to the cores at all three sites (although not by very much). I'm currently testing with unscaled values to see if that changes things much. The other issue is that with the current optimized parameters, DB values of 0 still have a ~25% chance of being selected as annual. Obviously this is not physically correct, and an ideal solution would give DB=0 a likelihood of 0 as well.
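To see why DB=0 cannot reach a likelihood of exactly 0, here is a small sketch using a generic two-parameter logistic, P(DB) = 1 / (1 + exp(-r (DB - k))). This parameterization is an assumption and PAIPR's actual form may differ, but any standard logistic with finite parameters is strictly positive at DB=0. The zero-anchored variant is just one possible workaround, not something adopted in this thread.

```python
import math


def logistic(db, r, k):
    """Generic logistic likelihood (assumed form): r is the slope, k the midpoint.

    Intended for non-negative DB values; very negative inputs can overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-r * (db - k)))


def logistic_zero_anchored(db, r, k):
    """Hypothetical variant, rescaled so DB=0 maps to 0 and the upper limit stays at 1."""
    p0 = logistic(0.0, r, k)
    return (logistic(db, r, k) - p0) / (1.0 - p0)


# With r = 1 and k = ln(3), the plain logistic gives P(0) = 1 / (1 + 3) = 0.25,
# comparable to the ~25% reported above; the anchored variant gives exactly 0.
p_plain = logistic(0.0, 1.0, math.log(3))
p_anchored = logistic_zero_anchored(0.0, 1.0, math.log(3))
```

Under this assumed form, P(0) = 1 / (1 + exp(r k)), so lowering the DB=0 likelihood requires a larger product r k, which also shifts or steepens the rest of the curve.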

durbank commented 5 years ago

Unscaled values (in meter-prominence units) appear to perform slightly better in terms of DB values of 0 (with a ~5% chance of being selected, as opposed to ~25% for optimizations with scaled values). There may be some issues with high sensitivity to the initial guess, however. I am still testing how sensitive the final results are to the starting guess of the r and k logistic regression parameters.

durbank commented 5 years ago

PR #30 closes this issue.