International-Soil-Radiocarbon-Database / ISRaD

Repository for the development and release of ISRaD data and tools
https://international-soil-radiocarbon-database.github.io/ISRaD/

Fraction conventions not standardized #110

Closed jb388 closed 5 years ago

jb388 commented 5 years ago

@aahoyt @coreylawrence @ShaneStoner

#20, #19

Data entry for the fraction table is not standardized, and templates have been entered using different conventions. These inconsistencies are a major hurdle for analyzing the fraction data. If we are going to promote ISRaD as a key resource for synthesizing soil fraction data, we need to address these issues ASAP.

1) A prime example is the use of "dummy" layers to represent sequential fractionation schemes. These have been used inconsistently, perhaps because they are not particularly intuitive. To illustrate this, consider the typical three-fraction density fractionation scheme (fPOM, oPOM, MOM). A dummy layer should technically be required to represent the intermediate "heavy" fraction generated when fPOM is separated from bulk soil, because the frc_input for the oPOM and MOM fractions is, e.g., a >1.8 g cm^-3 fraction, rather than the bulk layer itself (which would be the frc_input for the fPOM fraction). If we are going to allow sequential fractionation schemes to be entered without dummy layers, we need to explicitly define these special cases and provide examples.

2) There remain inconsistencies in the use of frc_upper and frc_lower because these have not been clearly defined for fractionation approaches where the use of these fields is not intuitive (pretty much everything but size and density). See issue #20.

3) #16 Certain frc_property values are confusing given our established (but not explicitly stated) conventions, e.g. "carbonate-free" exists, but we have been putting data in the layer table even if samples were first treated to remove carbonates.

4) Are data in the fraction table only from soils? E.g. if DOC or gas samples have been fractionated in some manner, do those data go into the flux/interstitial/incubation tables or the fraction table? I would say they should NOT go into the fraction table, but this is a question that has come up from users, so this should be clarified, i.e. "The fraction table is only for reporting the results of fractionation schemes applied to bulk soil samples", etc.
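To make point 1 concrete, here is a minimal sketch of the three-fraction density scheme entered with an explicit dummy fraction. Only the field names frc_name and frc_input come from this thread. The row values, the `measured` flag, the `heavy_gt_1.8` name, and the `lineage` helper are hypothetical illustrations, not ISRaD code or template vocabulary.

```python
# Hypothetical fraction-table rows for one bulk layer.
# "heavy_gt_1.8" is the dummy entry: it carries no measurements and exists
# only so that oPOM and MOM can name it as their frc_input.
rows = [
    {"frc_name": "fPOM", "frc_input": "bulk", "measured": True},
    {"frc_name": "heavy_gt_1.8", "frc_input": "bulk", "measured": False},
    {"frc_name": "oPOM", "frc_input": "heavy_gt_1.8", "measured": True},
    {"frc_name": "MOM", "frc_input": "heavy_gt_1.8", "measured": True},
]


def lineage(rows, frc_name, root="bulk"):
    """Follow frc_input links from a fraction back to the bulk layer."""
    parents = {row["frc_name"]: row["frc_input"] for row in rows}
    chain = [frc_name]
    while chain[-1] != root:
        current = chain[-1]
        if current not in parents:
            raise KeyError(f"no row defines frc_input for {current!r}")
        parent = parents[current]
        if parent in chain:
            raise ValueError(f"circular frc_input reference at {parent!r}")
        chain.append(parent)
    return chain


print(lineage(rows, "MOM"))   # ['MOM', 'heavy_gt_1.8', 'bulk']
print(lineage(rows, "fPOM"))  # ['fPOM', 'bulk']
```

Without the dummy row, oPOM and MOM would have to point directly at the bulk layer, and the order of the separation steps could no longer be recovered from frc_input alone.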

aahoyt commented 5 years ago

Agreed we should standardize ASAP. It would be great to finalize these decisions and document them on the website. Since I have done a lot of website updates recently, I'm happy to help get them up there.

1-3. Agreed, and examples would be good in either case for all of these. I don't have a strong opinion on which way to go, but I can help with documentation.

4. I actually assumed that if DOC or gas samples were fractionated they would go on the fraction tab, since, for example, the interstitial tab can't really accommodate them in its current form. However, I would be OK with disallowing this and just not accommodating these data (for now at least). It is probably relatively rare. Any particular examples you've come across?

olgavinduskova commented 5 years ago

Related to 2): At the moment, I am entering data on fractions treated by acid-base-acid (ABA) treatment. I use Chem_Extraction for frc_scheme and ABA residual for frc_property, and according to Alison's reply to issue #20, I should use 0 and 1 as the lower and upper cutoffs. First of all, is this all correct? Secondly, a suggestion: since other frc_schemes include "Acid" and "Base", it would be logical to have a specific "Acid-Base-Acid" frc_scheme. And a related question: the dataset also contains a "hydrolyzable" fraction, which the authors calculated by subtracting the ABA residual from the heavy fraction. Should I enter these data? If yes, which frc_property should I use? We have "base soluble" and "acid soluble", but we don't have "ABA soluble". Should I again use Chem_Extraction for frc_scheme? Which cutoffs should I use?

Kate-Heckman commented 5 years ago

So I have some ideas about dummy fractions I'd like to briefly express over the next phone call/zoom. I'm going to see if I can make a graphical representation to express the concept. In the past, it seems like conversations about fractions prove difficult because of the complexity of the subject matter.

coreylawrence commented 5 years ago

Sounds like we will talk more about this on the call tomorrow. However, given the complexity, I thought I would sketch out some ideas regarding point 1 above:

First, a definition for "dummy fraction" - a named fraction (i.e., frc_name) in the frc_tab, which is not associated with measurements. To be clear, the term "dummy layer" is used above but in the context of this discussion, I think we are talking about dummy fractions. Dummy layers are related but different. They are primarily used to deal with measurements made on composited samples from several different layers.

Second, my assumption is that there are two (and I think only two) reasons to include a dummy fraction: (1) To represent a mass of material generated from a fractionation procedure but that is unaccounted for in measurements or the reporting of data. In other words, we want to be able to sum our mass of material back to 100% of the bulk value. Sometimes a portion of mass is calculated by difference (see also issue #). In that case, we could (and should?) create a fraction to account for that mass. (2) To allow for reconstruction of a complex fractionation procedure that cannot be reconstructed without the use of a dummy layer. I can't think of a realistic example off the top of my head - I don't think Jeff's example qualifies for reasons described below - but I can imagine a fractionation procedure where there are multiple steps before a measurement. Those steps may or may not actually fractionate the sample but, in either case, may warrant using a dummy layer to fully capture the fractionation procedure. I'd guess this would be a pretty rare occurrence.

Finally, application of this within the existing template framework. My feeling on Jeff's example is that a dummy fraction is not needed because the (non-dummy) fractions sum to 100% of the mass. The main reason for including a dummy fraction, in this case, would be to clearly define the sequence of the steps. With no knowledge of how density fractionations work, it would be impossible to do that without some representation of the sequence. However, I question whether we really need to know the sequence if everything sums to 100%? If we do need to know the sequence, then I think there is a more efficient way of doing it than defining a new dummy-fraction, which I think is harder to train people to do correctly.
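The "sums to 100%" test can be sketched as a small check. This is only an illustration: the function name, the 1% tolerance, and the example mass percentages are assumptions, not ISRaD code or data.

```python
def unaccounted_mass_percent(fraction_percents, tolerance=1.0):
    """Percent of the bulk mass not covered by the reported fractions.

    Returns 0.0 when the fractions sum to 100% within `tolerance`
    (an assumed threshold). A positive result is mass that a dummy
    fraction would need to represent; a negative result means the
    reported fractions exceed 100% of the bulk mass.
    """
    missing = 100.0 - sum(fraction_percents)
    return 0.0 if abs(missing) <= tolerance else missing


# All three density fractions reported: no dummy fraction needed.
complete = {"fPOM": 10.0, "oPOM": 25.0, "MOM": 65.0}
print(unaccounted_mass_percent(complete.values()))  # 0.0

# MOM not reported: 65% of the bulk mass is unaccounted for.
partial = {"fPOM": 10.0, "oPOM": 25.0}
print(unaccounted_mass_percent(partial.values()))  # 65.0
```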

I've been trying to work through a few examples for how we might make this work with the existing infrastructure (i.e., providing a vignette to clearly explain a procedure). I think we can do it without adding new columns, but it will require changing the options for the frc_input. It strikes me that the value of frc_input is more appropriately used to represent the individual parts of a whole. So in Jeff's example, frc_input for fPOM -> bulk1; frc_input for oPOM -> bulk2; frc_input for MOM -> bulk3. Bulk is partitioned into 3 fractions, in the sequence 1, 2, 3. In the event that fPOM and oPOM were removed but not measured, we would need to include "dummy-fractions" to represent that mass, but they would be given the same frc_input values as above.

Getting a little more complicated with this: let's say that all of the above fractions exist and were measured but, additionally, the MOM fraction was further fractionated through sequential extraction with pyrophosphate, hydroxylamine, and dithionite. Because the MOM step was measured first (in this example), the new frc_input for these 3 additional fractions would be MOM1, MOM2, MOM3, but there would also need to be a dummy layer for the residual material (if it wasn't reported), which would have an frc_input -> MOM4. In this framework, the frc_input is used to illustrate the number and sequence of the fractions for a given "whole" starting material. The key guiding principle is that the fractions represented account for all of the mass of that starting material. When mass is missing, a dummy fraction is needed to represent it. The frc_input guides the calculation of that missing mass. So MOM4 = MOM-(MOM1+MOM2+MOM3).
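A rough sketch of how the proposed frc_input values could be read and used, assuming each value is a parent name followed by a step number (e.g. bulk2, MOM4). The helper names and the masses are hypothetical, and a parent name that itself ends in a digit would need a different convention.

```python
import re


def split_frc_input(value):
    """Split a proposed frc_input such as 'MOM4' into ('MOM', 4)."""
    match = re.fullmatch(r"(.+?)(\d+)", value)
    if match is None:
        raise ValueError(f"{value!r} is not of the form <parent><step>")
    return match.group(1), int(match.group(2))


def residual_mass(parent_mass, extracted_masses):
    """Mass left for the dummy fraction: parent minus the reported parts."""
    return parent_mass - sum(extracted_masses)


print(split_frc_input("bulk2"))  # ('bulk', 2)
print(split_frc_input("MOM4"))   # ('MOM', 4)

# Hypothetical masses (same units throughout) for the sequential extraction.
mom = 60.0
extracted = {"MOM1": 5.0, "MOM2": 10.0, "MOM3": 15.0}
# MOM4 = MOM - (MOM1 + MOM2 + MOM3)
print(residual_mass(mom, extracted.values()))  # 30.0
```

Sorting rows by the parsed (parent, step) pair would recover the sequence of steps for each starting material.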

Maybe folks could consider this approach in the context of some of the problem datasets and see if it holds up. If so, I can convert this text into a vignette (and include some additional examples).

Kate-Heckman commented 5 years ago

I agree with Corey’s point about the fractions summing to 100%, and his second point about expressing the complexity of a fractionation procedure. I don’t think we need the “bulk1”, “bulk2” etc., but maybe I’m not getting the full context of some of these studies.

I was doing the “expert review” of Susan Crow’s 2015 paper yesterday, and found the need for a dummy fraction (see attached). This particular situation which Corey outlined in his second point is the only case I see that would necessitate a dummy fraction.

coreylawrence commented 5 years ago

With regard to point 4 above, my instinct is to say the gas or DOC fractions should be treated separately. However, upon reflection, I don't see why we couldn't build the functionality into the Fraction tab to handle everything. But that would require additional controlled vocabulary and vignettes to explain how to do it correctly. I wonder how much of these types of data are out there?

Kate-Heckman commented 5 years ago

Re: I wonder how much of these types of data are out there? My opinion would be that ISRaD can’t do everything, and special cases like this are probably few and far between.

jb388 commented 5 years ago

Closing this as it is covered in more detail in #147