NIST-ISODB / isodb-library

Mirror of the NIST ISODB API (https://adsorption.nist.gov/isodb)

Found duplicates with capital/lower letters in the DOI #2

Open danieleongari opened 4 years ago

danieleongari commented 4 years ago

We should handle upper- and lower-case letters in DOIs with care. See, for example: https://adsorption.nist.gov/isodb/api/biblio/10.1021/ie902008g.json

The first 5 isotherms are by digitizer UNKNOWN at date 1000-01-01, while isotherms 6-10 are by Toyosi Afolabi on 2017-07-28, who probably didn't notice the previous entry and resubmitted the data. The collected data are slightly different, so the isotherms were evidently digitized twice.
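A check for this kind of duplicate could be sketched as follows. This is only a minimal illustration, not code from isodb-library: the function name `find_case_variant_dois` and the sample list are hypothetical, and the upper-case spelling in the sample is made up for the example. It relies on DOI names being case-insensitive, so two entries that differ only in letter case refer to the same paper.

```python
from collections import defaultdict


def find_case_variant_dois(dois):
    """Return DOIs that appear with more than one upper/lower-case spelling.

    The result maps the lower-cased DOI to the sorted list of distinct
    spellings found in the input.
    """
    groups = defaultdict(set)
    for doi in dois:
        cleaned = doi.strip()
        # DOI names are case-insensitive, so compare on the lower-cased form.
        groups[cleaned.lower()].add(cleaned)
    # Keep only the DOIs that were seen with at least two different spellings.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical input: the second entry is an invented upper-case variant.
sample = [
    "10.1021/ie902008g",
    "10.1021/IE902008G",
    "10.1021/ja000000a",
]
duplicates = find_case_variant_dois(sample)
for key, spellings in duplicates.items():
    print(key, "->", spellings)
```

A check like this only flags candidates. Whether the entries behind the two spellings are true duplicates (as opposed to, say, an experimental and a model isotherm from the same figure) still needs a look at the data.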

danieleongari commented 4 years ago

Moreover, in both digitization passes, some isotherms were not correctly interpreted as mixture isotherms obtained from the IAST model. So the same misinterpretation was made twice! See Figure 4 of https://pubs.acs.org/doi/full/10.1021/ie902008g.

But this is another story...

dwsideriusNIST commented 4 years ago

I took a look at that data set. First, there are reasons that I won't discuss here to put that data set in doubt. Second, at least for Figure 4, the entire data set needs to be redone as it contains a multicomponent system, with two different "model" isotherms, as well as experimental data from breakthrough experiments. None of it is single-component, and all the isotherms for that paper in the DB are single-component!

But there is also ambiguity in the source data. According to the figure caption, there are three binary models: a) solid line: Langmuir multicomponent based on single-component source isotherms (eq 3); b) thick dashed: IAST [probably also based on the Langmuir fits of single-component isotherms, but not actually specified]; c) thin dashed: Langmuir model based on fits to the experimental breakthrough isotherms. Please correct me, but I only see two models in Panel a. It appears to me that (c) was not included, as I see no thin dashed line.

So there is actually reason to have two nearly identical isotherms in the data set for that paper. Figure 4 should have yielded:

1) binary experimental isotherm [only 4 points; lowest pressure point is only given for CO2, so it would be skipped]
2) binary model isotherm for Langmuir binary model, based on single-component
3) binary model isotherm for IAST (probably based on single-component Langmuir fits)

danieleongari commented 4 years ago

Thank you very much for elaborating on this example. Now that I'm working a bit on the database for my projects, I will open issues here as soon as I find any unexpected entry, to see whether they can be solved with a check that can also spot similar problems, or whether they are indeed very tricky cases that require some subjective judgement.

Don't feel obliged to go into detail on all the issues that I report, because I understand that this can be very time-consuming and not very rewarding! But thanks for giving me a full explanation in this case, I appreciate it.

BTW, I'm not familiar with multicomponent isotherms, but your comment made me realize how much care we should put into them, and it is fine with me to start for the moment by focusing on single-component ones. The isotherms from Figure 2 of that reference (CO2, CH4, CO) are actually single-component, so these are fine, and we would need to discard the others. In the case of Figure 2, do you have a preference for reporting it twice (experimental: markers, Langmuir: line), or is reporting only the markers ok? I think digitizing the Langmuir fit as well would be redundant, but maybe your interns would digitize both.


dwsideriusNIST commented 4 years ago

I'll write extended explanations for some of the first cases or if a discussion is helpful.

1) This is a good example of how intricate multicomponent data sets can be. And believe me, this is one of the simplest cases I can imagine.
2) If there is experimental data AND a model-based fit, I only require students to digitize the experimental data. Early on, we included the model-based fit as well, but it should have been categorized correctly as "model." That did not happen 100% of the time, as you noticed.

FYI: the "model" category is a catch-all for isotherms that are derived from measurements, but are not the measurements themselves. Basically, anything that is not experimental, simulation, or (very rarely) ab initio / quantum.