dusadrian / DDIwR

DDI with R

Minimum supported DDI Codebook version #5

Closed pitkant closed 4 weeks ago

pitkant commented 4 months ago

Hi, thank you for the useful package.

The DDI website states that DDIwR is used to "create, edit and validate a DDI Codebook version 2.6, using R script commands." However, the package documentation does not make it easy to find which DDI versions the package actually supports.

In the getDNS() function in internals.R, the package seems to look for version 2.5 and throws an error otherwise:

https://github.com/dusadrian/DDIwR/blob/34dab97091c8af6451910288478a338dc7b3b9c9/R/internals.R#L784-L790

This was a problem when a certain data repository was still using DDI 2.0, which is valid DDI but not recognised as such by the package. It is reasonable for the package to support only DDI 2.5; I just wish the package were more transparent about it.

The DDIC object is defined in DDI_Codebook_2.6.R and is used by the checkXMList() function in internals.R:
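For anyone who wants to see which namespace a codebook file declares before handing it to the package, here is a minimal sketch. It is written in Python with only the standard library and is independent of DDIwR; it does not reproduce what getDNS() does. The function name `root_namespace` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none.

    Hypothetical helper for illustration; not part of DDIwR.
    """
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


# Made-up sample: a tiny document declaring the DDI Codebook 2.5 namespace.
sample = '<codeBook xmlns="ddi:codebook:2_5" version="2.5"><docDscr/></codeBook>'
print(root_namespace(sample))
```

A file whose root element declares a different namespace, or none at all, would show up here before any stricter check rejects it.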

https://github.com/dusadrian/DDIwR/blob/34dab97091c8af6451910288478a338dc7b3b9c9/R/internals.R#L263-L285

When I tried to use the getMetadata() function on a DDI-C 2.5 file, it threw an error on lines 280-284 because the check does not accept the p and ExtLink elements in my file, although they belong to the 2.5 namespace. However, at least in the case of ExtLink, there is a deprecation note in the R file:

https://github.com/dusadrian/DDIwR/blob/34dab97091c8af6451910288478a338dc7b3b9c9/R/DDI_Codebook_2.6.R#L794

However, when I commented this check out, the getMetadata() function seemed to produce a sensible result without errors. Also, with DDI Codebook 2.6 still being in draft phase (?), this check seems too stringent. Alternatively, the fact that the package only supports version 2.6 could be communicated to the end user more explicitly.
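To illustrate the kind of check being discussed, the sketch below lists element names that are missing from a known set instead of stopping at the first one. It is a Python, standard-library illustration only: the `known` set is a tiny hypothetical stand-in for the DDIC object, `unknown_elements` is an invented name, and none of this is the package's checkXMList() code.

```python
import xml.etree.ElementTree as ET


def unknown_elements(xml_text, known):
    """Return the sorted local names of elements that are not in `known`.

    Hypothetical helper for illustration; not part of DDIwR.
    """
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # Drop the "{namespace}" prefix that ElementTree adds to tags.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name not in known:
            found.add(local_name)
    return sorted(found)


# Made-up whitelist and document, for illustration only.
known = {"codeBook", "stdyDscr", "citation", "titlStmt", "titl", "abstract"}
doc = (
    '<codeBook xmlns="ddi:codebook:2_5">'
    "<stdyDscr><citation><titlStmt>"
    '<titl>Example study <ExtLink URI="https://example.org"/></titl>'
    "</titlStmt></citation>"
    "<abstract><p>First paragraph.</p></abstract></stdyDscr>"
    "</codeBook>"
)
print(unknown_elements(doc, known))
```

Reporting the unrecognised names as a warning, rather than raising an error, would let a reader still load files that carry deprecated or formatting elements.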

dusadrian commented 4 months ago

Hello @pitkant,

Yes indeed, the DDIwR package is also transitioning towards Codebook 2.6 but, as you rightfully observe, the standard is still not final. Last year, when the 2.6 elements were introduced, I thought the final version would be released by the end of the year (at least this is how it was advertised on the DDI Alliance website). This was not the case, hence the DDIwR package also needs to wait until that is final.

Parts of the code, such as checking the namespace, are also in stand-by mode for the same reason. But I think the main strength of the package lies less in reading a DDI 2.6 Codebook than in actually producing 2.6 Codebooks using plain R commands.

The versions should not really matter, as the Codebook is specifically engineered to be backwards compatible. If my understanding is correct, elements from version 2.0 should still be part of the Codebook 2.6, even if deprecated. If the (now deprecated) ExtLink is not found in version 2.6, this should be raised to the DDI Alliance Technical Committee, because such an oversight breaks the backwards compatibility.

My intention is to be 100% consistent with the version from the DDI Alliance. However, there might be some modifications in the latest development versions of the 2.6 Codebook that I have not yet implemented. I think it is best to wait for the final version before updating the DDIwR package. In that version, the package will absolutely be fully transparent about what is supported.

I hope this explains the situation, thank you so much for your very helpful review of the code! Best, Adrian

pitkant commented 4 months ago

So I take it that the list object DDIC defined in DDI_Codebook_2.6.R is generated from codebook.xsd (note: that can be downloaded from DDI-Alliance's Atlassian here)? The file seems to have some additional mentions of the PHRASE element "ExtLink" and the FORM element "p" that were not carried over to the DDIC object, but maybe that was the intention.

I agree that if generating 2.6 Codebooks is the goal of this package, then it is probably sufficient. However, if there is the option to read existing codebooks, other users might encounter the same problems as I did, so I just wanted to write out these few paragraphs here for future reference.

dusadrian commented 4 months ago

Indeed. I should also confess that I never understood the "p" elements from the .xsd file. I always thought they were just <p> (HTML paragraphs) needed to compile the web page documentation. I probably thought the same about ExtLink, but I will revisit the schema definition to make sure I am not missing anything obvious.

The long-term goal is to also read Codebook files (all versions). For the time being, producing a DDI 2.6 Codebook seems to me like a very good outcome. Taking an existing dataset and exporting such a codebook for archiving / publishing purposes has never been so easy since the days of Nesstar. And I am committed to making it better and better. Help is always welcome, even if only to make the documentation clearer and more transparent.

dusadrian commented 1 month ago

Hello Pyry,

I put in some more work, and the most recent commit, 0.18.2, now seems to support both 2.5 and the draft 2.6. Also, all elements from the latest version of the Codebook 2.6, including the deprecated Link and ExtLink, are now fully incorporated in this package.

Hopefully, this will address this issue. Do please let me know if it solves it.

pitkant commented 4 weeks ago

Thank you very much for your work, the update indeed seems to have solved the issue! I will continue testing the package in my work and will ask more questions or provide feedback if the need arises.

dusadrian commented 4 weeks ago

Perfect, I will then close this issue. Do please open another if needed.