Compatibility with R version 3.6.*

SomaLogic / SomaDataIO

The SomaDataIO package loads and exports 'SomaScan' data via the 'SomaLogic Operating Co., Inc.' proprietary data file, called an ADAT ('*.adat'). The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory.

https://somalogic.github.io/SomaDataIO/

Other

26 stars 18 forks source link

Compatibility with R version 3.6.* #23

Closed jmw86069 closed 7 months ago

jmw86069 commented 2 years ago

Checks

[X] Please search previous issues before creating a new one, to ensure yours is not a duplicate
[X] The proposed feature fits in the scope of SomaScan data I/O or low level data soma_adat manipulation

Problem

I respect the decision to go with R v4 compatibility, but wondered if you might still consider R-3.6 compatibility? I admit I noticed only one dependency on R v4 (the use of |>), so it seemed relatively straightforward. I could be missing something you have in plans, due respect. My alternative is to create a fork to do the same.
Our situation is that several project analyses are using the R-3.6 codebase for consistency while progressing manuscripts for publication.
It's certainly possible to use a separate instance of R v4 to load ADAT files, save in a format readable by R v3.6, and continue. Or to use R v4 to edit and save ADAT files to submit to GEO for example.

Feature

From my very brief testing, all instances of |> can be replaced with %>%, which already imported via the dependency on the dplyr package. I'm not aware of another aspect of R code that requires R v4 or higher - I could be missing something however.

R code

I don't have R code, but did clone the repository, global find/replace to use %>%, and read_adat() worked properly.

I admit it's in category of "nice to have". :) Thanks for the nice package dev, I'm sure to be using it quite a bit.

Priority Level

[ ] High
[X] Medium
[ ] Low

Thanks for contributing :partying_face:!

stufield commented 2 years ago

Hi James,

Thanks for your question and for using SomaDataIO. Please excuse the late response but I've been out of town.

To your question about backward compatibility to older versions of R v3.6.*. Unfortunately there are three problems with maintaining backward compatibility to older versions of R:

The SomaDataIO package itself is composed of a subset of SomaLogic's internal analysis code base, which is periodically fixed to the current R version at approx. 6 month intervals. Breaking out SomaDataIO from this ecosystem would require maintaining 2 (almost) identical code bases, which is undesirable.
We are currently planning on publishing SomaDataIO (perhaps under a different name) to CRAN, which requires all checks and unit tests to pass under the current released version of R.
The decision to move away from the magrittr pipe (%>%) in favor of the native R pipe (|>) was made to reduce external dependencies and integrate better with the base R ecosystem. So reversing that decision is also undesirable.

So unfortunately I do not foresee a backward compatible version of SomaDataIO at this time. However, your suggestion might provide a alternate solution. Since you've already forked the repository, perhaps creating a branch with the necessary pipe adjustments and DESCRIPTION file updates would allow others in your position to move forward with their analyses. This would be a "dead end" branch that would never be merged, and the primary drawback would be it would no longer be actively developed/debugged, but it would be frozen in time for R v3.6.*. We could also tag it appropriately to facilitate automated pipelines. If this alternative might work for you, I'd have to create a branch for you to submit a PR against. Thoughts?

jmw86069 commented 2 years ago

I respect your reasoning.

My main goal is to submit data to GEO. Our ADAT files contain a number of sample annotation fields not appropriate for GEO, so I need to remove some annotations, and add others that are helpful for downstream analysis. Thus far, this is the only package that can export ADAT files.

I understand the point about support and maintenance, those are driving reasons. It would add a layer of support burden to test R-3.6. and R-4., I respect that. I don't think allowing R-3.6.1 compliance would break any R-4.* tests, although you would know better than I.

I agree with the goal of reducing R dependencies, I do that too for what it's worth. However SomaDataIO widely uses dplyr, which itself still imports magrittr. I didn't see other R-4.* specific syntax, such as dynamic functions for example: \(x).

My main counterpoint is that in principle, for me anyway, a data import package shouldn't require the most recent version of R. Data access should be as broadly available as possible. This is not a strong opinion, I could be missing something, there are exceptions and all that. :) Maybe this package should really only be for R-4.*, that's okay too.

(But it almost already works in R-3.6.1!)

Anyway, if that was also not persuasive, that's okay too. I appreciate your suggestion. If I understand right, essentially make a PR with the changes I made for R-3.6.1 compatibility, into a branch specific for R-3.6.1? For a fixed file format (for now anyway) this release might have some longevity. And certainly better to use a package under your Github than my own.

stufield commented 2 years ago

I think, at least philosophically, we are in agreement. Vis-a-vis reduced dependencies, principle of an IO package not requiring specific software version, etc. The tricky part is how to achieve it in the context of two separate code bases with minimal maintenance overhead.

One idea I've considered in the past is re-writing much of the code to be version agnostic (both internally and externally), e.g. simply avoid using pipes altogether! I think the advantage of piping is more visual at the user level, but for internal code that the user never sees, may be an unnecessary complexity.

But to be honest I haven't looked too deeply into it to determine what kind of investment it would take to get us there. To that end, yes you understand me correctly, we could at least start the process (no promises!) with a PR that removes pipes altogether and becomes R version agnostic but is still written in base R (no %>% either). This would:

give me a better idea of the scope of the changes
satisfy your immediate needs
offer a future potential path if we ultimately decide to merge to main

Does that sound like a reasonable path forward?

jmw86069 commented 2 years ago

I hear you, it's a similar thought process for a lot of R packages - whether to depend upon another package, or rewrite code to less convenient syntax that avoids the dependency. I think dplyr/magrittr are fairly widely used, so it doesn't feel extravagant to require those packages.

By comparison, some packages feel extravagant, requiring 20+ packages that aren't related to the core functions I want to import. I use SummarizedExperiment as my core expression data type, and it legit takes 5-10 seconds just to require the package, sometimes loading the package is the slow step in an Rmarkdown. It's pretty heavy, but useful. I digress. Haha.

I don't think I'm in position to rewrite pipes %>% or |> to non-pipe alternatives, if I'm understanding your suggestion. I was able to convert to %>% and confirm it still works the way I think you intended.

I think the question is whether your plan still includes using dplyr functions. If yes, then there isn't much benefit to removing %>% pipes, since you still depend on magrittr. If you also want to remove the dplyr dependency, that could work but is probably beyond my ability to help.

Aside: I just chased down some differences in Adat files, discovered you change "," to ";" in the gene annotation data. :) Haha. I noticed you update CreatedBy and CreatedByHistory, I respect keeping track of who makes any edits. I'm not sure it's ideal to include usernames. In both cases I suggest adding a little help doc sentence, I think they happen in prepHeaderMeta(). Sorry for hijacking my own issue. haha

stufield commented 10 months ago

Sorry it's been a while, but closing out this issue. Couple things:

prepHeaderMeta() no longer zaps commas and replaces with ";". It simply writes what is initially, which should result in a higher fidelity to the original ADAT. However, this may result in some funky output when writing to *.csv (as some users do), which is why this was originally introduced.
- See: https://github.com/SomaLogic/SomaDataIO/blob/main/R/prepHeaderMeta.R#L59
The CreatedBy and CreatedByHistory entries were greatly simplified to (attempt to) avoid confusion. However, I ultimately decided to keep the user included on output for traceability.
- See: https://github.com/SomaLogic/SomaDataIO/blob/main/R/prepHeaderMeta.R#L81

stufield commented 7 months ago

Closing this issue now as this fork contains the conversion from |> -> %>% necessary for backward compatibility to R 3.6.x