SomaLogic / SomaDataIO

The SomaDataIO package loads and exports 'SomaScan' data via the 'SomaLogic Operating Co., Inc.' proprietary data file, called an ADAT ('*.adat'). The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory.
https://somalogic.github.io/SomaDataIO/
Other
26 stars 18 forks source link

Compatibility with R version 3.6.* #23

Closed jmw86069 closed 7 months ago

jmw86069 commented 2 years ago

Checks

Problem

Feature

R code

I don't have R code, but did clone the repository, global find/replace to use %>%, and read_adat() worked properly.

I admit it's in category of "nice to have". :) Thanks for the nice package dev, I'm sure to be using it quite a bit.

Priority Level


Thanks for contributing :partying_face:!

stufield commented 2 years ago

Hi James,

Thanks for your question and for using SomaDataIO. Please excuse the late response but I've been out of town.

To your question about backward compatibility to older versions of R v3.6.*. Unfortunately there are three problems with maintaining backward compatibility to older versions of R:

  1. The SomaDataIO package itself is composed of a subset of SomaLogic's internal analysis code base, which is periodically fixed to the current R version at approx. 6 month intervals. Breaking out SomaDataIO from this ecosystem would require maintaining 2 (almost) identical code bases, which is undesirable.
  2. We are currently planning on publishing SomaDataIO (perhaps under a different name) to CRAN, which requires all checks and unit tests to pass under the current released version of R.
  3. The decision to move away from the magrittr pipe (%>%) in favor of the native R pipe (|>) was made to reduce external dependencies and integrate better with the base R ecosystem. So reversing that decision is also undesirable.

So unfortunately I do not foresee a backward compatible version of SomaDataIO at this time. However, your suggestion might provide a alternate solution. Since you've already forked the repository, perhaps creating a branch with the necessary pipe adjustments and DESCRIPTION file updates would allow others in your position to move forward with their analyses. This would be a "dead end" branch that would never be merged, and the primary drawback would be it would no longer be actively developed/debugged, but it would be frozen in time for R v3.6.*. We could also tag it appropriately to facilitate automated pipelines. If this alternative might work for you, I'd have to create a branch for you to submit a PR against. Thoughts?

jmw86069 commented 2 years ago

I respect your reasoning.

My main goal is to submit data to GEO. Our ADAT files contain a number of sample annotation fields not appropriate for GEO, so I need to remove some annotations, and add others that are helpful for downstream analysis. Thus far, this is the only package that can export ADAT files.

I understand the point about support and maintenance, those are driving reasons. It would add a layer of support burden to test R-3.6. and R-4., I respect that. I don't think allowing R-3.6.1 compliance would break any R-4.* tests, although you would know better than I.

I agree with the goal of reducing R dependencies, I do that too for what it's worth. However SomaDataIO widely uses dplyr, which itself still imports magrittr. I didn't see other R-4.* specific syntax, such as dynamic functions for example: \(x).

My main counterpoint is that in principle, for me anyway, a data import package shouldn't require the most recent version of R. Data access should be as broadly available as possible. This is not a strong opinion, I could be missing something, there are exceptions and all that. :) Maybe this package should really only be for R-4.*, that's okay too.

(But it almost already works in R-3.6.1!)

Anyway, if that was also not persuasive, that's okay too. I appreciate your suggestion. If I understand right, essentially make a PR with the changes I made for R-3.6.1 compatibility, into a branch specific for R-3.6.1? For a fixed file format (for now anyway) this release might have some longevity. And certainly better to use a package under your Github than my own.

stufield commented 2 years ago

I think, at least philosophically, we are in agreement. Vis-a-vis reduced dependencies, principle of an IO package not requiring specific software version, etc. The tricky part is how to achieve it in the context of two separate code bases with minimal maintenance overhead.

One idea I've considered in the past is re-writing much of the code to be version agnostic (both internally and externally), e.g. simply avoid using pipes altogether! I think the advantage of piping is more visual at the user level, but for internal code that the user never sees, may be an unnecessary complexity.

But to be honest I haven't looked too deeply into it to determine what kind of investment it would take to get us there. To that end, yes you understand me correctly, we could at least start the process (no promises!) with a PR that removes pipes altogether and becomes R version agnostic but is still written in base R (no %>% either). This would:

  1. give me a better idea of the scope of the changes
  2. satisfy your immediate needs
  3. offer a future potential path if we ultimately decide to merge to main

Does that sound like a reasonable path forward?

jmw86069 commented 2 years ago

I hear you, it's a similar thought process for a lot of R packages - whether to depend upon another package, or rewrite code to less convenient syntax that avoids the dependency. I think dplyr/magrittr are fairly widely used, so it doesn't feel extravagant to require those packages.

By comparison, some packages feel extravagant, requiring 20+ packages that aren't related to the core functions I want to import. I use SummarizedExperiment as my core expression data type, and it legit takes 5-10 seconds just to require the package, sometimes loading the package is the slow step in an Rmarkdown. It's pretty heavy, but useful. I digress. Haha.

I don't think I'm in position to rewrite pipes %>% or |> to non-pipe alternatives, if I'm understanding your suggestion. I was able to convert to %>% and confirm it still works the way I think you intended.

I think the question is whether your plan still includes using dplyr functions. If yes, then there isn't much benefit to removing %>% pipes, since you still depend on magrittr. If you also want to remove the dplyr dependency, that could work but is probably beyond my ability to help.

Aside: I just chased down some differences in Adat files, discovered you change "," to ";" in the gene annotation data. :) Haha. I noticed you update CreatedBy and CreatedByHistory, I respect keeping track of who makes any edits. I'm not sure it's ideal to include usernames. In both cases I suggest adding a little help doc sentence, I think they happen in prepHeaderMeta(). Sorry for hijacking my own issue. haha

stufield commented 10 months ago

Sorry it's been a while, but closing out this issue. Couple things:

stufield commented 7 months ago

Closing this issue now as this fork contains the conversion from |> -> %>% necessary for backward compatibility to R 3.6.x