Closed jmw86069 closed 7 months ago
Hi James,
Thanks for your question and for using SomaDataIO
. Please excuse the late response but I've been out of town.
To your question about backward compatibility to older versions of R v3.6.*
. Unfortunately there are three problems with maintaining backward compatibility to older versions of R
:
SomaDataIO
package itself is composed of a subset of SomaLogic's internal analysis code base, which is periodically fixed to the current R version at approx. 6 month intervals. Breaking out SomaDataIO
from this ecosystem would require maintaining 2 (almost) identical code bases, which is undesirable.SomaDataIO
(perhaps under a different name) to CRAN
, which requires all checks and unit tests to pass under the current released version of R
.magrittr
pipe (%>%
) in favor of the native R pipe (|>
) was made to reduce external dependencies and integrate better with the base R
ecosystem. So reversing that decision is also undesirable.So unfortunately I do not foresee a backward compatible version of SomaDataIO
at this time. However, your suggestion might provide a alternate solution. Since you've already forked the repository, perhaps creating a branch with the necessary pipe adjustments and DESCRIPTION
file updates would allow others in your position to move forward with their analyses. This would be a "dead end" branch that would never be merged, and the primary drawback would be it would no longer be actively developed/debugged, but it would be frozen in time for R v3.6.*
. We could also tag it appropriately to facilitate automated pipelines. If this alternative might work for you, I'd have to create a branch for you to submit a PR against. Thoughts?
I respect your reasoning.
My main goal is to submit data to GEO. Our ADAT files contain a number of sample annotation fields not appropriate for GEO, so I need to remove some annotations, and add others that are helpful for downstream analysis. Thus far, this is the only package that can export ADAT files.
I understand the point about support and maintenance, those are driving reasons. It would add a layer of support burden to test R-3.6. and R-4., I respect that. I don't think allowing R-3.6.1 compliance would break any R-4.* tests, although you would know better than I.
I agree with the goal of reducing R dependencies, I do that too for what it's worth. However SomaDataIO
widely uses dplyr
, which itself still imports magrittr
. I didn't see other R-4.* specific syntax, such as dynamic functions for example: \(x)
.
My main counterpoint is that in principle, for me anyway, a data import package shouldn't require the most recent version of R. Data access should be as broadly available as possible. This is not a strong opinion, I could be missing something, there are exceptions and all that. :) Maybe this package should really only be for R-4.*, that's okay too.
(But it almost already works in R-3.6.1!)
Anyway, if that was also not persuasive, that's okay too. I appreciate your suggestion. If I understand right, essentially make a PR with the changes I made for R-3.6.1 compatibility, into a branch specific for R-3.6.1? For a fixed file format (for now anyway) this release might have some longevity. And certainly better to use a package under your Github than my own.
I think, at least philosophically, we are in agreement. Vis-a-vis reduced dependencies, principle of an IO package not requiring specific software version, etc. The tricky part is how to achieve it in the context of two separate code bases with minimal maintenance overhead.
One idea I've considered in the past is re-writing much of the code to be version agnostic (both internally and externally), e.g. simply avoid using pipes altogether! I think the advantage of piping is more visual at the user level, but for internal code that the user never sees, may be an unnecessary complexity.
But to be honest I haven't looked too deeply into it to determine what kind of investment it would take to get us there. To that end, yes you understand me correctly, we could at least start the process (no promises!) with a PR that removes pipes altogether and becomes R version agnostic but is still written in base R (no %>%
either). This would:
main
Does that sound like a reasonable path forward?
I hear you, it's a similar thought process for a lot of R packages - whether to depend upon another package, or rewrite code to less convenient syntax that avoids the dependency. I think dplyr/magrittr are fairly widely used, so it doesn't feel extravagant to require those packages.
By comparison, some packages feel extravagant, requiring 20+ packages that aren't related to the core functions I want to import. I use SummarizedExperiment
as my core expression data type, and it legit takes 5-10 seconds just to require the package, sometimes loading the package is the slow step in an Rmarkdown. It's pretty heavy, but useful. I digress. Haha.
I don't think I'm in position to rewrite pipes %>%
or |>
to non-pipe alternatives, if I'm understanding your suggestion. I was able to convert to %>%
and confirm it still works the way I think you intended.
I think the question is whether your plan still includes using dplyr
functions. If yes, then there isn't much benefit to removing %>%
pipes, since you still depend on magrittr. If you also want to remove the dplyr dependency, that could work but is probably beyond my ability to help.
Aside: I just chased down some differences in Adat files, discovered you change ","
to ";"
in the gene annotation data. :) Haha. I noticed you update CreatedBy
and CreatedByHistory
, I respect keeping track of who makes any edits. I'm not sure it's ideal to include usernames. In both cases I suggest adding a little help doc sentence, I think they happen in prepHeaderMeta()
. Sorry for hijacking my own issue. haha
Sorry it's been a while, but closing out this issue. Couple things:
prepHeaderMeta()
no longer zaps commas and replaces with ";"
. It simply writes what is initially, which should result in a higher fidelity to the original ADAT. However, this may result in some funky output when writing to *.csv
(as some users do), which is why this was originally introduced.
CreatedBy
and CreatedByHistory
entries were greatly simplified to (attempt to) avoid confusion. However, I ultimately decided to keep the user included on output for traceability.
Checks
soma_adat
manipulationProblem
|>
), so it seemed relatively straightforward. I could be missing something you have in plans, due respect. My alternative is to create a fork to do the same.Feature
|>
can be replaced with%>%
, which already imported via the dependency on the dplyr package. I'm not aware of another aspect of R code that requires R v4 or higher - I could be missing something however.R code
I don't have R code, but did clone the repository, global find/replace to use
%>%
, andread_adat()
worked properly.I admit it's in category of "nice to have". :) Thanks for the nice package dev, I'm sure to be using it quite a bit.
Priority Level
Thanks for contributing :partying_face:!