OHDSI / Andromeda

AsynchroNous Disk-based Representation of MassivE DAta: An R package aimed at replacing ff for storing large data objects.
https://ohdsi.github.io/Andromeda/
11 stars 9 forks source link

v1.0 release candidate #42

Closed ablack3 closed 7 months ago

ablack3 commented 1 year ago

This release candidate switches the Andromeda backend from SQLite to Apache Arrow. It is not technically a breaking change but due to slight differences in how SQLite tables and arrow tables are handled it can't be released until it is tested with all reverse dependencies.

Addresses

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 78.23% and project coverage change: -6.78% :warning:

Comparison is base (6c679dc) 89.21% compared to head (db6c2db) 82.44%. Report is 4 commits behind head on main.

:exclamation: Current head db6c2db differs from pull request most recent head 279c07d. Consider uploading reports for the commit 279c07d to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #42 +/- ## ========================================== - Coverage 89.21% 82.44% -6.78% ========================================== Files 4 3 -1 Lines 371 262 -109 ========================================== - Hits 331 216 -115 - Misses 40 46 +6 ``` | [Files Changed](https://app.codecov.io/gh/OHDSI/Andromeda/pull/42?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OHDSI) | Coverage Δ | | |---|---|---| | [R/Object.R](https://app.codecov.io/gh/OHDSI/Andromeda/pull/42?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OHDSI#diff-Ui9PYmplY3QuUg==) | `71.92% <69.14%> (-14.28%)` | :arrow_down: | | [R/Operations.R](https://app.codecov.io/gh/OHDSI/Andromeda/pull/42?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OHDSI#diff-Ui9PcGVyYXRpb25zLlI=) | `88.88% <80.00%> (-0.74%)` | :arrow_down: | | [R/LoadingSaving.R](https://app.codecov.io/gh/OHDSI/Andromeda/pull/42?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OHDSI#diff-Ui9Mb2FkaW5nU2F2aW5nLlI=) | `92.53% <91.52%> (+7.14%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ablack3 commented 1 year ago

@schuemie,

I've made the changes we discussed in the call today. nrow, isAndromedaTable, and colnames should now work with both old and new Andromeda tables. Tests have been added.

I also added some code to try and handle the windows file lock issue. Not sure how to test it because it is kind of a random thing and I have no way to know when the lock is released if ever. But maybe it will help. Each time close is called on any Andromeda I try to remove not only the one Andromeda that it is being called on but also any others that are lingering around.

I don't know what to do about the upcoming changes to pull except maybe to complain to the Arrow folks. Otherwise we can just ask users to set an option. We could check the option on startup and print a message.

One thing I wanted to ask about since this is a big change. I'd like to consider moving dplyr from Depends to Imports. The reason is that I'm exporting a lot (all of dplyr), some of which has nothing to do with Andromeda or doesn't work on Andromeda. What if we asked users to load dplyr if they want it? Or possibly just re-export a smaller subset of dplyr functions?

schuemie commented 1 year ago

Would it be possible to do a small release first that introduces the nrow() and isAndromedaTable() first, but doesn't change the backend to arrow? That way, we can have all HADES packages require that new version and use those functions. That should make it a lot easier to make the HADES packages work with both backends.

As for dplyr, I don't feel very strongly about it. I tend to rather aggressively import all of dplyr in my packages anyway, like here, so I would not be affected by Andromeda moving it to Imports. I guess to me, dplyr has become part of the R base, I just always have it loaded.

ablack3 commented 1 year ago

Would it be possible to do a small release first that introduces the nrow() and isAndromedaTable() first, but doesn't change the backend to arrow? That way, we can have all HADES packages require that new version and use those functions. That should make it a lot easier to make the HADES packages work with both backends.

Yes I did create a release with isAndromedaTable. I didn't add nrow yet but I can to that this week in another mini release.

As for dplyr, I don't feel very strongly about it. I tend to rather aggressively import all of dplyr in my packages anyway, like here, so I would not be affected by Andromeda moving it to Imports. I guess to me, dplyr has become part of the R base, I just always have it loaded.

Ok. I think I might try moving it to Imports and only export the dplyr functions supported by Andromeda (select, mutate, etc).