OHDSI / CirceR

R package wrapper for CIRCE
https://ohdsi.github.io/CirceR/
6 stars 2 forks source link

destination CRAN? #22

Closed ablack3 closed 6 months ago

ablack3 commented 1 year ago

@chrisknoll, Is there any hope of getting CirceR on CRAN?

tagging @edward-burn

chrisknoll commented 1 year ago

The size is too big (> 5MB). I'd like it to fit into cran, but there was a feature that was thrown into the JSON serialization that calculates the CDM compatability using a javascript library. I tried to resist this feature but it was pushed in, but I'd like to remove it (I haven't done analysis on the impact). Because the semver compatibility logic is a javascript module, we bundle a javascript execution engine with circe (actually with StandardAnalysisUtils, which circe depends on) which turns out to be 47MB (which is 42MB too many MB and that's just a dependency).

ablack3 commented 1 year ago

It would be so nice to have a minified version of Circe available on CRAN since it is a core part of the OHDSI stack. Would it be a bad idea to copy the java source code to CirceR and create a minified version with a smaller feature set - primarily transforming circe json to OHDSI SQL.

I think cran requires that the java source code is in the package.

RowanErasmus commented 1 year ago

@ablack3 @edward-burn

We have managed to reduce the size of circe-be to approx 4.5MB and Chris has recently created a new release, so principally this obstacle should be removed from the 'road to cran' for CirceR.

I'm reviving this issues so you guys can get involved with Chris in whatever is necessary to make this happen :-)

ablack3 commented 1 year ago

Awesome @RowanErasmus!

chrisknoll commented 1 year ago

Ok, I've published a release that has the changes to reduce the package size. I'm not sure where to begin to try to get this into cran (or even if the changes have made the overall pacakage size < 5MB for cran).

If someone wants to take the ball and see about how that could be set up, that would be most helpful.

ablack3 commented 1 year ago

Well Cran packages that contain compiled java code need to also contain the java source code as far as I know. So that is a start. Would you like me to try and submit it to CRAN? Basically it involves

chrisknoll commented 1 year ago

Ok, sounds good, just to see what issues they are going to have. I don't think there's a ton of source code (that's an odd requirement) but we're so close to the 5MB mark with just the dependencies alone, I worry they are going to trip that restriction. But it's worth a shot. Thanks, @ablack3 .

schuemie commented 1 year ago

Be sure to fix these notes before submitting to CRAN:

Found if() conditions comparing class() to string:
File ‘CirceR/R/ConceptSetSqlBuilder.R’: if (class(conceptSetJSON) == "character" && length(conceptSetJSON) == 1 && nchar(conceptSetJSON) > 0) ...
File ‘CirceR/R/PrintFriendly.R’: if (class(conceptSetList[[1]]) == "jobjRef") ...
Use inherits() (or maybe is()) instead.

and add a cran-comments.md file in the root (see here for an example)

I use this for submitting to CRAN. Not sure what the difference with devtools::submit_cran() is:

devtools::check_win_devel()

devtools::check_rhub()

devtools::release()
schuemie commented 1 year ago

Just remembered: please change the system requirements in DESCRIPTION to:

SystemRequirements: Java (>= 8)

(Almost got kicked out of CRAN for that one)

ablack3 commented 1 year ago

Thank you for the heads up @schuemie!

martapineda commented 8 months ago

Hi! Is there any update regarding CirceR on CRAN? Thanks :)

chrisknoll commented 8 months ago

Ok, I'll get on this...I think recent updates to circe has reduced the size (thanks @ablack3 ) and it might be small enough now.

chrisknoll commented 8 months ago

So, I have my doubts about this getting done. I ran through various checks locally and I get this with the internal (RStudio) Check:

* using log directory 'C:/Git/CirceR.Rcheck'
* using R version 4.1.2 (2021-11-01)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* checking for file 'CirceR/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'CirceR' version '1.3.3'
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking whether package 'CirceR' can be installed ... ERROR
Installation failed.
See 'C:/Git/CirceR.Rcheck/00install.out' for details.
* DONE
Status: 1 ERROR

The log file gives no additional details. The build works fine, tho, and installing package from remotes::install_github() works fine.

Why do we want this to go into CRAN, again?

chrisknoll commented 8 months ago

Ah, found additional log files:

* installing *source* package 'CirceR' ...
** using non-staged installation
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Error: package or namespace load failed for 'CirceR':
 .onLoad failed in loadNamespace() for 'CirceR', details:
  call: NULL
  error: .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures.
Error: loading failed
Execution halted
*** arch - x64
ERROR: loading failed for 'i386'
* removing 'C:/Git/CirceR.Rcheck/CirceR'

So, I don't (and won't) have the JDK for i386, is there a way to indicate that this package isn't for i386? or instead declare it 'multi-arch'? I notice I usually have to pass --no-multi-arch in my install commands like:

remotes::install_github('ohdsi/DatabaseConnector', INSTALL_opts=c("--no-multiarch"))
schuemie commented 8 months ago

In R-Studio, you can set the R check options (Build --> More --> Configure build tools ...), where I believe you can add --no-multiarch to the Check options.

But perhaps you can re-install R without the 32-bit version? R 4.2.3 (the HADES official supported version) does not include the 32-bit version by default.

martapineda commented 8 months ago

@chrisknoll Answering your question from above:

Why do we want this to go into CRAN, again?

We have some collaborators working in a Secure Data Environment who unfortunately cannot install packages from GitHub, only from CRAN, into their environments.

chrisknoll commented 8 months ago

But perhaps you can re-install R without the 32-bit version? R 4.2.3 (the HADES official supported version) does not include the 32-bit version by default.

Now there's a reason for going to that version hahah. Ok I'll go with that.

chrisknoll commented 7 months ago

Ok, #43 is created, waiting for GitActions to show everything passes.

Once that's done, I believe the steps will be to create a new release into master, and then I'd submit to cran from master branch.

martapineda commented 7 months ago

That's fantastic!! :D

chrisknoll commented 7 months ago

Looks like we're all green on the check side. I also addressed an issue raised by git actions that Node 16 is deprecated and we should be using the Node 20 based actions, so you can change your actions/checkout@v2 to actions/checkout@v4 and actions/cache@v2 to actions/cache@v4.

Since I believe these action scripts can be shared by hosting in a repo (example: https://github.com/actions/cache/blob/v3/action.yml contains the script of the cache action for v3), maybe there's something we can do for HADES based actions like RCMD checks so that if we want to update a version of another action, we don't have to go through all of our repos to update the gitaction scripts.

schuemie commented 7 months ago

Great work!

I would love it if we can share the same GitHub Action scripts across HADES. If anyone wants to explore that option it would be greatly appreciated.

chrisknoll commented 7 months ago

@schuemie , it looks like you started researching this (I was going to create an 'actions' repo but it already existed.

If you don't object, I can migrate the RCMDCheck workflow into the shared repo (you started it, but it's 2 years old now), and I think they suggest a specific directory to host workflows (even shared ones)....which is .github/workflows/{workflowfile}.yaml, but honestly, i don't see why we wouldn't host it in a YAML right in the root as you did here.

This is the guide I'd follow to make it reusable: https://resources.github.com/learn/pathways/automation/intermediate/create-reusable-workflows-in-github-actions/

Update: As of 4/3/2024 link is broken, hopefully it's just server issue.

schuemie commented 7 months ago

Yes, I should have mentioned that I started working on that, but then got stuck. If I remember correctly, the problem I didn't know how to solve was how to include the secrets / environmental variables in the shared script.

Please feel free to change everything in that repo.

chrisknoll commented 7 months ago

Just circling back, i'm going to focus first on cran submission and then circle back to make workflows reusable.

chrisknoll commented 6 months ago

This has been accepted in CRAN. Thanks to everyone for the assistance.