Open dselivanov opened 7 years ago
This won't fix the existing repo size, but some think git-lfs is a good option. It reduces the size of clones by downloading only the version of each tracked file that the checked-out branch needs, not every historical version. It does require all devs on the project to install it as well.
Is it possible to separate the jar into multiple jars? The method that rkafka uses is to have one R package with rarely-changing Java dependencies (rkafkajars) which holds the majority of the file size, and the main package only needs to contain the one or two .class files that change frequently.
@r2evans thanks for the suggestion. I had thought about not compiling it into a fat jar. Will try.
It would be nice if, when building your R package, it could dynamically create the jar file instead of making you do it behind the scenes. I don't know how that can be done easily (without adding external-to-R dependencies and custom packaging).
If you use a second package (e.g., rscalaKafkaJars), you could use:
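.onLoad <- function(libname, pkgname) {
  # Collect every jar shipped in the "java" directory of the jar-only package
  jars <- list.files(system.file("java", package = "rscalaKafkaJars"),
                     pattern = "\\.jar$", full.names = TRUE, recursive = TRUE)
  # Append those jars to the classpath when rscala initializes this package
  rscala::.rscalaPackage(pkgname, classpath.appendix = jars)
}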
and that should provide the same functionality. That way, though the repo with the fat jar would still be rather large, the rscalaKafka repo would not grow.
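One thing the snippet above does not handle is the case where the jar-only package is missing: `system.file()` returns an empty string when it cannot find the package or directory, so `list.files()` would silently return nothing. A minimal sketch of a guard, assuming the package name `rscalaKafkaJars` from the example above (the helper name `dependency_jars` is hypothetical, not part of rscala):

```r
# Hypothetical helper: locate the jars shipped by a separate jar-only package.
# "rscalaKafkaJars" is only the example name used in this thread.
dependency_jars <- function(jar_pkg = "rscalaKafkaJars", subdir = "java") {
  jar_dir <- system.file(subdir, package = jar_pkg)
  # system.file() returns "" when the package or the directory is not found
  if (!nzchar(jar_dir)) {
    stop("Package '", jar_pkg, "' is not installed or has no '", subdir,
         "' directory.", call. = FALSE)
  }
  jars <- list.files(jar_dir, pattern = "\\.jar$",
                     full.names = TRUE, recursive = TRUE)
  if (length(jars) == 0L) {
    stop("No jar files found in '", jar_dir, "'.", call. = FALSE)
  }
  jars
}
```

Inside `.onLoad`, the result could then be passed as `classpath.appendix = dependency_jars()`, so a missing dependency fails with a clear message at load time rather than with a class-not-found error later.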
Caveat: short of "starting over with the repo", I think you'll always have a large repo (it will not shrink). One question/answer on StackOverflow has suggestions for removing large files from one or more commits, including links to GitHub's "Removing sensitive data from a repository" and another StackOverflow q/a.
The repo is already > 50 MB due to tracking the fat jar's history.