Bioconductor / BiocCheck

http://bioconductor.org/packages/BiocCheck

Check of individual file sizes not accurate? #167

Closed lshep closed 1 year ago

lshep commented 2 years ago

@LiNk-NY We are moderating a package, HDO.db. In the extdata folder there is an SQLite file that is 6.1M:

shepherd@jbcj433:~/PkgReviews/HDO.db/inst/extdata(main)$ ls -lrth
total 6.1M
-rw-rw-r-- 1 shepherd shepherd 1.9K Aug 12 11:22 parse-obo.R
-rw-rw-r-- 1 shepherd shepherd 6.1M Aug 29 08:21 HDO.sqlite
-rw-rw-r-- 1 shepherd shepherd 5.1K Aug 29 08:21 get_sqlite.r

BiocCheck does not give the WARNING for files over 5M in size:

─ BiocCheck results ──
0 ERRORS | 0 WARNINGS | 3 NOTES

See the HDO.db.BiocCheck folder and run
    browseVignettes(package = 'BiocCheck')

I cannot ingest the package because of git's limitation on file size, which is why the individual file check is so important. The ingestion script will not let me proceed:

> .precheck_filesize("/home/shepherd/PkgReviews/HDO.db")
Error in .precheck_filesize("/home/shepherd/PkgReviews/HDO.db") : 
  files larger than 5Mb:
  /inst/extdata/HDO.sqlite
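
For anyone who wants to reproduce this kind of per-file check outside of R, here is a minimal shell sketch. It is not how BiocCheck or `.precheck_filesize` is implemented; the function name `large_files` is hypothetical, and it assumes the 5M limit means 5 MiB (5242880 bytes) and that the `.git` directory should be skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of BiocCheck or the ingestion script):
# list every file under a package directory that exceeds a size limit.
#   $1 = package directory (defaults to the current directory)
#   $2 = limit in bytes (defaults to 5 MiB = 5 * 1024 * 1024; an assumption)
large_files() {
    pkg_dir=${1:-.}
    limit_bytes=${2:-5242880}
    # "-size +Nc" matches files strictly larger than N bytes.
    # Files inside .git are skipped, since only tracked content matters here.
    find "$pkg_dir" -type f ! -path '*/.git/*' -size +"${limit_bytes}c" | sort
}
```

Running `large_files ~/PkgReviews/HDO.db` on the checkout above should list `inst/extdata/HDO.sqlite`, since it is the only file shown that is larger than 5M.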

@vjcitn follow up to what I was saying about HDO.db: for files of larger size we normally suggest a hub package. Granted, this is an annotation package, so we can make an exception and treat it as a traditional annotation package, but then it will not be in git.bioconductor.org and will just be uploaded manually as a tar.gz once it passes review.

LiNk-NY commented 2 years ago

Hi Lori, @lshep I don't think it was written to give a warning for annotation packages. Do you want this to change?

https://github.com/Bioconductor/BiocCheck/blob/9928a7c7ff90f9925b31bcd481d66064874060c4/R/checks.R#L232-L249

As explained in the vignette:

https://github.com/Bioconductor/BiocCheck/blob/9928a7c7ff90f9925b31bcd481d66064874060c4/vignettes/BiocCheck.Rmd#L169-L173

lshep commented 2 years ago

Ah, that makes sense. I think it should be changed. We can't accept large files in git, and we want large files to be hub hosted or server based; at least that is the way we've been headed. I think the software-only restriction was legacy from when we had traditional experiment and annotation packages.
But I am open to a second opinion on this @vjcitn