bschilder / ThreeWayTest

Summary statistics-based association test for identifying the pleiotropic effects with set of genetic variants

Optimise data compression #15

Closed bschilder closed 1 year ago

bschilder commented 1 year ago

https://github.com/bschilder/ThreeWayTest/actions/runs/4313767916/jobs/7526122915#step:4:5962

❯ checking LazyData ... WARNING
    LazyData DB of 7.8 MB without LazyDataCompression set
    See §1.1.6 of 'Writing R Extensions'

❯ checking data for ASCII and uncompressed saves ... WARNING

    Note: significantly better compression could be obtained
          by using R CMD build --resave-data
                               old_size new_size compress
    covariance_matrix_data.rda    1.3Mb    896Kb       xz
    data_matrix_final.rda         6.2Mb    3.9Mb       xz
    gene_length_list.rda           16Kb     11Kb       xz
    gene_list.rda                  73Kb     54Kb    bzip2
    selected_genotype.rda          14Kb      5Kb       xz

The latter warning is fixed by running the following (solution found here):

# Re-save every file in data/ with the best available compression
f <- list.files("data", full.names = TRUE)
tools::resaveRdaFiles(f)
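
To confirm that re-saving actually shrinks the files, the sizes can be compared before and after with `tools::checkRdaFiles()`. The following is only a minimal sketch: it runs in a throwaway directory on a made-up object (`demo_dir` and `demo_matrix` are hypothetical names, not part of ThreeWayTest), so the package's real `data/` folder is untouched.

```r
# Hypothetical demo directory and dataset; not part of the package.
demo_dir <- file.path(tempdir(), "data_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_matrix <- matrix(rep(1:10, 10000), ncol = 10)

# Save without compression to mimic a poorly compressed .rda file.
save(demo_matrix, file = file.path(demo_dir, "demo_matrix.rda"), compress = FALSE)

paths <- list.files(demo_dir, pattern = "\\.rda$", full.names = TRUE)

# Record size and compression type, re-save, then record them again.
before <- tools::checkRdaFiles(paths)
tools::resaveRdaFiles(paths, compress = "auto")
after <- tools::checkRdaFiles(paths)

report <- data.frame(
  file     = basename(paths),
  old_size = before$size,
  new_size = after$size,
  compress = after$compress
)
print(report)
```

Pointing `paths` at the package's own `data/` directory instead should give a table similar to the one in the check output above.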

bschilder commented 1 year ago

https://github.com/bschilder/ThreeWayTest/actions/runs/4313767916/jobs/7526122915#step:4:5962

❯ checking LazyData ... WARNING
    LazyData DB of 7.8 MB without LazyDataCompression set
    See §1.1.6 of 'Writing R Extensions'

This is resolved by setting the LazyDataCompression field in the DESCRIPTION file. Solution found here.

I chose gzip, though it's possible that bzip2 or xz would be more efficient:

LazyDataCompression: gzip
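
One rough way to judge which setting might be most efficient is to save the same object with each method and compare file sizes. This is a sketch under assumptions: `demo_data` is a made-up object, and the size of a standalone `.rda` file is only a proxy for the size of the lazy-load database that the field actually controls.

```r
# Hypothetical dataset standing in for the package data.
demo_data <- data.frame(
  x = rep(seq_len(1000), 50),
  y = rep(letters[1:10], 5000)
)

methods <- c("gzip", "bzip2", "xz")
sizes <- setNames(numeric(length(methods)), methods)

# Save the same object once per compression method and record the file size.
for (method in methods) {
  path <- tempfile(fileext = ".rda")
  save(demo_data, file = path, compress = method)
  sizes[[method]] <- file.size(path)
  unlink(path)
}

print(sizes)
# Name of the method that produced the smallest file for this object
print(names(sizes)[which.min(sizes)])
```

The winner depends on the data, so the comparison is worth repeating on the real objects before changing the field.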

With this, ThreeWayTest now passes checks with only a note about the size (which is ok for now):

[Screenshot 2023-03-02 at 13 16 33]