fstpackage / fst

Lightning Fast Serialization of Data Frames for R
http://www.fstpackage.org/fst/
GNU Affero General Public License v3.0

Big-endian seems to work: maybe remove misleading requirement on CRAN? #275

Open barracuda156 opened 1 year ago

barracuda156 commented 1 year ago

CRAN lists little-endian as a requirement. Why is that? What would be needed to add big-endian support?

P.S. fstlib claims that it can be compiled on all major platforms, and zstd and lz4 certainly build and work fine on big-endian platforms. @MarcusKlik Could you please comment on this?

barracuda156 commented 1 year ago

Hmm, it seems to work fine on ppc:


R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

> 
> # required packages
> library(testthat)
> library(fstcore)
> library(lintr)
> 
> # run tests
> test_check("fstcore")
[ FAIL 0 | WARN 0 | SKIP 2 | PASS 11 ]

══ Skipped tests ═══════════════════════════════════════════════════════════════
• On CRAN (2)

[ FAIL 0 | WARN 0 | SKIP 2 | PASS 11 ]
> 
> proc.time()
   user  system elapsed 
  4.561   0.287   4.878 

barracuda156 commented 1 year ago

Everything works:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

> 
> # required packages
> library(testthat)
> library(fst)
> 
> # run tests
> test_check("fst")
[ FAIL 0 | WARN 0 | SKIP 1 | PASS 823 ]

══ Skipped tests ═══════════════════════════════════════════════════════════════
• On CRAN (1)

[ FAIL 0 | WARN 0 | SKIP 1 | PASS 823 ]
> 
> proc.time()
   user  system elapsed 
 99.967   1.810 101.811 

MarcusKlik commented 7 months ago

Hi @barracuda156, you're absolutely right: big-endian reads and writes work correctly when they are done on the same system.

The problem arises when you transport the resulting fst files from a big-endian system to a little-endian system or vice versa. In that case integer reads, for example, can get mixed up because the byte order is reversed. If there is a need for it, this could be solved by adjusting the reading algorithm for those cases.

So the LZ4 and ZSTD compression used by fst is already endian-safe (see the block format description), but the read from the decompressed integers to memory is not...
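
To make the cross-platform problem concrete, here is a minimal C++ sketch of what such an adjustment to the read path might look like. It is not fstlib code: the helper names (`host_is_little_endian`, `byte_swap32`, `read_int32_naive`, `read_int32_adjusted`) are hypothetical, and the `written_little_endian` flag assumes the reader can somehow learn the writer's byte order, which this thread does not confirm the fst format records.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers for illustration only; none of these exist in fstlib.

// True when this machine stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Naive read: copy the decompressed bytes straight into memory.
// Correct only when writer and reader share the same byte order.
inline std::vector<std::int32_t> read_int32_naive(const unsigned char* bytes,
                                                  std::size_t count) {
    std::vector<std::int32_t> out(count);
    if (count > 0) {
        std::memcpy(out.data(), bytes, count * sizeof(std::int32_t));
    }
    return out;
}

// Adjusted read: swap each value when the writer's byte order differs
// from the host's. `written_little_endian` is an assumed input.
inline std::vector<std::int32_t> read_int32_adjusted(const unsigned char* bytes,
                                                     std::size_t count,
                                                     bool written_little_endian) {
    std::vector<std::int32_t> out = read_int32_naive(bytes, count);
    if (written_little_endian != host_is_little_endian()) {
        for (std::int32_t& value : out) {
            std::uint32_t raw = 0;
            std::memcpy(&raw, &value, sizeof raw);
            raw = byte_swap32(raw);
            std::memcpy(&value, &raw, sizeof raw);
        }
    }
    return out;
}

// Sanity check: swapping twice returns the original value.
inline void self_check() {
    assert(byte_swap32(byte_swap32(0x01020304u)) == 0x01020304u);
}
```

With this sketch, the bytes `00 00 00 01` written on a big-endian machine come back as 16777216 from the naive read on a little-endian host, and as 1 from the adjusted read. When writer and reader share a byte order, the adjusted read does no extra work beyond a single comparison.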