For `fst` to be able to use parallel methods (see #2) while loading data from a file, those methods need a C++ implementation that follows a predefined interface. That interface should be defined in `fst`. The question is whether it would be best to put the methods themselves in a separate package (e.g. `fstmethods`).
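One way to picture such an interface is below. This is a minimal sketch, assuming that data arrive as contiguous chunks of a `double` column and that a method can be split into a per-chunk step plus a final combine step. The names `ParallelMethod`, `SumMethod` and `RunInChunks` are hypothetical and not part of `fst`. OpenMP is used for the chunk loop; when compiled without OpenMP support the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface for a chunk-wise parallel method. The names are
// illustrative only; this is NOT part of fst's actual API.
class ParallelMethod {
 public:
  virtual ~ParallelMethod() = default;
  // Called once, before any chunk is processed, with the number of chunks.
  virtual void Begin(std::size_t n_chunks) = 0;
  // Called once per chunk, possibly concurrently from several threads.
  virtual void ProcessChunk(std::size_t chunk_id, const double* data,
                            std::size_t length) = 0;
  // Called once on a single thread after all chunks are done.
  virtual double Combine() = 0;
};

// Example method: sum of a numeric column.
class SumMethod : public ParallelMethod {
 public:
  void Begin(std::size_t n_chunks) override { partial_.assign(n_chunks, 0.0); }

  void ProcessChunk(std::size_t chunk_id, const double* data,
                    std::size_t length) override {
    double s = 0.0;
    for (std::size_t i = 0; i < length; ++i) s += data[i];
    partial_[chunk_id] = s;  // each chunk writes only its own slot: no locking
  }

  double Combine() override {
    double total = 0.0;
    for (double p : partial_) total += p;  // combine in chunk order
    return total;
  }

 private:
  std::vector<double> partial_;
};

// Hypothetical stand-in for the reader: splits `column` into `n_chunks`
// contiguous pieces and processes them in parallel with OpenMP.
double RunInChunks(const std::vector<double>& column, std::size_t n_chunks,
                   ParallelMethod& method) {
  assert(n_chunks > 0);
  const std::size_t n = column.size();
  method.Begin(n_chunks);
#pragma omp parallel for
  for (long c = 0; c < static_cast<long>(n_chunks); ++c) {
    const std::size_t uc = static_cast<std::size_t>(c);
    const std::size_t begin = n * uc / n_chunks;
    const std::size_t end = n * (uc + 1) / n_chunks;
    method.ProcessChunk(uc, column.data() + begin, end - begin);
  }
  return method.Combine();
}
```

Keeping per-chunk results in separate slots and combining them on one thread avoids locks inside the parallel loop. Under this design, a package such as `fstmethods` would only need to supply new `ParallelMethod` subclasses.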
The advantage is that a separate package would separate concerns and could concentrate fully on developing new parallel methods. Such a package could, for example, experiment with SIMD operations (which I believe no CRAN package does yet). Parallel operations on vectors are a natural fit for SIMD.
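To illustrate why element-wise column operations suit SIMD, here is a minimal sketch using OpenMP's `simd` directive, which is one portable way to request vectorization (enabled by `-fopenmp` or `-fopenmp-simd` on GCC and Clang; without those flags the pragma is ignored and the loops remain correct). Hand-written intrinsics would be another option. `Axpy` and `SumSquares` are hypothetical method names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise method: returns a * x + y. Every element is
// independent, so the loop maps directly onto SIMD lanes.
std::vector<double> Axpy(double a, const std::vector<double>& x,
                         const std::vector<double>& y) {
  assert(x.size() == y.size());
  const std::size_t n = x.size();
  std::vector<double> out(n);
#pragma omp simd
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a * x[i] + y[i];
  }
  return out;
}

// Hypothetical reduction: sum of squares. The reduction clause lets the
// compiler keep one partial sum per lane and add them together at the end.
double SumSquares(const std::vector<double>& x) {
  const std::size_t n = x.size();
  double s = 0.0;
#pragma omp simd reduction(+ : s)
  for (std::size_t i = 0; i < n; ++i) {
    s += x[i] * x[i];
  }
  return s;
}
```

Because the reduction allows the additions to be reordered, the floating-point result of `SumSquares` can differ in the last bits from a strictly sequential sum.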
Also, multiple packages could exist, each with specific functionality. Linear regression, for instance, can be computed in parallel, and the methods for it could live in another user-defined package or be defined on the fly. With this setup, `fsttable` would be very modular and easy to extend (without growing huge).
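Simple linear regression shows why such a method parallelizes well: the least-squares fit of y = intercept + slope·x depends only on the sums n, Σx, Σy, Σx² and Σxy, which can be computed per chunk and added together. The slope is then (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and the intercept is (Σy − slope·Σx) / n. The sketch below assumes two in-memory `double` columns and uses OpenMP for the chunk loop; `RegStats`, `ChunkStats` and `ParallelLinearRegression` are hypothetical names, not an existing package API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sufficient statistics for the least-squares fit y = intercept + slope * x.
struct RegStats {
  double n = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;

  void Add(const RegStats& o) {
    n += o.n; sx += o.sx; sy += o.sy; sxx += o.sxx; sxy += o.sxy;
  }
};

// Statistics for one chunk of `len` observations.
RegStats ChunkStats(const double* x, const double* y, std::size_t len) {
  RegStats s;
  for (std::size_t i = 0; i < len; ++i) {
    s.n += 1.0;
    s.sx += x[i];
    s.sy += y[i];
    s.sxx += x[i] * x[i];
    s.sxy += x[i] * y[i];
  }
  return s;
}

struct Fit {
  double intercept;
  double slope;
};

Fit ParallelLinearRegression(const std::vector<double>& x,
                             const std::vector<double>& y,
                             std::size_t n_chunks) {
  assert(x.size() == y.size() && n_chunks > 0);
  const std::size_t n = x.size();
  std::vector<RegStats> partial(n_chunks);

  // Each chunk fills only its own slot in `partial`, so no locking is needed.
#pragma omp parallel for
  for (long c = 0; c < static_cast<long>(n_chunks); ++c) {
    const std::size_t uc = static_cast<std::size_t>(c);
    const std::size_t begin = n * uc / n_chunks;
    const std::size_t end = n * (uc + 1) / n_chunks;
    partial[uc] = ChunkStats(x.data() + begin, y.data() + begin, end - begin);
  }

  // Combine the per-chunk sums on a single thread, then solve for the fit.
  RegStats t;
  for (const RegStats& p : partial) t.Add(p);
  const double denom = t.n * t.sxx - t.sx * t.sx;
  assert(denom != 0.0);  // requires at least two distinct x values
  const double slope = (t.n * t.sxy - t.sx * t.sy) / denom;
  const double intercept = (t.sy - slope * t.sx) / t.n;
  return {intercept, slope};
}
```

Plain running sums like these can lose precision when values are large relative to their spread; a production version might center the data or use a numerically stabler update, while keeping the same per-chunk/combine structure.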