fstpackage / fsttable

An interface to fast on-disk data tables stored with the fst format
GNU Affero General Public License v3.0

Parallel methods in separate package #3

Closed: MarcusKlik closed this issue 5 years ago

MarcusKlik commented 6 years ago

For fst to use parallel methods (see #2) while loading data from a file, each method needs a C++ implementation that follows a predefined interface, and that interface should be defined in fst. The question is whether these methods are best placed in a separate package (e.g. fstmethods).
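To make the idea of "a C++ implementation according to a predefined interface" concrete, here is a minimal sketch. It assumes the interface is an abstract class with a per-chunk step (run on worker threads while a column is loaded) and a final combine step; the names `ChunkMethod`, `SumMethod` and `run_parallel` are hypothetical and are not part of fst's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical interface that fst could define; method packages implement it.
class ChunkMethod {
public:
  virtual ~ChunkMethod() = default;
  // Called once before processing, with the number of chunks.
  virtual void reset(std::size_t n_chunks) = 0;
  // Called from a worker thread for one chunk; must only write to its own slot.
  virtual void process_chunk(std::size_t slot, const double* data, std::size_t n) = 0;
  // Called on the calling thread after all chunks have finished.
  virtual double combine() const = 0;
};

// Example method: a sum, computed as per-chunk partial sums.
class SumMethod : public ChunkMethod {
  std::vector<double> partial_;
public:
  void reset(std::size_t n_chunks) override { partial_.assign(n_chunks, 0.0); }
  void process_chunk(std::size_t slot, const double* data, std::size_t n) override {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += data[i];
    partial_[slot] = s;
  }
  double combine() const override {
    double total = 0.0;
    for (double p : partial_) total += p;
    return total;
  }
};

// Hypothetical stand-in for the loader: splits an in-memory column into
// chunks and runs each chunk on its own thread.
double run_parallel(ChunkMethod& method, const std::vector<double>& column,
                    std::size_t n_chunks) {
  assert(n_chunks > 0);
  method.reset(n_chunks);
  const std::size_t chunk = (column.size() + n_chunks - 1) / n_chunks;
  std::vector<std::thread> workers;
  for (std::size_t c = 0; c < n_chunks; ++c) {
    const std::size_t begin = std::min(c * chunk, column.size());
    const std::size_t end = std::min(begin + chunk, column.size());
    workers.emplace_back([&method, &column, c, begin, end] {
      method.process_chunk(c, column.data() + begin, end - begin);
    });
  }
  for (auto& w : workers) w.join();
  return method.combine();
}
```

Because each chunk writes only to its own slot, no locking is needed. A real loader would hand over chunks as they are decompressed from disk instead of slicing an in-memory vector.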

The advantage of a separate package is a clean separation of concerns: it could concentrate fully on developing new parallel methods. Such a package could, for example, experiment with SIMD operations (which I believe no CRAN package does yet). Parallel operations on vectors are a perfect fit for SIMD.
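As an illustration of why vector operations suit SIMD, the sketch below adds two `double` vectors element-wise with SSE2 intrinsics, processing two values per instruction, and uses a scalar loop for the remaining element and on targets without SSE2. The function name `add_vectors` is hypothetical; a method package might instead target AVX or rely on compiler auto-vectorization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: 128-bit registers holding two doubles
#endif

// Element-wise sum: out[i] = a[i] + b[i].
std::vector<double> add_vectors(const std::vector<double>& a,
                                const std::vector<double>& b) {
  assert(a.size() == b.size());
  const std::size_t n = a.size();
  std::vector<double> out(n);
  std::size_t i = 0;
#if defined(__SSE2__)
  // Vectorized main loop: two doubles per load/add/store.
  for (; i + 2 <= n; i += 2) {
    __m128d va = _mm_loadu_pd(a.data() + i);  // unaligned load of a[i], a[i+1]
    __m128d vb = _mm_loadu_pd(b.data() + i);
    _mm_storeu_pd(out.data() + i, _mm_add_pd(va, vb));
  }
#endif
  // Scalar tail, and full fallback when SSE2 is unavailable.
  for (; i < n; ++i) out[i] = a[i] + b[i];
  return out;
}
```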

Multiple packages with specific functionality could also coexist. For example, linear regression can be computed in parallel, and the methods for that could live in another user-defined package or be defined on the fly. With this setup, fsttable would be very modular and easy to extend without growing huge.
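One way linear regression parallelizes is through sufficient statistics: each thread accumulates n, Σx, Σy, Σx² and Σxy over its chunk, and the partial sums are added before solving the closed-form ordinary least squares equations. The sketch below assumes a single predictor (simple regression) for brevity; `Moments`, `Fit` and `parallel_ols` are hypothetical names, not an existing fsttable API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sufficient statistics for simple linear regression y = a + b*x.
struct Moments {
  double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
  void add(const Moments& o) {
    n += o.n; sx += o.sx; sy += o.sy; sxx += o.sxx; sxy += o.sxy;
  }
};

struct Fit { double intercept; double slope; };

Fit parallel_ols(const std::vector<double>& x, const std::vector<double>& y,
                 std::size_t n_threads) {
  assert(x.size() == y.size() && n_threads > 0);
  std::vector<Moments> partial(n_threads);  // one slot per thread, no locking
  const std::size_t chunk = (x.size() + n_threads - 1) / n_threads;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < n_threads; ++t) {
    workers.emplace_back([&, t] {
      const std::size_t begin = std::min(t * chunk, x.size());
      const std::size_t end = std::min(begin + chunk, x.size());
      Moments m;
      for (std::size_t i = begin; i < end; ++i) {
        m.n += 1; m.sx += x[i]; m.sy += y[i];
        m.sxx += x[i] * x[i]; m.sxy += x[i] * y[i];
      }
      partial[t] = m;
    });
  }
  for (auto& w : workers) w.join();

  // Combine partial statistics, then solve the normal equations.
  Moments tot;
  for (const auto& m : partial) tot.add(m);
  const double denom = tot.n * tot.sxx - tot.sx * tot.sx;
  assert(denom != 0.0);  // needs at least two distinct x values
  const double slope = (tot.n * tot.sxy - tot.sx * tot.sy) / denom;
  const double intercept = (tot.sy - slope * tot.sx) / tot.n;
  return {intercept, slope};
}
```

For y = 2x + 1 with x = 0, ..., 99 this returns slope 2 and intercept 1 regardless of the thread count, since the combine step only sums the per-chunk statistics.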

MarcusKlik commented 5 years ago

We can allow for parallel implementations in the tableproxy object.
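A minimal sketch of what this could look like, assuming the proxy keeps a registry of named method implementations (which may themselves be parallel) and dispatches to them by name. `TableProxy`, `register_method` and `apply` are hypothetical C++ stand-ins; fsttable's actual tableproxy is an R object and reads columns from an fst file rather than from memory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Column = std::vector<double>;
// A method maps a column to a scalar; its body may run in parallel internally.
using Method = std::function<double(const Column&)>;

// Hypothetical proxy: holds columns plus user-registered implementations.
class TableProxy {
  std::map<std::string, Column> columns_;
  std::map<std::string, Method> methods_;
public:
  void add_column(const std::string& name, Column data) {
    columns_[name] = std::move(data);
  }
  // Methods can be registered by a separate package or defined on the fly.
  void register_method(const std::string& name, Method m) {
    methods_[name] = std::move(m);
  }
  double apply(const std::string& method, const std::string& column) const {
    auto it = methods_.find(method);
    if (it == methods_.end())
      throw std::invalid_argument("no method registered: " + method);
    return it->second(columns_.at(column));  // throws std::out_of_range if missing
  }
};
```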