fml-fam / presentations

Feedback UseR2022 Abstract #1

Open kaneplusplus opened 2 years ago

kaneplusplus commented 2 years ago

I think the description of the library is fine but I think a little more work could be done contextualizing it. In particular:

  1. The Matrix class provides two things. First, it implements a few different matrix representations (dense, sparse, diagonal, etc.) along with providing a high-level interface for those objects. Because it is written in S4 it is a framework for allowing operations to work with different matrix types. The fmlr package is providing new representations but doesn't address the framework issue. Does your higher-level package implement S4 types that are compatible with Matrix?
  2. My basis for comparison is the torch library. Can you say how fmlr distinguishes itself from torch?
  3. I know people like to point out that S4 is slow, but is it usually the bottleneck in matrix calculations? My intuition is that it is a bottleneck only for small data sets that run quickly anyway. You already know that for larger matrix operations, dispatch has low relative overhead and we'd probably be better off refactoring the matrix operations or figuring out how to get R to do it.

wrathematics commented 2 years ago

Thanks so much for your response!

  1. I definitely don't articulate this well and knew that I didn't, but was kind of stuck. I want to be very respectful of "prior art", and I think that those are the two biggest packages people are aware of. The reality is, even though fmlr addresses a few similar problems, it's really pretty different in scope. It's really more a competitor to other things I've created, and this is me "getting it right" finally.
  2. I haven't dealt with torch in a while so not sure how it looks these days. But as of my last recollection, they're pretty different. fmlr is a matrix interface for matrix and stats operations rather than a more general tensor library, and it includes things like MPI operations via ScaLAPACK. It's probably worth fleshing out this comparison better.
  3. So if the matrix computations are large then yes, the overhead is minimal. But if you have a lot of small matrix computations, then you can't really use R because of the memory and S4 overheads. fmlr is harder to use (it's more like directly programming in C++), but it side-steps some of these bottlenecks, such as S4 dispatch and R not letting you re-use memory.
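
To make the memory re-use point in item 3 concrete, here is a minimal C++ sketch of the general idea. It does not use the fml or fmlr API. The helper names (`matmul_into`, `matmul_alloc`, `repeat_reuse`, `repeat_alloc`) are hypothetical and exist only for this illustration. It assumes small square matrices stored column-major in a `std::vector<double>`, and contrasts writing every product into one caller-owned buffer with allocating a fresh result on each call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of fml/fmlr): multiply two n x n column-major
// matrices and write the product into a caller-supplied buffer.
void matmul_into(const std::vector<double>& a, const std::vector<double>& b,
                 std::vector<double>& out, std::size_t n) {
    // assign() keeps the existing storage when capacity is already sufficient,
    // so repeated calls with the same buffer do not reallocate.
    out.assign(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t i = 0; i < n; ++i)
                out[i + j * n] += a[i + k * n] * b[k + j * n];
}

// Allocating variant: every call creates and returns a new result matrix.
std::vector<double> matmul_alloc(const std::vector<double>& a,
                                 const std::vector<double>& b, std::size_t n) {
    std::vector<double> out;
    matmul_into(a, b, out, n);
    return out;
}

// Many small products, one buffer allocated up front and re-used throughout.
// Returns the sum of the (0,0) entries so the work has an observable result.
double repeat_reuse(const std::vector<double>& a, const std::vector<double>& b,
                    std::size_t n, std::size_t reps) {
    std::vector<double> out(n * n);
    double acc = 0.0;
    for (std::size_t r = 0; r < reps; ++r) {
        matmul_into(a, b, out, n);
        acc += out[0];
    }
    return acc;
}

// Same computation, but with a fresh allocation on every iteration.
double repeat_alloc(const std::vector<double>& a, const std::vector<double>& b,
                    std::size_t n, std::size_t reps) {
    double acc = 0.0;
    for (std::size_t r = 0; r < reps; ++r) {
        std::vector<double> out = matmul_alloc(a, b, n);
        acc += out[0];
    }
    return acc;
}
```

Both loops compute the same values. The difference is that `repeat_reuse` allocates once while `repeat_alloc` allocates once per iteration, and that per-call cost is the kind that can dominate when each matrix is tiny. How much it matters in practice depends on the allocator and the matrix sizes, so it would need to be measured rather than assumed.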