atlas-engineer / nfiles

User configuration and data file management
BSD 3-Clause "New" or "Revised" License

Binary format for data files: FASLs! #5

Open aartaka opened 2 years ago

aartaka commented 2 years ago

Our data files take time to load. Even though we often load them asynchronously or optimize their reading in other ways, there's only so much we can speed up without changing the format. And while SQLite or some other opaque format may be nice and performant, implementation-native FASLs may be even faster and need no foreign interfaces. The only challenge is: how do we put data into a FASL and read it back, portably?

Solution

One way I'm thinking about is

Alternative solutions

I'm pretty sure there are other ways, like using compiled function bodies or eval-ing the data fetched from a FASL. I'm aware that my approach is not the most performant, but it avoids at least some road bumps.

The perfect approach would be to compile the Lisp data itself (and not its cl-prevalence-serialized representation) into FASLs.

Additional context

I've done some of this compilation magic in Sade, and here is the relevant bit:

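;; ARGS holds the command-line arguments: the second is the input file,
;; the third an optional output name.
(let* ((in (uiop:merge-pathnames* (uiop:parse-native-namestring (second args))
                                  (uiop:getcwd)))
       (out (uiop:merge-pathnames*
             (or (uiop:parse-native-namestring (third args)) (pathname-name in))
             (uiop:getcwd))))
  ;; ECL: print the generated program into a temporary .lisp file, compile it
  ;; to an object file and link that object into a standalone executable.
  #+ecl
  (uiop:with-temporary-file
      (:stream f :pathname p :type "lisp" :keep t)
    (print (with-open-file (i in) (bf i)) f) ; Lisp form generated by Sade's BF
    (print '(si:quit) f)                     ; exit once the program has run
    :close-stream
    (compile-file p :system-p t)
    (c:build-program
     out :lisp-files (list (uiop:merge-pathnames*
                            (concatenate 'string (pathname-name p) ".o") p))))
  ;; Other implementations: BF-COMPILE-FROM-FILE is expected to define TMPNAME
  ;; as a function; dump an executable image whose entry point calls it.
  #-ecl
  (let ((tmpname (gensym "TMP")))
    (bf-compile-from-file tmpname in)
    (setf uiop:*image-entry-point* (lambda () (funcall tmpname)))
    (uiop:dump-image out :executable t)))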

That code, however, is concerned with building executables rather than FASLs, but it still sets the tone for how we might approach the problem.

Ambrevar commented 2 years ago

Yes, serializing to .fasls makes total sense! Note that it would never be portable, though. But that's OK: in practice this would just be a "fast cache", and you'd fall back on the original file if no fasl is available for your current implementation.
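A minimal sketch of that "fast cache" fallback, assuming the FASL sets a special variable when it is loaded (the hypothetical `*loaded-data*`) and the plain-text file holds a single readable form. `load-data-file`, `fasl-usable-p` and `read-data-source` are illustrative names, not Nfiles API:

```lisp
(defvar *loaded-data* nil
  "Hypothetical variable that a data FASL sets when it is loaded.")

(defun fasl-usable-p (fasl source)
  "True when FASL exists and is at least as recent as SOURCE."
  (and fasl
       (probe-file fasl)
       (let ((fasl-date (file-write-date fasl))
             (source-date (file-write-date source)))
         (and fasl-date source-date (>= fasl-date source-date)))))

(defun read-data-source (source)
  "Read the single Lisp form stored in the plain-text SOURCE file."
  (with-open-file (in source)
    (with-standard-io-syntax
      (let ((*read-eval* nil))          ; never evaluate #. in data files
        (read in)))))

(defun load-data-file (source fasl)
  "Return the data of SOURCE, taking it from FASL when that cache is usable."
  (if (fasl-usable-p fasl source)
      (handler-case
          (let ((*loaded-data* nil))
            (load fasl)
            *loaded-data*)
        ;; A FASL built by another implementation (or version) typically
        ;; signals an error when loaded: fall back on the original file.
        (error () (read-data-source source)))
      (read-data-source source)))
```

A stale FASL (older than its source) is simply ignored here; regenerating it is left to whatever writes the data.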

It's also not so clear where to store the fasl. I suppose in the usual ~/.cache/common-lisp folder? Is there a function to expand the cache path? I think ASDF can do this; we need to find the right API entry point.
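Both answers may already exist in UIOP, which ships with ASDF. A sketch, assuming ASDF is loaded: `uiop:compile-file-pathname*` applies ASDF's output translations, which by default place compiled files under `~/.cache/common-lisp/<implementation>/`, and `uiop:xdg-cache-home` resolves `$XDG_CACHE_HOME` (falling back to `~/.cache/`) if a dedicated directory is preferred. `fasl-cache-pathname`, `nfiles-cache-directory` and the `nfiles/` subdirectory are hypothetical:

```lisp
;; UIOP comes with ASDF; load it at compile time too, so UIOP: symbols can be read.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf"))

(defun fasl-cache-pathname (source)
  "Return where the FASL for SOURCE goes in ASDF's per-implementation cache."
  (uiop:compile-file-pathname* source))

(defun nfiles-cache-directory ()
  "Hypothetical alternative: a dedicated directory under $XDG_CACHE_HOME."
  (uiop:xdg-cache-home "nfiles/"))
```

`fasl-cache-pathname` keeps the source's name and uses the implementation's FASL type; since the default translations mirror the source's full directory path, same-named files from different directories should not collide.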

So to implement this, what we need to do is simply extend the lisp-file methods. We could, for instance, add a slot to the class to let the user choose the cache path.
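A sketch of such a slot as a standalone mixin, assuming a FASL next to the source file is an acceptable default when the user sets nothing. In Nfiles it would be mixed into `lisp-file`, with the reading and writing methods specialized on it; those methods are not shown, and `fasl-cache-mixin` and its slot names are hypothetical:

```lisp
(defclass fasl-cache-mixin ()
  ((source-path :initarg :source-path
                :reader source-path
                :documentation "Path of the plain-text Lisp data file.")
   (fasl-cache-path :initarg :fasl-cache-path
                    :initform nil
                    :accessor fasl-cache-path
                    :documentation "User-chosen FASL location, or NIL for the default."))
  (:documentation "Hypothetical mixin giving a data file a user-settable FASL cache."))

(defmethod fasl-cache-path :around ((file fasl-cache-mixin))
  ;; When the user chose no path, default to the FASL next to the source.
  (or (call-next-method)
      (compile-file-pathname (source-path file))))
```

The `:around` method only wraps the reader, so `(setf (fasl-cache-path file) ...)` still stores the user's choice unchanged.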

  • and, when we need our data, loading it to get *data* variable set to the right value. Which is probably faster and more overflow-resistant than read-ing the whole file.

I didn't understand this. Can you give an example?

aartaka commented 2 years ago

Cool!

I mean that, when we load a file, we can't get its contents directly through load, which only returns a success flag. We need to either call some function contained in the file or access some variable that it defines. So my suggestion here is to use a magic variable that each loaded FASL sets to a new value. Still, that's not nice, and if you have an idea for how to avoid all the function/variable hacks here, I'll be glad to know :)
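A minimal sketch of that magic-variable approach, assuming the data prints readably under `with-standard-io-syntax` (no unreadable objects, no circular structure) and using plain `print` instead of cl-prevalence. `*data*` is the variable mentioned earlier in the thread; `write-data-fasl` and `read-data-fasl` are hypothetical helpers:

```lisp
(defvar *data* nil
  "Magic variable that every data FASL sets when it is loaded.")

(defun write-data-fasl (data fasl)
  "Compile DATA into FASL through a temporary stub source file."
  (let ((stub (make-pathname :name (concatenate 'string (pathname-name fasl) "-fasl-stub")
                             :type "lisp" :defaults fasl)))
    (unwind-protect
         (progn
           (with-open-file (out stub :direction :output :if-exists :supersede)
             ;; Standard syntax binds *PRINT-READABLY* to true, so unreadable
             ;; data signals an error here instead of producing a broken file.
             (with-standard-io-syntax
               (print `(setf *data* ',data) out)))
           (compile-file stub :output-file fasl))
      (when (probe-file stub)
        (delete-file stub)))))

(defun read-data-fasl (fasl)
  "Load FASL and return the value it assigned to *DATA*."
  ;; The result is a literal constant from the FASL: copy it before mutating.
  (let ((*data* nil))
    (load fasl)
    *data*))
```

The stub gets its own name so that a FASL sitting next to the real data file never overwrites that file.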

Ambrevar commented 2 years ago

Oh, I see.

Same here, I'd like to know...

aartaka commented 1 year ago

Here's some progress, with a complete-ish prototype (in the comments) for loading data from FASLs: https://www.reddit.com/r/Common_Lisp/comments/12dxdic/dumping_objects_into_compiled_files/
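For reference, a minimal sketch of one general way to dump objects into compiled files, not necessarily the code from the linked thread: bind a special variable to the object around `compile-file`, and compile a stub whose macro expands into that object as a quoted literal. The file compiler then externalizes the object itself (conses, strings, numbers, characters, symbols, arrays; structures and CLOS instances need `make-load-form` methods), so the data never goes through the printer and reader. All names are hypothetical:

```lisp
(defvar *object-to-dump* nil
  "The object to embed; bound only while COMPILE-FILE runs.")

(defvar *loaded-data* nil
  "Set to the embedded object when the FASL is loaded.")

(defmacro embedded-object ()
  ;; Expanded by COMPILE-FILE while *OBJECT-TO-DUMP* is bound, so the
  ;; expansion is the object itself as a quoted literal.
  `',*object-to-dump*)

(defun dump-object-to-fasl (object fasl)
  "Compile OBJECT into FASL without printing it as text."
  (let ((stub (make-pathname :name (concatenate 'string (pathname-name fasl) "-stub")
                             :type "lisp" :defaults fasl)))
    (unwind-protect
         (progn
           ;; The stub only mentions the macro; the data never appears in it.
           (with-open-file (out stub :direction :output :if-exists :supersede)
             (with-standard-io-syntax
               (print '(setf *loaded-data* (embedded-object)) out)))
           (let ((*object-to-dump* object))
             (compile-file stub :output-file fasl)))
      (when (probe-file stub)
        (delete-file stub)))))

(defun load-object-from-fasl (fasl)
  "Load FASL and return the object it embeds (a literal: copy before mutating)."
  (let ((*loaded-data* nil))
    (load fasl)
    *loaded-data*))
```

Because the loaded object is a literal constant of the compiled file, the Common Lisp standard leaves modifying it undefined; callers that need to mutate it should copy it first.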

Ambrevar commented 1 year ago

I'm not sure I get the macro trick... What's special about it besides binding to a global variable?

aartaka commented 1 year ago

So the logic is:

Ambrevar commented 1 year ago

OK, got the macro trick now, it's super smart!

Ambrevar commented 1 year ago

All that said, does this really belong in Nfiles? I believe Nfiles should leave users the option to choose their preferred serialization format.

If we don't want to create a dedicated library just for this, I suggest at least creating a dedicated package, to decouple regular file management from specialized serialization.

Ambrevar commented 1 year ago

@aartaka Wanna work on it?

Ambrevar commented 1 year ago

We would also need some benchmarks to compare cl-prevalence with this approach.
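A small timing harness could be a starting point for such benchmarks. It is only a sketch: it measures mean wall-clock time with `get-internal-real-time` after one warm-up call, and the loaders are passed in as thunks so that a cl-prevalence deserializer and a FASL loader can be timed on the same data. `mean-seconds` and `compare-loaders` are hypothetical names:

```lisp
(defun mean-seconds (thunk &key (runs 20))
  "Call THUNK once to warm up, then RUNS times; return mean seconds per call."
  (funcall thunk)
  (let ((start (get-internal-real-time)))
    (loop repeat runs do (funcall thunk))
    (/ (- (get-internal-real-time) start)
       (* runs internal-time-units-per-second 1.0))))

(defun compare-loaders (loaders &key (runs 20))
  "LOADERS is an alist of (NAME . THUNK). Print and return (NAME . SECONDS)
pairs, fastest first."
  (let ((results (sort (mapcar (lambda (entry)
                                 (cons (car entry)
                                       (mean-seconds (cdr entry) :runs runs)))
                               loaders)
                       #'< :key #'cdr)))
    (loop for (name . seconds) in results
          do (format t "~&~20A ~,6F s~%" name seconds))
    results))
```

For example, one entry could wrap `(with-open-file (in "history.lisp") (read in))` and another `(load "history.fasl")`, with those file names being placeholders; wall-clock timings are noisy, so several runs on realistically sized data files would be needed before drawing conclusions.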

aartaka commented 1 year ago

Yes, but not necessarily soon enough :)