babashka / bbin

Install any Babashka script or project with one command
MIT License
139 stars 9 forks source link

`bbin ls` is unnecessarily slow #62

Closed rads closed 1 year ago

rads commented 1 year ago

I haven't done any profiling yet but it's likely slow because we're reading and parsing all the scripts in bbin bin every time to generate the list. This could be improved with https://github.com/babashka/bbin/issues/34 but there may be easier fixes outside of that too.

teodorlu commented 1 year ago

Perhaps reading the first 1 MB for each script is a simpler solution?

bbin ls is quite slow on my system too -- right now, it takes 10 seconds to run. I've been assuming that it reads all the binaries on my system to find bbin binaries. Some of those are quite large; so I was optimistically hoping we could cut 99 % of that time by exiting faster on the huge binaries.

borkdude commented 1 year ago

Perhaps it's better to save data as edn in a predictable location so you don't actually have to search anything, but the edn describes where scripts live?

teodorlu commented 1 year ago

An EDN file could possibly get out of sync with the index, for example if a user moves or deletes a bbin-installed script.

What's the right thing to do then? Assume it won't happen, and crash? Silently delete references from the index to scripts that don't exist, and stop tracking them? Support some kind of "reindexing" logic that derives the index from the user's system?

Keeping metadata and scripts apart has advantages, as outlined by @borkdude in https://github.com/babashka/bbin/issues/34. I do like the idea of being able to track an EDN file in Git, run a command to have all the scripts available.

rads commented 1 year ago

For now I'm going to start with reading the first 5kb of each file, which is an easy change to make. On my system, this improves the time for bbin ls from about 4 seconds to about 1/10th of a second. I'll go ahead and release this after I have some tests written.

https://github.com/babashka/bbin/tree/issue-62

borkdude commented 1 year ago

An EDN file could possibly get out of sync with the index, for example if a user moves or deletes a bbin-installed script.

I think this is unusual behavior. A user should just not tamper with those scripts but should use bbin to manage those. This argument could be used for any package manager: moving programs that were installed with brew screws the index of that package manager or moving dependencies that were added by maven screws with clj.

borkdude commented 1 year ago

clojure -Ttools install creates .edn files per installed script in ~/.clojure/tools. I think that could be a good approach for bbin too.

teodorlu commented 1 year ago

For now I'm going to start with reading the first 5kb of each file, which is an easy change to make

For my use use, this seems like a free performance improvement, which I like!

EDIT: The following comment has been moved to https://github.com/babashka/bbin/issues/34#issuecomment-1752849723.

After some hammock time, I think @borkdude's proposal of having one EDN file per installed script and an operation in bbin to install a script for each EDN file is a way to go.

Why: because implementation complexity and User can track EDN files in Git, and have bbin ensure the scripts are installed on all computers are important, and bbin uninstall safety/atomicity can still be solved in the future.


I tried to map this out with a Rich Hickey style decision matrix (as advocated for in Design in Practice). Screenshot:

image

Full document: https://docs.google.com/spreadsheets/d/1AWiYHYteuTtTGCaHWg0DNuX2rTXXJTzZzl0pbOzFsWU/edit#gid=0

(send me a Google account if you want write access, eg to add a new column for a different approach, or to add a new row for a different criterium).

I think "metadata by script install name" is the best place to start, then consider "metadata in transaction log" or "metadata by script hexdigest" later if safe/atomic uninstall is desired.


@rads I don't want to step on your toes here! I feel like I perhaps went a little overboard with design, I've been wanting to try out some of the things Rich proposed in his latest talk for a while. You know the bbin implementation in more detail than I do, and I might be missing something. Please correct me if that's the case!

rads commented 1 year ago

I'm enjoying the discussion about the script metadata, but I think we should continue in https://github.com/babashka/bbin/issues/34 since my short-term fix will address the speed issue with bbin ls.

rads commented 1 year ago

P.S. @teodorlu, you aren't stepping on my toes. 😄 I appreciate that you've spent time thinking about this!

borkdude commented 1 year ago

Agree with @rads, @teodorlu amazing table :) Perhaps post it at the other issue then?

teodorlu commented 1 year ago

[...] but I think we should continue in #34 since my short-term fix will address the speed issue with bbin ls.

[...] Perhaps post it at the other issue then?

Agreed. Moved to https://github.com/babashka/bbin/issues/34#issuecomment-1752849723.