Track shell history in a database

brandondrew commented 11 years ago

Every piece of info tracked as part of history is basically another field, and the more flexibility offered to the user (in terms of what to include in and exclude from history), the more complicated it gets for the developer of the shell, who ends up basically building database functions from scratch when they could more easily just use a database (probably SQLite).

Here are some of the fields that might be tracked as part of history:

command
start_time
stop_time
working_directory
date
command_id (integer starting at 1 and incrementing for each command: standard in most shells)

The killer feature of this, more than ease of development, would be the ability for the end user to define their own fields. For instance, if the working_directory field were defined by an end user, they would just have to specify the name (whatever name they wanted) and the command or value to run or evaluate on each command execution. In this case either pwd or $PWD would do nicely. Allowing the user to add arbitrary data to their command history would make this the most flexible and powerful command history EVAR. Or at least I'd bet $5 on that.
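A minimal sketch of what user-defined fields could look like, assuming a hypothetical FieldRegistry class (not part of any existing shell) in which each field is just a name plus a block that is evaluated once per recorded command:

```ruby
# Hypothetical registry of user-defined history fields. Each field has a
# name chosen by the user and a block evaluated for every recorded command.
class FieldRegistry
  def initialize
    @fields = {}
  end

  # Register a field under whatever name the user likes.
  def define(name, &block)
    @fields[name.to_sym] = block
  end

  # Evaluate every field and merge the results with the command itself.
  def record(command)
    row = { command: command }
    @fields.each { |name, block| row[name] = block.call }
    row
  end
end

registry = FieldRegistry.new
registry.define(:working_directory) { Dir.pwd }
registry.define(:shell_pid) { Process.pid }

row = registry.record("git status")
puts row.keys.join(",")
```

Whether the resulting row ends up in a database or a flat file is a separate decision; the registry only decides which fields exist.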

Spakman commented 11 years ago

I had been planning on starting work on the history builtin (including issue #8) this week, but there are some good thoughts here (and issue #9) so I think it's worth leaving it to settle for a little while for us to get a better idea of exactly what would be useful.

I'm not too keen on introducing a hard dependency on a database into the shell (this wouldn't preclude you from using a database for storage - ideally Urchin will be easy enough for a Ruby programmer to modify to change this sort of thing).

With that said, I'm really sold on the idea of allowing arbitrary data to be stored - I can see a lot of benefits. Kind of thinking aloud - perhaps a hook along the lines of:

before_history_append do |history_line|
  history_line.working_directory = Dir.getwd
  history_line.ruby_version = Shell.new.eval("which ruby")
  history_line
end

could be user specified to return a dynamic HistoryLine object. There could also be a per-directory hook to do the same thing - for certain projects you may want to additionally store the git commit hash, for example.
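As a rough illustration of how such a hook and a dynamic HistoryLine might fit together, here is a sketch. The HistoryLine and HistoryHooks names below are assumptions made for this example and are not taken from Urchin's source:

```ruby
# Hypothetical HistoryLine that accepts arbitrary attributes on the fly.
class HistoryLine
  attr_reader :fields

  def initialize(command, time = Time.now)
    @fields = { command: command, time: time }
  end

  # history_line.foo = value stores a field; history_line.foo reads it back.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=") && args.length == 1
      @fields[key.chomp("=").to_sym] = args.first
    elsif args.empty? && @fields.key?(name)
      @fields[name]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || @fields.key?(name) || super
  end
end

# Hypothetical hook storage; a real shell would keep this in its own state.
module HistoryHooks
  @hooks = []

  def self.before_history_append(&block)
    @hooks << block
  end

  # Pass the line through every hook; each hook returns the line to keep.
  def self.run(line)
    @hooks.reduce(line) { |current, hook| hook.call(current) }
  end
end

HistoryHooks.before_history_append do |history_line|
  history_line.working_directory = Dir.getwd
  history_line
end

line = HistoryHooks.run(HistoryLine.new("git stash"))
puts line.working_directory
```

A per-directory hook could be registered the same way, with the shell deciding which hooks apply based on the current directory.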

A little trickier is coming up with a nice interface for the history builtin itself.

brandondrew commented 11 years ago

Yeah, I totally agree about the aversion to requiring a database. Perhaps for users who don't want to store arbitrary data it could just use a normal history file.

I like the example, although the word "append" makes me think of how some shells (depending on configuration) append a whole session of in-memory history to the history file at once. I'm assuming that is meant to be fired after every command, not just when the in-memory history is appended to the file, so how about before_writing_history? (I'm thinking that history will be written--whatever precisely that means--after every command. Maybe before_recording_history is better?)

This is a bit of a tangent, but one of the things I'd REALLY like to do with my history (and one reason I like the idea of putting it in a database) is that I'd like to be able to see the commands ordered by time, BUT ALSO be able to see only the commands I issued in a certain terminal window. As far as I know there is no shell that allows you to have it both ways: I'm pretty sure you have to choose one or the other, and can't slice and dice after the fact. I usually set the title of any given iTerm window to a statement of purpose for that window. This way I know whether I can close the window or whether I'm in the middle of something when I come back to it. The purpose/title is one of the things I'd like to store in the database, so I can see just the commands that were a part of one terminal session.
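To make the "have it both ways" idea concrete, here is a small sketch. It assumes each history entry is a hash with a :time field and a user-defined :terminal_title field; both names are illustrative only:

```ruby
# Every command, ordered by time, regardless of which window issued it.
def ordered_by_time(history)
  history.sort_by { |entry| entry[:time] }
end

# Only the commands from one terminal window, still ordered by time.
def for_terminal(history, title)
  ordered_by_time(history).select { |entry| entry[:terminal_title] == title }
end

# Sample data; :terminal_title is an assumed user-defined field.
history = [
  { time: Time.at(300), terminal_title: "deploy", command: "cap deploy" },
  { time: Time.at(100), terminal_title: "bugfix", command: "git bisect start" },
  { time: Time.at(200), terminal_title: "deploy", command: "git pull" },
]

puts ordered_by_time(history).map { |entry| entry[:command] }.inspect
puts for_terminal(history, "deploy").map { |entry| entry[:command] }.inspect
```

Because the title is stored with every entry, the choice between the two views can be made after the fact rather than at recording time.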

(In reply to Mark Somerville's comment of Tue, Oct 2, 2012 at 5:10 PM.)

Spakman commented 11 years ago

@brandondrew Thanks for the useful real-world example. Hopefully what you describe should be pretty easy to achieve with a combination of the history builtin and friends like grep and sort (Unix philosophy, etc).

My current thinking is that default Urchin will store the entered command and the timestamp (haven't ruled out storing other data by default yet). On top of that, users can add arbitrary fields as discussed. So...

Given:

# Or whatever we end up calling the hook method (I'm quite liking before_recording_history or something similar).
before_history_append do |history_line|
  history_line.working_directory = Dir.getwd
  history_line.ruby_version = Shell.new.eval("which ruby")
  history_line.terminal = Process.pid
  history_line
end

The history builtin would behave like this:

# Display the default history listing (perhaps this won't even include the date column).
$ history
05/10/2012 15:07:01  ruby -Ilib examples/example_readline.rb 
05/10/2012 15:07:10  fg
05/10/2012 15:07:21  git reflog
05/10/2012 15:07:29  git co d9550de
05/10/2012 15:07:37  git stash
05/10/2012 15:08:41  git reset --hard ORIG_HEAD

# List the fields available. Some fields may only be on some lines - this is OK.
$ history -f
date,command,working_directory,ruby_version,terminal

# Display the working_directory and command columns in that order:
$ history -o working_directory,command
/home/mark/src/rb-readline/  ruby -Ilib examples/example_readline.rb 
/home/mark/src/rb-readline/  fg
/home/mark/src/rb-readline/  git reflog
/home/mark/src/rb-readline/  git co d9550de
/home/mark/src/rb-readline/  git stash
/home/mark/src/rb-readline/  git reset --hard ORIG_HEAD

This stops the builtin getting too complicated and means Urchin doesn't fall into the trap you mention above where the shell is basically emulating SQL queries. As far as I can tell, this should be flexible enough to be really useful and simple enough to actually implement!
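A sketch of how the two listing modes could be implemented, assuming history lines are held as hashes. The method names here are made up for illustration and are not Urchin's actual API:

```ruby
# Sample history lines as hashes; not every line has every field.
LINES = [
  { date: "05/10/2012 15:07:21", command: "git reflog",
    working_directory: "/home/mark/src/rb-readline/" },
  { date: "05/10/2012 15:07:37", command: "git stash",
    working_directory: "/home/mark/src/rb-readline/", terminal: 4242 },
].freeze

# history -f: the union of field names across all lines, in first-seen order.
def available_fields(lines)
  lines.flat_map(&:keys).uniq
end

# history -o a,b: the requested columns in the requested order.
# A field missing from a line is rendered as an empty string.
def format_history(lines, columns)
  lines.map do |line|
    columns.map { |column| line.fetch(column, "").to_s }.join("  ")
  end
end

puts available_fields(LINES).join(",")
puts format_history(LINES, [:working_directory, :command])
```

Anything more elaborate (filtering, sorting, counting) is left to grep, sort and friends, as described above.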

Any thoughts on that interface?

brandondrew commented 11 years ago

I like the interface. I'd also like to have longer options, e.g. history --fields and history --output=working_directory,command.
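For what it's worth, Ruby's standard OptionParser handles short and long forms together. The following is only a sketch of how a builtin might parse these options; the parse_history_args helper and the option keys are hypothetical:

```ruby
require "optparse"

# Parse arguments for a hypothetical history builtin.
def parse_history_args(argv)
  options = { list_fields: false, columns: nil }
  parser = OptionParser.new do |opts|
    opts.on("-f", "--fields", "List the available fields") do
      options[:list_fields] = true
    end
    # The Array acceptor splits the argument on commas.
    opts.on("-o", "--output=COLUMNS", Array, "Columns to display, in order") do |columns|
      options[:columns] = columns
    end
  end
  parser.parse(argv)
  options
end

p parse_history_args(["--output=working_directory,command"])
p parse_history_args(["-o", "working_directory,command"])
p parse_history_args(["--fields"])
```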

But I'd hate to be the one to implement it if the history is stored in a file, and different fields are represented by different columns of text. I think I'd go crazy.

Here's a (mildly) crazy idea: rather than use columns with spaces, like most shells do for history, why not use YAML? If you want a flexible way to store arbitrary data, and you don't want to use a normal database (not even SQLite) then that seems to me like the only sane choice (at least that I can think of). XML might give as much flexibility but would be much more verbose and IMO ugly. I suppose JSON is an option but I don't see any advantages to it in this context. Or just Ruby hashes, but that seems like a security risk (the only advantage I can see with Ruby hashes would be that it's executable out of the box with no need for parsing, but that could be an attack vector potentially, too).

I'm trying to think of downsides, and the only possible one I can think of is performance, but I have no idea if that's an issue at all. In fact, I don't think it even could be, since most interaction with the history.yaml file would be appending to it (after every command, I hope) but the last 1000(?) commands could also be held in memory if necessary. I think any performance issues that come up could be gotten around somehow. Even if the history file is ten million lines long (I might be guilty of letting my history grow to that length over many years) there could be a smaller file in /tmp for most history requests.
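A sketch of the append-only idea, assuming each command is written as its own YAML document so that earlier entries are never rewritten. The file name, helper names and field names are illustrative, and the entries stick to strings and integers to keep loading simple:

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

# Append one entry as its own YAML document ("---" separated).
def append_entry(path, entry)
  File.open(path, "a") { |file| file.write(YAML.dump(entry)) }
end

# Read every document back. This parses the whole file, so a very large
# history would want a cache or a smaller recent-commands file instead.
def read_entries(path)
  YAML.load_stream(File.read(path))
end

dir = Dir.mktmpdir
path = File.join(dir, "history.yaml")

append_entry(path, { "command" => "git reflog", "time" => 1349449641 })
append_entry(path, { "command" => "git stash", "time" => 1349449657,
                     "working_directory" => "/home/mark/src/rb-readline/" })

entries = read_entries(path)
puts entries.length
puts entries.last["working_directory"]

FileUtils.remove_entry(dir)
```

Entries with different sets of fields can sit side by side in the same file, which is the flexibility argued for above.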

(In reply to Mark Somerville's comment of Fri, Oct 5, 2012 at 11:26 AM.)

Spakman commented 11 years ago

Good shout with the longer options - I like that too.

I've not really decided about the storage of the file. I have been thinking about using PStore (although the binary format worries me a little, it might work out). I'll be giving that stuff a lot of thought over the weekend.

Performance is definitely a consideration - the file is, currently, parsed and read on startup, as you'd expect. I agree though that any performance issue should be solvable. I might try a few benchmarks this weekend if I get a chance. It will probably depend on the weather :)!

I'll keep this ticket up-to-date.

barabo commented 11 years ago

FWIW - I wrote a history tracking extension for bash / zsh - you may find it interesting what I'm collecting and how I query it: https://code.google.com/p/advanced-shell-history/

Spakman commented 11 years ago

@barabo - very cool, thanks a lot!

One thing I have found since writing my own advanced shell history stuff in Urchin is that I don't use it all that often, if ever. I'm a bit surprised, since I know that it's useful.

Do you use it a lot? What are the common queries you run? Is my lack of use simply down to me forgetting that it's there (much like learning a new command in Vim)?

brandondrew commented 11 years ago

@barabo: awesome. Thanks.

barabo commented 11 years ago

Typically I use the 'show me history for the CWD' query on the command line. I also frequently open the db in sqlite3 to look for command durations or previous invocations of some command or other.

Example:

-- look at the details for all the successful apt-get installs on a system.
select * from commands where command like '%apt-get install %' and rval == 0;

In theory, one could collect history from any number of users and hosts and combine them into the same db, so it's possible to keep a large shared history database.
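For a shell that keeps history as plain records rather than in SQLite, the two queries mentioned above (history for the current directory, and command durations) could be sketched like this. The field names are borrowed from the list at the top of the thread, and the helper names are hypothetical:

```ruby
# Commands that were run in the given directory.
def history_for_directory(entries, directory)
  entries.select { |entry| entry[:working_directory] == directory }
end

# The `count` longest-running commands, slowest first.
def slowest(entries, count)
  entries.max_by(count) { |entry| entry[:stop_time] - entry[:start_time] }
end

# Sample data with times given as seconds for simplicity.
entries = [
  { command: "make", working_directory: "/src/app", start_time: 0, stop_time: 90 },
  { command: "ls", working_directory: "/tmp", start_time: 100, stop_time: 101 },
  { command: "rake test", working_directory: "/src/app", start_time: 200, stop_time: 230 },
]

puts history_for_directory(entries, "/src/app").map { |entry| entry[:command] }.inspect
puts slowest(entries, 2).map { |entry| entry[:command] }.inspect
```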

Spakman / urchin

Track shell history in a database #14
