cretz / software-ideas


Web Browsing Activity Visualization #106

Open cretz opened 4 years ago

cretz commented 4 years ago
jdewald commented 4 years ago

Here are some basic proof-of-concept steps I got working. Getting gource to build directly on my Mac was a pain, so I just used a Docker version.

Export Chrome history to TSV

Output format: UNIXTIMESTAMP\tURL\tSOURCE_URL (source_url unused right now, but could be nice later)

CHROMEHISTORY=/path/to/historyfile
# Feed the query to sqlite3 on stdin; the path is quoted in case it contains spaces
sqlite3 "$CHROMEHISTORY" <<'EOF' > history.tsv
.mode tabs
select (v.visit_time/1000000)+strftime('%s','1601-01-01'),u.url,vfu.url
from visits v inner join urls u on u.id=v.url
left join visits vf on vf.id=v.from_visit
left join urls vfu on vfu.id=vf.url order by v.id;
.quit
EOF
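The arithmetic in the query shifts Chrome's visit_time, which is stored as microseconds since 1601-01-01 UTC, to Unix seconds. Here is a minimal sketch of the same conversion in plain shell arithmetic. The sample value and variable names are made up for illustration, not taken from a real history file:

```shell
# Hypothetical visit_time value (microseconds since 1601-01-01 UTC)
visit_time=13237749213000000

# Seconds between 1601-01-01 and 1970-01-01.
# strftime('%s','1601-01-01') in the query yields the negative of this.
epoch_offset=11644473600

# Same steps as the SQL: drop to whole seconds, then shift the epoch
unix_ts=$(( visit_time / 1000000 - epoch_offset ))
echo "$unix_ts"   # prints 1593275613
```

Checking one or two rows of history.tsv this way is a quick sanity test that the timestamps land on plausible dates.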

Convert to gource custom log format

The above can be converted fairly directly to Gource's custom log format. (You could in fact have sqlite output it directly, but I wanted to track the notion of "first visit" vs. "new visit".)

cat <<'EOF' > history2gource.awk
# Converts a .tsv of the format TIMESTAMP\tURL\tSOURCE_URL to Gource's custom log format:
# TIMESTAMP|USERNAME|TYPE|/HOST/FILE|COLOUR
# The "username" is the fixed string set in BEGIN; the domain becomes the top-level directory.
#
# The type will be:
# A - first time we've seen this (host,path)
# M - later times
BEGIN {
   FS="\t"    # input is tab-separated, so a URL containing spaces stays in one field
   OFS="|"
   user="me"
}
{
   match($2,/\/\/([^/]+)(\/[^?#]+)?/,tokes)
   host=tokes[1]
   path=tokes[2]
   combined=host path
   type="M"
   if (!(combined in visited)) {
      type="A"
      visited[combined]=1
   }

   print $1,user,type,"/" host path,""
}
EOF

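You can then run gawk -f history2gource.awk history.tsv > history.log, or simply pipe the sqlite3 output straight into it. (It has to be gawk, since the script uses the three-argument form of match().)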
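Since the point of the awk step is to tell first visits (A) from later ones (M), it can be worth checking the split before rendering anything. This is a rough sketch that counts the two types in a custom-format log. The count_types helper and the three sample lines are made up for illustration:

```shell
# Count first-visit (A) and repeat-visit (M) entries in a Gource custom log.
# Field 3 of each pipe-delimited line is the entry type.
count_types() {
  awk -F'|' '{ n[$3]++ } END { printf "A=%d M=%d\n", n["A"], n["M"] }' "$1"
}

# Tiny made-up log in the same TIMESTAMP|USERNAME|TYPE|/HOST/FILE|COLOUR layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
1593275613|me|A|/example.com/docs|
1593275620|me|M|/example.com/docs|
1593275700|me|A|/example.org/|
EOF

count_types "$sample"   # prints: A=2 M=1
```

Run against the real file with count_types history.log. If every entry comes out as M, the first-visit check in the script is not firing.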

Generate the gource visualization

(ffmpeg command line blatantly stolen from Gource site)

gource -1280x720 --log-format custom history.log -o - | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -pix_fmt yuv420p -crf 1 -threads 0 -bf 0 chromehistory.mp4

NOTE: If you pass "-" as gource's input file name, it reads the log from stdin, so you can have a full pipe all the way through. Streaming the PPM frames straight into ffmpeg matters too: in my case, if I don't stream them directly, the PPM data ends up being multiple gigabytes.
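A sketch of what the full pipe could look like, wrapped in a function so nothing runs until it is called. It assumes sqlite3, gawk, gource and ffmpeg are all on PATH, and that the export query (including the .mode tabs line) has been saved to a file called history.sql. That file name and the render_history function are hypothetical:

```shell
# Export, convert, render and encode in one pipeline; only the final
# video is written to disk.
# $1: path to the Chrome history file (hypothetical argument order)
# $2: output video file
render_history() {
  history_db=$1
  out=$2
  sqlite3 "$history_db" < history.sql \
    | gawk -f history2gource.awk \
    | gource -1280x720 --log-format custom - -o - \
    | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - \
        -vcodec libx264 -preset ultrafast -pix_fmt yuv420p \
        -crf 1 -threads 0 -bf 0 "$out"
}
```

Usage would be something like render_history /path/to/historyfile chromehistory.mp4. The gource and ffmpeg options are the same ones used in the command above; only the "-" input is new.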

Some observations:

Frame of output: [image: Screenshot 2020-06-27 16 33 33]

cretz commented 4 years ago

@jdewald - Nice! Yeah, gource is just the first thing that came to mind (used it in the old days). There might be other visualizations that show history better, especially time-related ones.