aardappel / procrastitracker

a Windows time tracking application
http://strlen.com/procrastitracker/

Reading the database manually #19

Open oguzhanogreden opened 7 years ago

oguzhanogreden commented 7 years ago

I wanted to process the data myself, to visualize some of my keystroke activity for fun. However, I could not find any guide or hint on how to read the file. There is this, but I am not familiar with Java.

Would it be feasible to provide geeks among users with a guide or some hints?

aardappel commented 7 years ago

Procrastitracker actually comes with this file that explains the format: https://github.com/aardappel/procrastitracker/blob/master/PT/file_format.txt What language are you using?

oguzhanogreden commented 7 years ago

Oops I've missed this! Thanks.

I will use R.

jt-fuw commented 7 years ago

Here is a simple program which dumps the contents of a db.PT file: pt-dmp.zip

koyto7 commented 7 years ago

Is there a way to extract the data to .CSV or any Excel-readable format for analysis purposes?

jt-fuw commented 7 years ago

Get pt-dmp.zip and try it (or look at the included example); its output reflects, through indentation and line order, the hierarchy stored in db.PT. Do you have an idea how to express that in .CSV? Generally, .CSV is flat, but of course one can define a way to express a hierarchy in it; the question is which method would be best for you. pt-dmp.txt

The first line gives the file format version (I have versions of procrastitracker that produce two different db.PT formats, versions 9 and 10; the program decodes both) and the tag count; then the tag definition lines follow (each starts with the tag index 0..9, followed by the color and name); then a line containing the parameters stored in db.PT; everything after that has a tree structure.

The tree structure has two kinds of entries: node and day. A node may have many days assigned to it; a day is shown on two lines and has no sub-elements in the hierarchy. You can think of days as leaves on a tree (or files in a filesystem) and of nodes as branches (or directories), although the binding of a day to a node seems semantically more important than the connections between nodes. A node has a tag index, which refers to a tag.

Such a structure is usually stored in a database as four tables:

  1. Info about parameters: 1 record, or a separate record for each parameter.
  2. Tag table (key=tagindex, value=tagname).
  3. Node table; a node contains a link to its parent (e.g. a parent identifier) to build the tree structure (key=nodeindex; data contains parentindex, nodename, tagindex, ishidden).
  4. Day table (key=nodeindex+date; data contains 7 fields).

The .CSV format, by contrast, usually holds a single table.
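
The node and day tables described above can be sketched in Python as two flat row lists, one CSV file each. This is only an illustration: the in-memory tree shape (dicts with `name`, `tagindex`, `ishidden`, `days`, `children`) and the helper names `flatten` and `to_csv` are assumptions made for the example, not something pt-dmp produces. The seven day field names are taken from the dump output shown elsewhere in this thread.

```python
import csv
import io

# Day field names as they appear in the pt-dmp output in this thread.
DAY_FIELDS = ["firstminuteused", "activeseconds", "semiidleseconds",
              "key", "lmb", "rmb", "scrollwheel"]


def flatten(root):
    """Walk a (hypothetical) nested-dict tree; return (node_rows, day_rows)."""
    node_rows, day_rows = [], []

    def visit(node, parent_index):
        index = len(node_rows)  # nodeindex is assigned in visit order
        node_rows.append([index, parent_index, node["name"],
                          node["tagindex"], node["ishidden"]])
        for day in node.get("days", []):
            # Day rows are keyed by nodeindex + date.
            day_rows.append([index, day["date"]] + [day[f] for f in DAY_FIELDS])
        for child in node.get("children", []):
            visit(child, index)

    visit(root, -1)  # -1 marks "no parent" for the top node
    return node_rows, day_rows


def to_csv(header, rows):
    """Render one table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Example tree built from values quoted in this thread.
example = {
    "name": "notepad", "tagindex": 0, "ishidden": 0,
    "children": [{
        "name": "*Untitled", "tagindex": 0, "ishidden": 0,
        "children": [{
            "name": "Notepad", "tagindex": 0, "ishidden": 0,
            "days": [{"date": "2020-04-04", "firstminuteused": "7:59",
                      "activeseconds": 45, "semiidleseconds": 0,
                      "key": 68, "lmb": 11, "rmb": 0, "scrollwheel": 0}],
        }],
    }],
}

nodes, days = flatten(example)
print(to_csv(["nodeindex", "parentindex", "nodename", "tagindex", "ishidden"], nodes))
print(to_csv(["nodeindex", "date"] + DAY_FIELDS, days))
```

Keeping the parent index in the node table means the hierarchy can be rebuilt from the two flat files with a join on nodeindex.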
jt-fuw commented 7 years ago

Hm... can Excel read ODS/FODS (OpenDocument Spreadsheet) files? That format can be used for a multi-table database. OpenOffice reads it and can write Excel format.

A few years ago I needed to convert a huge amount of database data to Excel format. I designed my own file format, which I called TMT (tabbed multi-table), and wrote a TMT-to-FODS converter. I first converted the data to TMT (this could be done using standard commands), then converted it to FODS, then read the FODS into OpenOffice (it was recognized as a spreadsheet) and saved it in Excel format.

icecream5058 commented 7 years ago

I am also very interested to know how to extract pt data into .csv. Thank you!

aardappel commented 7 years ago

@icecream5058 : have a look at https://github.com/aardappel/procrastitracker/blob/master/PT/file_format.txt

easz commented 7 years ago

If anyone wants to load a PT file in JavaScript...

https://github.com/easz/procrastitracker/tree/master/tools/js

DracoTomes commented 5 years ago

I wrote a little Python script to read it, but I get a zlib error: incorrect header check.
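
One common cause of "incorrect header check" in Python is calling `zlib.decompress` with its default settings on gzip-framed data. The C dump program in this thread reads the file with `gzread`, which suggests (but does not prove) that db.PT is gzip-framed. The sketch below uses made-up bytes rather than a real db.PT, and the helper name `inflate` is hypothetical; it shows the failure and a variant that lets zlib auto-detect the header.

```python
import gzip
import zlib


def inflate(raw: bytes) -> bytes:
    """Decompress a gzip- or zlib-framed byte string."""
    # MAX_WBITS | 32 asks zlib to auto-detect a gzip or zlib header.
    return zlib.decompress(raw, zlib.MAX_WBITS | 32)


payload = b"PTFF example payload"  # made-up bytes, not a real db.PT
gz = gzip.compress(payload)

try:
    zlib.decompress(gz)  # default wbits expects a zlib header, not gzip
except zlib.error as exc:
    print("plain zlib.decompress failed:", exc)

print(inflate(gz) == payload)
```

For a file on disk, `gzip.open(path, "rb").read()` is an equivalent route when the data really is gzip-framed.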

andyg2 commented 5 years ago

After much fun playing with assembly, Java and JavaScript, I've found that the current version of the database is 13, and I can't find documentation on reading it. Does anyone know what the differences between versions 10 and 13 are? Or is anyone willing to share a script (in any language) to parse this format? Thanks

aardappel commented 5 years ago

The difference is the prefs array grew bigger, file updated here: https://github.com/aardappel/procrastitracker/commit/802bdd491abd40254c9f88a917213d5d29e61b66

toby11 commented 4 years ago

Does anyone have an updated version of pt-dmp.zip that works with version 13 of the database?

I have made a small change myself and it works reading the data but gets stuck in a loop after reading the nodes.

Anything else I should change?

struct { int minfilter, foldlevel, prefs[10]; } x; // was prefs[6]

case 13: gzread(zfd, &x, 36); break; // prefs is now 10 (was 32)

for (ti = 0; ti < 10; ti++) printf(" %d", x.prefs[ti]); printf("\n"); // was 6

toby11 commented 4 years ago

I think the first node is not being read correctly in my case, so it thinks there are too many children.

minfilter=0 foldlevel=5 prefs= 5 180 10 10 300 0 5 0 0 0
nodename= tagindex=65536 ishidden=0 numberofdays=0 numchildren=1869752320
nodename=ot) tagindex=0 ishidden=0 numberofdays=0 numchildren=19
nodename=notepad tagindex=0 ishidden=0 numberofdays=0 numchildren=2
nodename=*Untitled tagindex=0 ishidden=0 numberofdays=0 numchildren=1
nodename=Notepad tagindex=0 ishidden=0 numberofdays=1 numchildren=0
day=2020-04-04 firstminuteused=7:59 activeseconds=45 semiidleseconds=0 key=68 lmb=11 rmb=0 scrollwheel=0
nodename=pt-dmp.cpp tagindex=0 ishidden=0 numberofdays=0 numchildren=1
nodename=Notepad tagindex=0 ishidden=0 numberofdays=1 numchildren=0
day=2020-04-03 firstminuteused=16:27 activeseconds=15 semiidleseconds=0 key=0 lmb=3 rmb=1 scrollwheel=11
nodename=notepad++ tagindex=0 ishidden=0 numberofdays=0 numchildren=4
nodename=Keep non existing file tagindex=0 ishidden=0 numberofdays=1 numchildren=0
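
A quick way to sanity-check a suspicious value such as numchildren=1869752320 is to look at its raw bytes. Assuming the integers are 32-bit little-endian (an assumption, though typical for a Windows x86 program), the value contains the ASCII characters `(ro`, and the next node name printed is `ot)`. That hints that the string `(root)` is being consumed as integer fields, i.e. the reader is already misaligned before the first node. The helper name `int32_as_bytes` is made up for this sketch.

```python
import struct


def int32_as_bytes(value: int) -> bytes:
    """Return the 4 bytes of a signed 32-bit little-endian integer."""
    return struct.pack("<i", value)


print(int32_as_bytes(1869752320))  # b'\x00(ro'
print(int32_as_bytes(65536))       # b'\x00\x00\x01\x00'
```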

toby11 commented 4 years ago

OK, fixed this myself: the number of bytes to read needs to be 48 for version 13:

10 * 4 + 2 * 4 = 48 (4-byte integers)

case 13: gzread(zfd, &x, 48); break;
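
The same byte count can be checked in Python with the `struct` module. This is a sketch that mirrors the C struct quoted in this thread (`int minfilter, foldlevel, prefs[N]`) and assumes 4-byte little-endian integers; `header_size` and `read_header` are hypothetical helper names, and the sample values are the ones quoted elsewhere in the thread.

```python
import struct


def header_size(num_prefs: int) -> int:
    """Bytes occupied by minfilter, foldlevel and num_prefs prefs entries."""
    return struct.calcsize("<" + "i" * (2 + num_prefs))


def read_header(buf: bytes, offset: int = 0, num_prefs: int = 10):
    """Unpack (minfilter, foldlevel, prefs) and return them with the next offset."""
    fmt = "<" + "i" * (2 + num_prefs)
    values = struct.unpack_from(fmt, buf, offset)
    return values[0], values[1], list(values[2:]), offset + struct.calcsize(fmt)


print(header_size(6))   # 32, the older layout with prefs[6]
print(header_size(10))  # 48, the version 13 layout with prefs[10]

# Round trip with values quoted in this thread.
sample = struct.pack("<12i", 0, 1, 5, 180, 10, 10, 300, 0, 5, 0, 1, 0)
print(read_header(sample))
```

Deriving the size from the format string avoids hard-coding 36 or 48 when the prefs array grows again.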

toby11 commented 4 years ago

I have a C# version of the code working now, if anyone wants it. I'll put it on GitHub once I have tidied it up a bit.

XavierTolza commented 3 years ago

Does anyone have a Python version of it? I fail to parse a node: I get a ridiculously high number of days. Here is my code: https://pastebin.com/UTFxHD6X And I get the following:

{
 "version": 13,
 "magic": "PTFF",
 "numtags": 15,
 "tags": [
  {   "name": "UNTAGGED",   "color": 13684944  },
  {   "name": "work",   "color": 6356832  },
  {   "name": "games",   "color": 6316287  },
  {   "name": "surfing",   "color": 6356991  },
  {   "name": "entertainment",   "color": 16736511  },
  {   "name": "communication",   "color": 16777056  },
  {   "name": "organization",   "color": 16736352  },
  {   "name": "project 1",   "color": 255  },
  {   "name": "project 2",   "color": 65280  },
  {   "name": "project 3",   "color": 16711680  },
  {   "name": "project 4",   "color": 11579488  },
  {   "name": "project 5",   "color": 11559088  },
  {   "name": "project 6",   "color": 6336688  },
  {   "name": "project 7",   "color": 6316128  },
  {   "name": "project 8",   "color": 11579568  }
 ],
 "minfilter": 0,
 "foldlevel": 1,
 "prefs": [  5,  180,  10,  10,  300,  0,  5,  0,  1,  0 ],
 "root": {
  "name": "(root)",
  "tagindex": 0,
  "ishidden": 0,
  "numberofdays": 606742016
 }
}

It looks like I'm missing something?

aardappel commented 3 years ago

@XavierTolza your code looks correct at first glance. Maybe dump all bytes starting from (root) to see where it's going wrong?
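
A byte dump of that kind could look like the sketch below. The `hexdump` helper and the stand-in data are made up for illustration; with a real file, `data` would be the decompressed contents of db.PT.

```python
def hexdump(data: bytes, start: int = 0, length: int = 64, width: int = 16) -> str:
    """Format `length` bytes from `start` as offset, hex and printable ASCII."""
    pad = width * 3 - 1  # width of a full hex column, to keep rows aligned
    end = min(len(data), start + length)
    lines = []
    for off in range(start, end, width):
        chunk = data[off:min(off + width, end)]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{pad}}  {text}")
    return "\n".join(lines)


# Made-up stand-in bytes; not taken from a real db.PT.
data = b"\x00" * 4 + b"(root)\x00" + bytes(range(8))
start = data.find(b"(root)")
print(hexdump(data, start))
```

Comparing the dumped bytes field by field against file_format.txt shows the first position where the parser and the file disagree.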

Also, this code may be very slow with all the slicing that you're doing.
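
If the slicing is of the form `data = data[n:]` after every field, each step copies the rest of the buffer. One alternative is to keep the buffer intact and advance an offset, as in the sketch below. The `Reader` class is hypothetical, and it assumes 32-bit little-endian integers and NUL-terminated strings; check both against file_format.txt before relying on it.

```python
import struct


class Reader:
    """Cursor over an immutable buffer; reads advance `pos` instead of re-slicing."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def int32(self) -> int:
        # unpack_from reads in place at the current offset.
        (value,) = struct.unpack_from("<i", self.data, self.pos)
        self.pos += 4
        return value

    def cstring(self) -> str:
        # Assumes strings end with a single NUL byte.
        end = self.data.index(b"\x00", self.pos)
        text = self.data[self.pos:end].decode("utf-8", errors="replace")
        self.pos = end + 1
        return text


# Made-up bytes shaped like "name, then two integers".
r = Reader(b"(root)\x00" + struct.pack("<ii", 0, 7))
print(r.cstring(), r.int32(), r.int32(), r.pos)
```

Printing `r.pos` after each field also makes it easy to line the parser up against a byte dump.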