ActivityWatch / aw-core

Core library for ActivityWatch
Mozilla Public License 2.0

Creating new table in peewee database #68

Open nicolae-stroncea opened 6 years ago

nicolae-stroncea commented 6 years ago

I've created a new table in the peewee.py file. I've tried both make build and make install in aw-core, yet it doesn't seem like the peewee file ever runs the code. I've looked into the source code and couldn't find when the PeeweeStorage was initialized. Any idea on how to get the file to run manually?

Code below:

if not BucketModel.table_exists():
    BucketModel.create_table()
if not EventModel.table_exists():
    EventModel.create_table()
if not TestModel.table_exists():
    TestModel.create_table()
self.update_bucket_keys()

ErikBjare commented 6 years ago

What you did should be enough. If you put a print statement in PeeweeStorage.__init__, does it print?

If you're going to write a cache, I'd suggest you use something other than the datastore storage strategies like PeeweeStorage (which shouldn't be complicated by caching; caching should be storage-method independent).

@johan-bjareholt is working on implementing a storage method using plain SQL for better performance (see PR in aw-core), yet another reason why you'd want to be storage method independent.

I'm still not quite sure how you're planning to do the caching, how would it work with queries?

johan-bjareholt commented 6 years ago

> I'm still not quite sure how you're planning to do the caching, how would it work with queries?

When I attempted this previously, I had an ID for each query and saved the result. My plan was to later query it in a hierarchy like hour->day->week->month->year (which is likely very hard due to timezone issues, like you said), but I never got that far since I thought a good query system (query2) was a higher priority. It would also be good to save the last access time of each cache entry so we can auto-clean the cache and keep it from growing too much.
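To make the idea above concrete, here is a minimal sketch of a result cache keyed by a query ID that also records the last access time so stale entries can be cleaned out. Everything in it (the QueryCache class, the way the ID is derived, the max_age_seconds parameter) is hypothetical and not taken from aw-core; it only illustrates the approach described in this comment.

```python
import hashlib
import json
import time


class QueryCache:
    """Hypothetical in-memory cache keyed by a query ID, tracking last access."""

    def __init__(self, max_age_seconds, clock=time.time):
        self.max_age_seconds = max_age_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._entries = {}  # query_id -> [result, last_access_timestamp]

    @staticmethod
    def query_id(query, start, end):
        # Derive a stable ID from the query text and the time period (assumption).
        payload = json.dumps([query, start, end])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, qid):
        entry = self._entries.get(qid)
        if entry is None:
            return None
        entry[1] = self._clock()  # refresh last access time on a hit
        return entry[0]

    def put(self, qid, result):
        self._entries[qid] = [result, self._clock()]

    def clean(self):
        # Drop entries that have not been accessed within max_age_seconds.
        cutoff = self._clock() - self.max_age_seconds
        stale = [qid for qid, (_, last) in self._entries.items() if last < cutoff]
        for qid in stale:
            del self._entries[qid]
        return len(stale)
```

Note that this sketch does not attempt the hour->day->week hierarchy, and a cached result of None cannot be told apart from a miss.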

ErikBjare commented 6 years ago

@johan-bjareholt Yeah but queries can return all kinds of results, how would you know how to merge query results?

johan-bjareholt commented 6 years ago

> Yeah but queries can return all kinds of results, how would you know how to merge query results?

That didn't use to be the case in query1, good point. We could build some kind of index on fields of certain buckettypes that are useful and not very unique (such as appname in currentwindow or language in app.editor.current). We probably shouldn't hardcode that though, so I'm not sure how to do it properly (an extra field when creating a bucket?). Possibly also different aggregation types, such as average on numbers (average tabs open, for example). Since the fields in "data" are not hardcoded, though, this could be a bit sketchy.
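As a rough illustration of the "extra field when creating a bucket" idea, the sketch below lets each bucket declare which "data" fields to index and how to aggregate them. The cache_fields mapping, the aggregation names ("duration_by_value", "average") and the event shape (plain dicts with "duration" in seconds and a "data" dict) are all assumptions made for this example, not an existing aw-core API.

```python
from collections import defaultdict


def aggregate(events, field, how):
    """Aggregate one field of event["data"] using the named strategy."""
    if how == "duration_by_value":
        # Total duration per distinct value, e.g. time spent per app.
        totals = defaultdict(float)
        for event in events:
            if field in event["data"]:
                totals[event["data"][field]] += event["duration"]
        return dict(totals)
    if how == "average":
        # Plain mean of a numeric field, e.g. number of open tabs.
        values = [e["data"][field] for e in events if field in e["data"]]
        return sum(values) / len(values) if values else None
    raise ValueError(f"unknown aggregation type: {how}")


def build_index(events, cache_fields):
    """cache_fields maps a field name to an aggregation type (hypothetical)."""
    return {field: aggregate(events, field, how) for field, how in cache_fields.items()}
```

For example, a window bucket might be created with cache_fields={"app": "duration_by_value"}, so only that field is indexed and free-form fields like the window title are ignored.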

EDIT: I do not think this is a good solution, just throwing around ideas

nicolae-stroncea commented 6 years ago

@ErikBjare @johan-bjareholt

This is the structure I was considering: store the data in the following columns: Key (URLs, domains, app_events, title_events), Value (e.g. github.com, localhost:5600), Duration, and Date. That would make it easy to insert into the table and to run queries on it.

You can essentially run the same summary functions (browserSummaryQuery, windowQuery) and then just insert their results into the table. To do a summary over a time period, you'd select by key (depending on what you want to summarize) and by date range.

What do you guys think?

EDIT: Instead of a single Duration column you could have "Duration+AFK" and "Duration-AFK", depending on whether the user wants to filter out AFK time or not.
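A minimal sketch of the proposed table, using the standard-library sqlite3 module rather than peewee to keep it self-contained. The table name summary_cache, the helper names, and the one-row-per-day granularity are assumptions; the AFK variant from the EDIT would replace duration with two columns.

```python
import sqlite3


def open_summary_cache(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS summary_cache (
            key      TEXT NOT NULL,   -- e.g. 'domains', 'app_events'
            value    TEXT NOT NULL,   -- e.g. 'github.com'
            date     TEXT NOT NULL,   -- ISO date, one row per day
            duration REAL NOT NULL,   -- seconds
            PRIMARY KEY (key, value, date)
        )
        """
    )
    return conn


def store_summary(conn, key, date, durations):
    """Insert one day's summary; durations maps value -> seconds."""
    conn.executemany(
        "INSERT OR REPLACE INTO summary_cache (key, value, date, duration) "
        "VALUES (?, ?, ?, ?)",
        [(key, value, date, seconds) for value, seconds in durations.items()],
    )
    conn.commit()


def summarize(conn, key, start, end):
    """Total duration per value for one key over an inclusive date range."""
    rows = conn.execute(
        "SELECT value, SUM(duration) FROM summary_cache "
        "WHERE key = ? AND date BETWEEN ? AND ? "
        "GROUP BY value ORDER BY SUM(duration) DESC, value",
        (key, start, end),
    )
    return rows.fetchall()
```

The date comparison works because ISO dates sort lexicographically. Day boundaries here assume a single fixed timezone, which is the timezone problem raised earlier in the thread.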

johan-bjareholt commented 6 years ago

I still don't believe that this is a good idea though because it's not flexible enough.

On the other hand, we could just invalidate the cache every time the user upgrades ActivityWatch and we have a new format. Still doesn't seem like a clean solution to me though.

nicolae-stroncea commented 6 years ago

What other features would a more flexible cache have?

johan-bjareholt commented 6 years ago

A flexible cache would not hardcode the columns of the DB table. We want to allow third-party buckettypes to be cached as well.
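One way to avoid hardcoding columns, sketched here purely as an illustration, is to key the cache by bucket type and date and store the cached summary as an opaque JSON payload. The table layout, the function names and the format_version column (used to ignore entries written in an older format, as discussed above) are all hypothetical.

```python
import json
import sqlite3


def open_generic_cache(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS cache (
            bucket_type    TEXT NOT NULL,
            date           TEXT NOT NULL,
            format_version INTEGER NOT NULL,
            payload        TEXT NOT NULL,   -- JSON, shape decided by the bucket type
            PRIMARY KEY (bucket_type, date)
        )
        """
    )
    return conn


def put_cached(conn, bucket_type, date, payload, format_version):
    conn.execute(
        "INSERT OR REPLACE INTO cache (bucket_type, date, format_version, payload) "
        "VALUES (?, ?, ?, ?)",
        (bucket_type, date, format_version, json.dumps(payload)),
    )
    conn.commit()


def get_cached(conn, bucket_type, date, format_version):
    """Return the payload, or None on a miss or a format version mismatch."""
    row = conn.execute(
        "SELECT format_version, payload FROM cache WHERE bucket_type = ? AND date = ?",
        (bucket_type, date),
    ).fetchone()
    if row is None or row[0] != format_version:
        return None
    return json.loads(row[1])
```

Because the payload is opaque, a third-party buckettype can cache whatever shape it needs. The trade-off is that the database can no longer aggregate across rows with SQL.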