ErikBjare opened 8 years ago
It might be better to do redaction by simply removing the window-title property of the event entirely.
I've done some basic work in https://github.com/ActivityWatch/aw-analyser/commit/eb45bf691ae128fcda9010c5500e5bba09e59ad1
Would it make more sense to implement some kind of data encryption instead of redaction? Different users might have different opinions on what constitutes sensitive data...
For instance, with some form of asymmetric cryptography (public/private key), encryption would add no requirements for recording data, but accessing the data would require a private key and/or a password.
I'd also really like to have a filtering feature, so I'll add my two cents here. (Edit: Well, that grew to a bit more than two cents... o.O)
In my wanton imagination it would look something like this:
Via an HTML form (on the ActivityWatch website), the user can create filters (e.g. delete event if event.data.incognito == true).
The user can then apply these to
(1.1) filter future events sent to the server (from all watchers) and
(1.2) remove already existing events in the database.
Filters should consist of two parts, namely
(2.1) Filter criteria
e.g. if event.data.incognito == true
(2.2) Filter action
e.g. then remove event.title
Specifies when the filter should be applied. The following functions would be nice:
Examples:
if event.title includes 'Private Browsing'
if event.data.incognito equals true
if event.data.nested.val differs 'abc'
if event.data.val regex i_dunno_about_regex
if event.data.count > 10
Examples:
if event.title includes 'Private Browsing' and event.data.isSensitive equals true
if event.title includes 'Private Browsing' or event.data.isSensitive equals true
e.g. if watcher_name equals 'aw-watcher-vscode'
Examples:
if event.timestamp is in_time_range(7:00, 9:00)
if event.timestamp is on_day('Monday')
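A minimal Python sketch of how the criteria above could be evaluated. The dict-based filter format, the operator names (taken from the examples) and the `matches`/`get_field` helpers are hypothetical. Field paths such as `data.title` assume ActivityWatch-style events, where the window title lives under `data`; a criterion on a field the event doesn't have simply never matches.

```python
import re
from datetime import datetime, time

# Weekday names indexed by datetime.weekday(); avoids locale-dependent strftime("%A").
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical operators, named after the examples above.
OPERATORS = {
    "includes": lambda value, arg: isinstance(value, str) and arg in value,
    "equals": lambda value, arg: value == arg,
    "differs": lambda value, arg: value != arg,
    "regex": lambda value, arg: isinstance(value, str) and re.search(arg, value) is not None,
    ">": lambda value, arg: isinstance(value, (int, float)) and value > arg,
    "in_time_range": lambda value, arg: isinstance(value, datetime) and arg[0] <= value.time() < arg[1],
    "on_day": lambda value, arg: isinstance(value, datetime) and DAYS[value.weekday()] == arg,
}

_MISSING = object()


def get_field(event: dict, path: str):
    """Resolve a dotted path such as 'data.nested.val'; return _MISSING if any key is absent."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(event: dict, criterion: dict) -> bool:
    """Evaluate a leaf {'field', 'op', 'arg'} or a combination {'and': [...]} / {'or': [...]}."""
    if "and" in criterion:
        return all(matches(event, c) for c in criterion["and"])
    if "or" in criterion:
        return any(matches(event, c) for c in criterion["or"])
    value = get_field(event, criterion["field"])
    if value is _MISSING:
        return False  # a filter never fires on a field the event doesn't have
    return OPERATORS[criterion["op"]](value, criterion["arg"])


# "if event.timestamp is in_time_range(7:00, 9:00)"
MORNING = {"field": "timestamp", "op": "in_time_range", "arg": (time(7, 0), time(9, 0))}
```

Nesting `and`/`or` dicts covers both combination examples, and a watcher-name criterion could be handled the same way if the caller adds the bucket's watcher name to the dict being matched.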
Specifies what should be done if the filter applies. The following functions would be nice:
Examples:
remove event
remove event.title
Examples:
replace event.title with 'REDACTED'
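The actions could be sketched the same way. The action dict format and `apply_action` are hypothetical. Returning `None` stands for "remove event", and the input is deep-copied so the caller's event is never mutated; a field path that doesn't exist leaves the event unchanged.

```python
import copy
from typing import Optional


def apply_action(event: dict, action: dict) -> Optional[dict]:
    """Apply a hypothetical filter action; return the new event, or None to drop it."""
    kind = action["type"]
    if kind == "remove_event":
        return None  # "remove event"
    if kind not in ("remove_field", "replace_field"):
        raise ValueError(f"unknown action type: {kind!r}")

    new_event = copy.deepcopy(event)  # never mutate the caller's event
    *parents, last = action["field"].split(".")
    target = new_event
    for key in parents:  # walk e.g. "data" in "data.title"
        target = target.get(key) if isinstance(target, dict) else None

    if isinstance(target, dict) and last in target:
        if kind == "remove_field":  # "remove event.title"
            del target[last]
        else:  # "replace event.title with 'REDACTED'"
            target[last] = action["value"]
    return new_event
```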
Filters should be creatable via an HTML interface on the localhost site (http://localhost:5600/filters).
(3.1.1.1) A list of the active filters with the options to [edit/copy/disable/delete] the filter
(3.1.1.2) Option to add a new filter
Should be easy to understand for non-coders, likely with dropdowns and predefined fields.
(3.1.2.1) Filter name
(3.1.2.2) Filter criteria (see 2.1)
(3.1.2.3) Filter action (see 2.2)
Does anyone know of a library for that...??? o.o
(3.2.1) API endpoint
(3.2.2) Filter parser
(3.2.3) Store filters in a file/database
(3.2.4) Filter incoming events by stored filters
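A rough sketch of (3.2.2) to (3.2.4): filters stored as JSON text (which could live in a file or a database column), validated when parsed, then applied in order to each incoming event. The JSON schema, the `enabled` flag (standing in for the disable option in 3.1.1.1) and all function names are assumptions; to stay short, criteria here only look at top-level keys of `event["data"]` and support just two operators.

```python
import json

SUPPORTED_OPS = {"equals", "includes"}
SUPPORTED_ACTIONS = {"remove_event", "replace_field"}


def parse_filters(text: str) -> list:
    """(3.2.2) Parse a stored JSON list of filters and reject unknown operators/actions."""
    filters = json.loads(text)
    if not isinstance(filters, list):
        raise ValueError("expected a JSON list of filters")
    for f in filters:
        if f["criteria"]["op"] not in SUPPORTED_OPS:
            raise ValueError(f"{f['name']}: unsupported op {f['criteria']['op']!r}")
        if f["action"]["type"] not in SUPPORTED_ACTIONS:
            raise ValueError(f"{f['name']}: unsupported action {f['action']['type']!r}")
    return filters


def dump_filters(filters: list) -> str:
    """(3.2.3) Serialize filters so they can be written to a file or a database column."""
    return json.dumps(filters, indent=2)


def _matches(data: dict, criteria: dict) -> bool:
    value = data.get(criteria["field"])
    if criteria["op"] == "equals":
        return value == criteria["arg"]
    return isinstance(value, str) and criteria["arg"] in value  # "includes"


def filter_incoming(event: dict, filters: list):
    """(3.2.4) Apply enabled filters in order; return the (possibly redacted) event, or None to drop it."""
    data = dict(event["data"])
    for f in filters:
        if not f.get("enabled", True) or not _matches(data, f["criteria"]):
            continue
        action = f["action"]
        if action["type"] == "remove_event":
            return None
        if action["field"] in data:  # replace_field
            data[action["field"]] = action["value"]
    return {**event, "data": data}
```

Validating at parse time means a malformed filter is rejected when it is saved through the API, rather than failing later while events are being ingested.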
This would be really nice, as we could then give users a list of available options when creating filters (e.g. data.[dropdown: 1) pizza, 2) pasta, 3) ...]) and make sure a filter is valid.
(3.3.1) Alter the create_bucket method to take an additional data_structure parameter
(3.3.2) On the API, check whether the sent event matches the data structure
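For (3.3.x), a bucket could declare the shape of its event data, which would both validate incoming events and give the filter form its dropdown options. `create_bucket` does not take a `data_structure` parameter today; the field-to-type mapping below and both helpers are hypothetical.

```python
from typing import Dict, List

# Hypothetical structure a window watcher might declare when creating its bucket.
WINDOW_STRUCTURE: Dict[str, type] = {"app": str, "title": str}


def validate_event_data(data: dict, structure: Dict[str, type]) -> List[str]:
    """(3.3.2) Return a list of problems; an empty list means the data matches."""
    errors = []
    for field, expected in structure.items():
        if field not in data:
            errors.append(f"missing field {field!r}")
        elif not isinstance(data[field], expected):
            errors.append(
                f"field {field!r} should be {expected.__name__}, got {type(data[field]).__name__}"
            )
    for field in sorted(data.keys() - structure.keys()):
        errors.append(f"unexpected field {field!r}")
    return errors


def filter_field_options(structure: Dict[str, type]) -> List[str]:
    """Fields a filter form could offer in its dropdown, e.g. data.[app | title]."""
    return sorted(structure)
```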
Of course, I am realistic that it would take time to implement this, especially if there's no library for it. But from my point of view, this would improve the tool a lot.
Also, much of this is just nice-to-have and doesn't need to be implemented right from the beginning. I just thought I would write everything out, so that while developing we can keep these in mind (and maybe write the code in a way that lets these other options be implemented easily).
From next week on I will have more time for development, so until then maybe we can discuss if/how we should implement this? :)
Had a bit of time, so here is a quick draft (https://github.com/Otto-AA/aw-filter/blob/master/filter.py) showing what I mean by these filters. After trying it out a bit, it actually seems rather easy to implement these filters in Python. I thought it would be much more work O.o
Nonetheless, before getting into the details, we should agree on how to implement it ^_^
Any thoughts on this proposal? If not, I'll do a bit more work and then create a pull request in aw_server.
@Otto-AA I've only skimmed through it so far, but it seems to be roughly in line with what we have been thinking as well. Right now I want to prioritize the editor format and visualizations, and once that's done, the more important feature IMO is tagging (which would share some similarities in the datastore, making this easier later on). But even more important is making a final 0.8 release.
This task is huge (just planning and prototyping the design would probably take 2 full days of work), so I'm not sure I want to prioritize discussing its design right now. I'm sorry, I really want this feature as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Since categorization is now done, I'd just like to throw out a suggestion: one way to do this is to have a "sensitive/to-redact" category and then wipe the title/URL/app of all the events that match the category.
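A sketch of that idea, assuming the "sensitive" category is expressed as a single regex and that classification happens when a heartbeat is received, before anything is stored. The rule, the field list and `redact_if_sensitive` are illustrative, not existing aw-server code.

```python
import re

# Hypothetical "sensitive/to-redact" category, written as a single regex rule.
SENSITIVE_RULE = re.compile(r"Private Browsing|Incognito|mybank\.example", re.IGNORECASE)
REDACT_FIELDS = ("title", "url", "app")


def redact_if_sensitive(event: dict) -> dict:
    """On heartbeat receipt: wipe title/url/app if any of them matches the sensitive category."""
    data = event["data"]
    if any(SENSITIVE_RULE.search(str(data[f])) for f in REDACT_FIELDS if f in data):
        # Build a new dict so the incoming event object is left untouched.
        data = {k: ("REDACTED" if k in REDACT_FIELDS else v) for k, v in data.items()}
    return {**event, "data": data}
```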
@ErikBjare That is not a good solution in terms of security, to make it truly secure we have to never even add the data to the buckets in the first place, not filter it when querying.
We could add the new settings API to solve this: add a way in the web UI to add regexes that should be filtered, and then let aw-watcher-window check those on startup and filter matching data before the events get sent.
There's also a duplicate feature request on the forum https://forum.activitywatch.net/t/add-an-exclude-list/345
to make it truly secure we have to never even add the data to the buckets in the first place
Agreed.
not filter it when querying.
That's not what I mean. I mean to classify & filter when a heartbeat is received.
We could add the new settings API to solve this, add a way in the web-ui to add regexes which should be filtered and then let aw-watcher-window check those on startup and filter them before the events get sent,
That makes the watchers depend on the server settings, and also requires us to implement the same filtering in all watchers. It's a bit more secure than what I had in mind since the server would never see the sensitive info at all, but not sure if it's worth it.
It's worth mentioning that the rules themselves are sensitive information, especially if they only contain a few things, making the "anonymity set" for redacted events small. However, it would be less of a problem if we went for deleting events entirely.
In any case, I've been thinking of building a feature in aw-webui that lets you search for events matching a particular pattern, and then let you delete them or replace them with redacted versions of the events. Wouldn't take that much work to build, search would be a generally useful feature anyway, and wouldn't add any code to the server or watchers.
That's not what I mean. I mean to classify & filter when a heartbeat is received.
Oh, alright.
Might still be an issue though: we would need to be aware of bucket types (so that we don't, for example, corrupt events in buckets we don't expect, such as replacing "afk" with "redacted" or something). At that point, architecture-wise, it makes more sense for the watchers themselves to redact sensitive information in a way that matches their event format.
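One way to address the bucket-type concern is to redact only fields of bucket types the redactor explicitly knows about and leave everything else untouched. The type names are the ones used by aw-watcher-window (`currentwindow`), aw-watcher-web (`web.tab.current`) and aw-watcher-afk (`afkstatus`); the per-type field table and `redact_for_bucket` are hypothetical.

```python
import re

# Which data fields may be redacted per bucket type. Types not listed here
# (e.g. "afkstatus") are never touched, so their "status" values stay intact.
REDACTABLE_FIELDS = {
    "currentwindow": ("title",),
    "web.tab.current": ("title", "url"),
}


def redact_for_bucket(bucket_type: str, event: dict, pattern: "re.Pattern") -> dict:
    """Redact matching fields only for known bucket types; return other events unchanged."""
    fields = REDACTABLE_FIELDS.get(bucket_type)
    if not fields:
        return event  # unknown bucket type: don't risk corrupting its events
    data = dict(event["data"])
    for f in fields:
        if isinstance(data.get(f), str) and pattern.search(data[f]):
            data[f] = "REDACTED"
    return {**event, "data": data}
```

Keeping this allow-list next to each watcher's event format is essentially the watcher-side variant described above, just centralized in one table.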
That makes the watchers depend on the server settings, and also requires us to implement the same filtering in all watchers. It's a bit more secure than what I had in mind since the server would never see the sensitive info at all, but not sure if it's worth it.
Agreed, currently that's just a few watchers (aw-watcher-window and aw-watcher-web) but in the future it might become more.
It's worth mentioning that the rules themselves are sensitive information, especially if they only contain a few things, making the "anonymity set" for redacted events small. However, it would be less of a problem if we went for deleting events entirely.
Very good point, didn't think of that.
In any case, I've been thinking of building a feature in aw-webui that lets you search for events matching a particular pattern, and then let you delete them or replace them with redacted versions of the events. Wouldn't take that much work to build, search would be a generally useful feature anyway, and wouldn't add any code to the server or watchers.
Definitely a good start!
Not sure myself which of our suggested solutions is best; both have their pros and cons really.
I've added a new example, redact_sensitive.py, to aw-client that can be used to redact sensitive events: https://github.com/ActivityWatch/aw-client/blob/master/examples/redact_sensitive.py
Why not encrypt the data going in?
The goal should be to not leak any private information if your machine gets hacked (unfortunately very common).
Then require 2FA to view your own data.
I've been looking at changing the Rust server engine to filter out events on its side. That way, no matter which tracker is sending data to the server, the server itself is responsible for filtering them out based on regex matches.
For simplicity, I'm looking at making the configuration part of the server config file for now, but ultimately I think a filter table built into the database would be useful, so that the front end could easily send new filters to the backend.
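The drop-before-insert logic could look roughly like this, written in Python for illustration rather than in Rust. The `event_filters` key and its schema are assumptions (no such option is described in this thread), JSON stands in for the server config file format to keep the sketch dependency-free, and a plain list stands in for the database.

```python
import json
import re

# Hypothetical server config section; not an existing aw-server-rust option.
CONFIG_TEXT = """
{
  "event_filters": [
    {"field": "title", "regex": "Private Browsing"},
    {"field": "url", "regex": "^https://(www[.])?mybank[.]example/"}
  ]
}
"""


def load_exclude_rules(config_text: str) -> list:
    """Compile the configured regexes once, e.g. at server startup."""
    config = json.loads(config_text)
    return [(rule["field"], re.compile(rule["regex"])) for rule in config.get("event_filters", [])]


def should_store(event: dict, rules: list) -> bool:
    """False if any configured field of the event's data matches its regex."""
    data = event.get("data", {})
    return not any(
        isinstance(data.get(field), str) and regex.search(data[field])
        for field, regex in rules
    )


def ingest(events: list, rules: list, store: list) -> int:
    """Drop matching events before they are written, so they never reach the database."""
    kept = [e for e in events if should_store(e, rules)]
    store.extend(kept)
    return len(kept)
```

Because the check runs in the server's ingest path, it applies to every watcher without any watcher-side changes, which is the main argument made above.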
@ErikBjare @johan-bjareholt - Would you be interested in a PR for this?
Just to throw out a couple of ideas relating to this/window titles that I'd like to see realised (some points mentioned earlier by others):
A way to:
I've known about ActivityWatch for many years and have probably installed it once every year or two, but the lack of any way to completely disable window title capturing has always led me to uninstall it eventually. Until there are ways for a user to control window title capturing, an on/off switch would be great. Excuse any ignorance, as my overall experience with ActivityWatch is quite limited. I hope that will change, because I'd really like to use your useful program.
Thanks.
Shouldn't this be as easy as creating a filter list on the server side, i.e. "if an entry matches a filter, don't add it to the SQL database"? (I mean, I suppose not... else it wouldn't have been open since 2016 🥲)
I can easily create a Category with a "Private browsing" pattern which correctly identifies all my "Private Browsing" data; a really simple button named something like "filter out data from this category" would work perfectly well for a lot of people.
Currently, there is no solution or compromise that would fix or alleviate the problem, apart from using this pull request or running "redact_sensitive.py" periodically.
Doesn't Chrome already know where you've been? (Unless you turned off all settings to track you?)
I believe most AW users' expectations differ wildly from those of a Qubes OS (https://www.qubes-os.org/) user.
If you really, really can't trust yourself with what you're doing on your computer, simply use a different operating system that allows you to hide entire compute workloads from yourself.
ActivityWatch, in my mind, is not for PEPs or investigative journalists, it's for everyone else who wants more control (but not total control, as if that were even a thing...) over their digital crumbs, and trusts themselves enough with a local database, on a non-air-gapped computer, likely connected to the internet.
If you need even lower level trust, go for https://puri.sm/ with Qubes on it :-)
No need to overcomplicate AW, IMO.
The default should be to respect private browsing, with an opt-out option if somebody wants to record it. Most people will not want to record private browsing time, which for most people is not work-related anyway.
I do want to record private browsing time.
The reason I use private browsing and VPNs is to hide my activity from others on the web, not from myself. The reason I use AW is to surface insights into my own digital behavior (on and off the web, work and personal, both), private browsing included.
Actually, I use multiple computers (and VMs) and I'd like to track my behavior across all these (virtual) devices, not just my "main" device.
I do trust my LAN/VPNs to not be compromised... and AW fits the bill quite nicely. :-)
We need a model to filter out sensitive data by default.
For example if a window title contains: "[title] - Firefox (Private Browsing)" we should redact [title] to some magic string such as "REDACTED".
In some cases we might want to filter out the window entirely, giving no information about which window is running; better to catch too much than too little.
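A minimal sketch of the Private Browsing rule, assuming the "[title] - Firefox (Private Browsing)" format from the example (real browser title formats vary by version and platform). `clean_event` either redacts the title to "REDACTED" or, with `drop=True`, removes the event entirely; `clean_events` applies the same rule to an existing list of events. All names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumed title format, taken from the example above; group 1 keeps the browser suffix.
PRIVATE_TITLE = re.compile(r"^.* - (Firefox \(Private Browsing\))$")


def clean_event(event: dict, drop: bool = False) -> Optional[dict]:
    """Redact the page title of a private window, or return None to drop the event."""
    title = event.get("data", {}).get("title", "")
    m = PRIVATE_TITLE.match(title)
    if m is None:
        return event  # not a private window: keep as-is
    if drop:
        return None  # reveal nothing about the window at all
    return {**event, "data": {**event["data"], "title": f"REDACTED - {m.group(1)}"}}


def clean_events(events: Iterable[dict], drop: bool = False) -> List[dict]:
    """Apply the rule to an existing dataset, e.g. events exported from a database."""
    cleaned = (clean_event(e, drop) for e in events)
    return [e for e in cleaned if e is not None]
```

Keeping the browser suffix in the redacted title still records that a private window was active; `drop=True` is the "catch too much" option that hides even that.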
It should be the goal that every user has a set of "clean" data. The filtering should also be able to be run on an existing database of data, so that cleaner data can be output. Preferably, the data should be so clean that there is little (or even no) reason not to share it (which would be great since easy access to a large dataset could make research in some areas a lot easier!).
The question left is where this processing step should take place. We want the filtering/redacting to happen before data is sent anywhere, but it should also be enforceable on a server (if the server owner doesn't trust the server's security, for example if it's in the cloud), with clients notified of this so that they can do the filtering on their side, removing the need to send sensitive data at all. It might therefore be prudent to write a module in aw-core that implements this functionality, since it should be usable from the server and from all clients that transmit sensitive data.
This feature should be on by default. We don't need anything advanced yet; the first priority is to redact titles from Incognito/Private Browsing windows, which would be a good step in the right direction.
This should have a far higher priority than Zero-Knowledge storage right now, because it's a lot easier and more user-friendly (with ZK storage, if you lose your keys you lose your data).
Useful when:
This issue was originally moved from https://github.com/ActivityWatch/aw-server/issues/4 because it turned out to have a wider scope than just aw-server.