douglasg14b commented 5 years ago

The Problem

One of the larger time-sinks today is video. Be that through streaming services like Netflix, Amazon Prime Video, or HBO. Media sites like YouTube, Twitch, or Vimeo. Or from downloaded or streamed media on players like VLC or MPV.

Activitywatch, unfortunately, fails to effectively record the time spent on these activities, as it relies on mouse and keyboard input to determine activity. This means that when watching a video, Activitywatch will mark the time as afk after a short while, even though the user is present and spending time on an activity at their device.

This was brought up in https://github.com/ActivityWatch/activitywatch/issues/186 which was marked as wontfix. I believe that this is something that CAN be solved, and should be seriously considered given the amount of time that can be spent consuming media.

Afk time can be disabled, but this then pollutes the data. Users may go afk for varying lengths of time in a variety of applications or websites throughout the day, which could pollute the recorded time to the point that video-specific time is no longer useful.

Possible solutions

Note: Not all problems/disadvantages/pitfalls are meant to be solvable. I am including them for devil's advocate's sake and to foster a more robust discussion.

Application/Site Tagging

Compile a list of common media applications and websites. When the user goes afk on one of these sites or applications, mark the time as non-afk. This isn't technically tagging, but it could be set up to work in a tag-like way, which would make this a very extensible solution for more than just videos.
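A minimal sketch of that idea, to make the discussion concrete. The pattern list, the `Sample` type and the function names are all hypothetical and are not part of ActivityWatch; matching is plain lowercase substring matching.

```python
from dataclasses import dataclass

# Hypothetical starter list (lowercase substrings), not an official ActivityWatch list.
MEDIA_PATTERNS = ("youtube.com", "netflix.com", "twitch.tv", "vimeo.com", "vlc", "mpv")


@dataclass
class Sample:
    """One observation: the focused app, its title or URL, and the raw afk flag."""
    app: str
    title: str
    afk: bool


def is_media(sample: Sample, patterns=MEDIA_PATTERNS) -> bool:
    """True if the app name or title/URL contains any media pattern."""
    haystack = f"{sample.app} {sample.title}".lower()
    return any(p in haystack for p in patterns)


def effective_afk(sample: Sample, patterns=MEDIA_PATTERNS) -> bool:
    """Treat afk time on a listed media app or site as not-afk."""
    if sample.afk and is_media(sample, patterns):
        return False
    return sample.afk
```

For example, `effective_afk(Sample("firefox", "https://www.youtube.com/watch?v=abc", True))` comes out as `False`, while afk time in an editor stays afk. This naive version has exactly the pitfalls listed under Disadvantages/Pitfalls.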

Advantages

Easy to maintain
- Add sites/applications as you see fit, or as the community points them out
Simple

Disadvantages/Pitfalls

Initial list will be a chore to create
- Can always ask the community for assistance. You create the functionality, and get the community to fill it in with you curating.
- Is not quite perfect.
- What if the user is legitimately afk on a youtube page?
- What if the user is afk on the homepage of a media site, and not on an actual video?
- What if the user is afk on a video, and it's paused?

Enhancements

These are here to try and solve for some of the problems presented under disadvantages.

User-defined or user changeable lists and filters
- The repo still maintains a list, but users have the ability to add items to theirs without waiting for a pull request to merge and a new release.
- There are a variety of flavorful solutions to this that I will not go into
Combine with monitoring hardware to detect if a video might be paused

Monitoring Hardware

Monitor audio output to see if a video is playing.

Advantages

Simple. May not be easy to implement, but is a simple solution to the problem

Disadvantages/Pitfalls

What if the user is listening to music, like spotify, and is afk on a media player?
What if the user is playing music from a media site like YouTube?
Difficult to implement?
Monitoring multiple audio devices necessary?

Enhancements

Combine with site/application tagging

General Enhancements

These are enhancements that could apply to any solution, to increase accuracy and to enable the user to correct mismatches and errors.

User-Defined Lists & Filters

Let the user create/modify the list of sites/applications, and/or the patterns used to match them.

Takes burden off of developer to maintain accurate lists and patterns to meet everyone's specific needs
- I would still expect that an "official" or pre-defined list of items and patterns would be bundled or downloadable
Some passionate users can open issues to merge items and patterns back into the repo with the dev curating them, or just to share them.

Tagging and Pattern Matching

This goes above and beyond, but would really turn this into a much more powerful tool.

Instead of just solving the video problem, create an extensible solution that encompasses the general problem category that the video problem is part of. This would take the form of tagging: being able to automatically tag domains & applications with predefined or user-defined tags. This can be facilitated with pattern-matching lists and, depending on the data's schema/format, could be applied to already existing data, greatly enhancing its utility.

As an example, time in VLC, YouTube, or Netflix could be tagged as video, which gives users the power to filter this time separately, combine it in reports, or to more easily correct collection errors.
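To illustrate, here is a rough sketch of pattern-based tagging applied to already recorded events. The rule table, the `(app, title, duration)` tuple shape and the function names are assumptions for the example, not ActivityWatch's schema.

```python
import re

# Hypothetical rules: tag name -> regex patterns matched against "app title".
TAG_RULES = {
    "video": [r"\bvlc\b", r"\bmpv\b", r"youtube\.com", r"netflix\.com"],
    "music": [r"spotify"],
}


def tags_for(app: str, title: str, rules=TAG_RULES) -> set:
    """Return every tag whose patterns match the app name or title/URL."""
    text = f"{app} {title}"
    return {
        tag
        for tag, patterns in rules.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    }


def total_by_tag(events, rules=TAG_RULES) -> dict:
    """Sum durations (seconds) per tag over (app, title, duration) tuples."""
    totals = {}
    for app, title, duration in events:
        for tag in tags_for(app, title, rules):
            totals[tag] = totals.get(tag, 0.0) + duration
    return totals
```

Because the rules are only applied when reading events, changing a rule re-tags old data for free, which is the "applied to already existing data" point above.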

This of course could be set up to be user-manageable, with the points listed in the previous section.

Conclusion

I believe the ability to capture time spent consuming video-based media will have an impact on the future usability of this project as these sorts of services continue to expand and bring in more and more people. Solving it would not only address video tracking, but could also greatly enhance the utility and power of this application.

What are your thoughts? (Please do not auto-mark this as closed; this took some time and effort to create.)

ErikBjare commented 5 years ago

Thanks for taking the time to write all that, greatly appreciated!

I agree, and there is actually unmerged code that solves the issue for web-based media players through the use of the audible property on tabs: https://github.com/ActivityWatch/aw-webui/pull/85

I'm personally pretty happy with the above solution, as it creates minimal complexity and works for what I suspect are the most common ways people consume video on their computers today (YouTube, Netflix, other web-based players).

I'd love to give more thorough feedback on the options you mentioned, especially tagging, but I'm really busy with exams this week so it'll have to wait. In the meanwhile, check out the discussion in #95

nicolae-stroncea commented 5 years ago

This is an idea aimed towards video-playing apps, which is a big part of consuming media.

Make a separate watcher for media. The watcher could be either just for the PC media software or for both (it would get the audible property from the web watchers).

Have a whitelist of apps and check if they're on the screen. To count time in the media watcher, you just count the time the app is on display. We have the same downside that @douglasg14b mentioned, i.e. what if the user pauses the app and leaves while the app is still on the screen?

The solution:

Check if the computer is asleep or not. On Linux, Mac and Windows, the computer will usually not go to sleep if there is media playing (this would work for video, not sure about purely audio). One thing to further investigate is whether the media software needs to be in full screen and playing for the computer not to go to sleep. Therefore, if there is a media app on the screen, and the computer is not asleep, we would log that as playing time for the app.

Following false positives would occur:

App is paused but still on the screen while the user is not afk: this would be logged as active (this will rarely happen, as users tend not to fill screen real estate with a video app if it is not being used).
Computer never goes to sleep (this is also a fairly rare case), so media time would be logged even if the video is paused.
Accuracy also depends on the time it takes for the OS to go to sleep following user inactivity, including media.

douglasg14b commented 5 years ago

@nicolae-stroncea

Detecting active audio alongside a list of sites/apps would bring the accuracy up to a very acceptable level in my opinion. Either of them by themselves would be too riddled with false positives to be too terribly useful. There isn't much need to go fancier than that imho. This is something I went into in the initial post.

Active audio + afk + on youtube = watching video. It's not perfect, but much better than on youtube = watching video. As an example, I have 4 monitors, and I almost always have something playing when working, and will often click on that to pause it then leave for a while. Leaving myself afk with an active video player that isn't playing. Or even watching Netflix, pause it and leave for a while, it's the active window, but nothing is playing.
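Spelled out as code, the rule above might look like this. It is only a sketch: the three booleans are assumed to come from the afk watcher, a site/app list and some audio check, and the labels are made up for the example.

```python
def classify(afk: bool, on_media: bool, audio_active: bool) -> str:
    """Combine the three signals into one label."""
    if not afk:
        return "active"
    # afk + media site/app + sound playing -> assume the user is watching
    if on_media and audio_active:
        return "watching"
    return "afk"
```

With this rule the paused-Netflix case (`afk=True, on_media=True, audio_active=False`) stays afk, and so does music playing in the background while a non-media window has focus.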

Letting users create their own pattern matching for sites will also bring up the accuracy as it lets them add sites/applications to the list of video apps/sites.

nicolae-stroncea commented 5 years ago

@douglasg14b

I agree that detecting audio would be the most accurate way of doing it. As you've stated, it is a complex solution. The solution I offered was meant as a less complex alternative (I think it's fair to say it would be easier to implement with the existing code, and might have a lot fewer edge cases than if we go into monitoring hardware), but at the cost of some accuracy. It ultimately depends on the amount of effort that will be put into the feature. If monitoring hardware to detect audio successfully will take too many man-hours to implement at the time being, I think the solution I suggested is a feasible proposal.

I disagree that it would be so riddled with false positives as to not be useful. youtube is an active window && computer not on standby = watching video would be the more accurate representation. Take your example, where you watch Netflix, pause and leave for a while: presumably the computer would go on standby, at which point the Netflix app would no longer be counted as active media time.

douglasg14b commented 5 years ago

I believe that relying on standby is an incorrect assumption for users of this library. How many people's devices that are not laptops go into standby within a few minutes of being idle? Even plugged-in laptops default to 30-60 mins on Windows 10 in balanced mode; what about high performance mode? On desktops? You're looking at 30m - 4h IF standby is even enabled and their devices don't just turn off the screens and stay on. Never mind most Linux users, who probably don't use standby at all from what I've seen, as it's usually not on by default for most desktop installs of common flavors.

That's a lot of invalid data. If you're watching a movie, and you step away to do something (bathroom, cleaning, walk the dog, cooking, make coffee...etc) are most people actually gone long enough for their device to go into standby (60mins)?

That would also rely VERY heavily on the end users setup, which can vary wildly, especially when assuming that their power configuration is not set as the defaults. Assumptions on user device configuration shouldn't be made unless there is data to back it up. Which is why I believe it will be more inaccurate, and potentially worse than just not recording it at all as it currently does.

Thankfully this is a FOSS project, so man-hours isn't as much of a concern as if this was an in-company product with expenses and wages to worry about. It's still relevant, but at least in my projects, I don't consider time to implement as a deciding factor for features or compatibility unless a solid and usable drop-in is available.

dynamiclover commented 5 years ago

Has anyone investigated integrating with media players through the same mechanisms as last.fm/audio scrobblers?

jtrakk commented 5 years ago

Another idea would be to take a screenshot and if the screen has changed, consider it active (not afk).

johan-bjareholt commented 5 years ago

Has anyone investigated integrating with media players through the same mechanisms as last.fm/audio scrobblers?

@dynamiclover We have aw-watcher-spotify as an experiment https://github.com/activitywatch/aw-watcher-spotify

Another idea would be to take a screenshot and if the screen has changed, consider it active (not afk).

@jtrakk Two issues with this

Most users have a clock in their taskbar, and this will change at least every minute. You could then argue that you need a minimum number of pixels to change, but that still becomes inaccurate, as there might simply be a website with an animated ad which makes it think that you are not afk.
Comparing two pictures pixel by pixel is surprisingly slow due to how high-resolution screens are nowadays. We don't want activitywatch to slow down people's computers significantly (not by default at least, opt-in could be an option).

douglasg14b commented 5 years ago

@johan-bjareholt

I think that actually might be a viable solution, the issues you mentioned are solvable. I wouldn't take it off the table just yet. It's also very simple, and doesn't have a lot of complexities compared to monitoring audio.

You can probably even use something like OpenCV for this, which has a lot of utilities that make this even simpler like absdiff.

Down-scale the image, which actually does two things

Reduces the pixel count, say to 50k pixels
Removes and reduces small and inconsequential changes (Like a system clock).

Perform cheap math to check the delta from one image to another

Sum the squared differences of the pixel values
- You could also diff the lightness values, which might be even cheaper
Set a threshold to avoid false positives. If someone is watching a video, a LOT should be changing, so having a high threshold should be fine.

Don't monitor in real time.

Periodic image processing is damn cheap compared to anything in real time, and causes much less of a performance concern. It's trivial to do the above at 30fps even on old hardware. If you are checking for changes, say every 15 seconds, that's 0.2% of the processing power needed.
If the concern is performance spikes, then draw out the processing time. Instead of running the loop all in one go, let it sleep for a few microseconds every few iterations. IDK how to do this in Python, but in C# it's fairly easy to avoid hogging system resources through asynchronous processing with artificial delays. I imagine Python can do the same.
- It's also fairly trivial, in my experience, to customize the delays based on prior performance of the device.
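The steps above can be sketched with NumPy alone, assuming the screenshot has already been converted to a 2-D array of lightness values (grabbing the screen is left out). The function names, the 8x8 tile size and the threshold of 100 are arbitrary choices for illustration, and the sketch uses the mean rather than the sum of squared differences so the threshold does not depend on resolution. On the cost side, one check every 15 seconds is 1/450 of the work of running at 30fps, which is roughly the 0.2% mentioned above.

```python
import numpy as np


def downscale(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Average non-overlapping block x block tiles of a 2-D lightness image."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block  # crop so the tiles divide evenly
    tiles = frame[:h, :w].astype(np.float64).reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))


def screen_changed(prev: np.ndarray, curr: np.ndarray,
                   threshold: float = 100.0, block: int = 8) -> bool:
    """True if the mean squared difference of the downscaled frames exceeds threshold."""
    diff = downscale(curr, block) - downscale(prev, block)
    return float(np.mean(diff ** 2)) > threshold
```

Small localized changes such as a clock tick are diluted twice, once by the tile average and once by the mean over all tiles, so they should stay under a reasonably high threshold.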

Also refer to https://stackoverflow.com/questions/4196453/simple-and-fast-method-to-compare-images-for-similarity

johan-bjareholt commented 5 years ago

@douglasg14b I'd gladly help get it working with aw-server and the web-ui if you want to make such a watcher for ActivityWatch; we love to help anyone who wants to collect more data for activitywatch and make it possible for them to analyze it. You can even write the watcher in C# if that's the language you prefer. We already have one watcher written in C# which you can get some inspiration from (https://github.com/LaggAt/ActivityWatchVS)

However I don't think this is something we want to ship with activitywatch by default and definitely not have turned on by default because:

It is not 100% accurate as we have discussed before, you can easily leave your computer with a movie running
OpenCV is a pretty big dependency, 26MB for the python bindings
Taking a screenshot is not universal on mac, windows and linux and needs to be implemented differently on each platform. Not impossible to do, just needs quite a bit of development and testing. Would probably also need more dependencies
I don't think performance will matter a great deal if written efficiently, but I don't believe that it will ever be negligible. If we have to go as far as adding artificial delays to avoid taking up a lot of CPU time, I personally wouldn't use such a watcher on my laptops at least, probably not on my desktop either.

I personally don't want to spend time on this because I find there to be more important things to fix currently.

jtrakk commented 5 years ago

This seems doable with Pillow and pyscreenshot. Something like this might work, perhaps as a third-party watcher package.

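import socket
import time
from datetime import datetime, timezone

import pyscreenshot
import requests

BUCKET_URL = "http://localhost:5600/api/0/buckets/screenshot-rgb"
INTERVAL = 10  # seconds between screenshots

# Create the bucket. aw-server expects client, type and hostname fields;
# the client and type names used here are made up for this sketch.
requests.post(
    BUCKET_URL,
    json={"client": "screenshot-rgb", "type": "screenshot.rgb", "hostname": socket.gethostname()},
)

while True:
    # Take a screenshot.
    im = pyscreenshot.grab(childprocess=False)
    # Approximate the average of each RGB channel by shrinking to one pixel.
    rgb = im.convert("RGB").resize((1, 1)).getpixel((0, 0))
    # Post the rgb values as a heartbeat event (timestamp, duration, data).
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "duration": 0,
        "data": {"rgb": list(rgb)},
    }
    requests.post(BUCKET_URL + "/heartbeat", params={"pulsetime": INTERVAL + 1}, json=event)
    # Wait a few seconds before repeating.
    time.sleep(INTERVAL)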

johan-bjareholt commented 5 years ago

@jtrakk Nice start, a few suggestions:

We probably should do the image analysis in the client rather than the server, then simply push the data {"afk": true/false}
Instead of doing a proper resize, only picking every fourth pixel or so and doing an average of those should make this faster and still be accurate enough
If you want you can use the aw-client python library instead of requests, easier to get things going and has some heartbeat optimizations
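A sketch of the first two suggestions, assuming the screenshot is available as an (H, W, 3) NumPy array. The function names and the tolerance value are hypothetical, and sending the payload (for example through aw-client) is left out.

```python
import numpy as np


def sampled_mean_rgb(pixels: np.ndarray, step: int = 4) -> tuple:
    """Average every step-th pixel in both directions of an (H, W, 3) array."""
    sample = pixels[::step, ::step].reshape(-1, pixels.shape[-1])
    return tuple(float(c) for c in sample.mean(axis=0))


def afk_payload(prev_rgb, curr_rgb, tolerance: float = 2.0) -> dict:
    """Report afk when no channel average moved by more than tolerance."""
    moved = max(abs(a - b) for a, b in zip(prev_rgb, curr_rgb))
    return {"afk": moved <= tolerance}
```

With `step=4` only one pixel in sixteen is read, and the comparison happens on the client, so only the small `{"afk": ...}` dict would need to be pushed.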

rapiz1 commented 4 years ago

I would like to mention the power management tool powerdevil from KDE can detect whether there is a video playing. But I don't know how they achieve that.

nicolae-stroncea commented 4 years ago

What about doing it inside of the extensions?

Proposal:

Read the DOM of a webpage and query for video elements: let listVideo = document.querySelectorAll('video')
In the current tab, check whether any video is playing: let videoPlaying = Array.from(listVideo).some(v => !v.paused)
Set the 'activeVideo' property in datastr to true.
When checking afk status, if the property exists and is true, consider the user non-afk.

Advantages:

Instantly crossplatform, don't need to worry about Mac/Linux/Windows
Should be a lot lighter on resources and storage than doing analysis on the screen. Essentially this would be querying a small list of elements(even youtube frontpage has only 1 video element in DOM, though not sure why), and checking a property of it
Should be easier to implement

Disadvantages:

Would only detect web-based content.
Automatically considers video playing as user being active. We're already considering doing this with all of the other approaches, so this one isn't really a disadvantage

I believe that since majority of media is consumed online, the advantages outweigh the disadvantages.

EDIT: further problem

EDIT 2:

A potential problem here is that this would not detect content in iframes. A comparatively small (but existent) amount of media is served through iframes. One example is Reuters (open an article, and it should pop up an iframe with an embedded video). Another example is embedded YouTube videos, which also use iframes. Maybe there are workarounds for this. The only one I found so far is checking for the 'autoplay' property, which, if set, indicates video content. However, this is not foolproof.

On further analysis seems like the 'audible' property is indeed the better choice and given that it is the active tab and audio is playing it should indicate the user is watching some content

nicolae-stroncea commented 4 years ago

I'm currently going through my AW database reviewing all events tagged as audible: true, and overall, all video content is tagged correctly:

Normal videos, streaming content, audio/video online calls, etc

There are a couple of false positives:

Music Websites
Background noise Websites
Podcast Websites
Radio Websites
Outlook(weird outlier. I think it might be notifications that triggered it?)

Since they are purely audio, it is very likely (more often than not, I would say) that a user puts on some music/podcast/radio etc., and then works away from their computer: typing notes, cleaning, etc. So I think we could have a whitelist of these websites where we consider the time as 'afk' even if the audible property is set to true.
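Roughly what I have in mind, as a sketch. The host list is a made-up example (a real one would be user-editable), and the function name and arguments are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical examples of audio-only sites.
AUDIO_ONLY_HOSTS = {"open.spotify.com", "soundcloud.com"}


def counts_as_active(url: str, audible: bool, afk: bool,
                     audio_only=AUDIO_ONLY_HOSTS) -> bool:
    """An audible tab keeps the user 'active' unless its host is audio-only."""
    if not afk:
        return True
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return audible and host not in audio_only
```

So an audible YouTube tab overrides afk, while an audible tab on one of the listed audio-only hosts does not.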

nicolae-stroncea commented 4 years ago

Found a way to do this directly with Sound Drivers using a Python library called SoundCard. This works with any type of applications, not just web browsers.

Windows and Linux

Tested successfully on both Linux (relies on PulseAudio, so should work on all distributions by default) and Windows (relies on WASAPI, works on Windows 7+).

#!/usr/bin/env python
import numpy as np
import soundcard as sc


def getMic():
    """Return the first loopback "microphone" (it records a speaker's output), or None."""
    for a_mic in sc.all_microphones(include_loopback=True):
        if a_mic.isloopback:
            return a_mic
    return None


def checkAudio(mic):
    """Record one second from mic and report whether any sample is non-zero."""
    if mic is None:
        return False
    # 48000 frames at 48000 Hz is one second of audio
    data = mic.record(samplerate=48000, numframes=48000)
    return bool(np.any(data != 0))


if __name__ == "__main__":
    print(checkAudio(getMic()))
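One possible refinement, offered as an assumption rather than something I have measured: if a loopback device ever delivers tiny non-zero samples during silence, the exact `!= 0` comparison would report audio. Comparing the RMS level to a small threshold avoids that. The sketch below assumes the recorded data is a float array scaled to roughly [-1, 1]; the threshold value is arbitrary.

```python
import numpy as np


def has_audio(samples: np.ndarray, rms_threshold: float = 1e-4) -> bool:
    """True if the root-mean-square level of the samples exceeds rms_threshold."""
    if samples.size == 0:
        return False
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    return rms > rms_threshold
```

It would be used in place of the `np.any(data != 0)` check on the recorded second of audio.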

Mac

This will not work by default on MacOS because it does not provide loopback functionality.

Mac Users would have to download SoundFlower (also OSS), and set it up so it acts as a 'virtual speaker'.
We need to find name of the speaker. It should always stay the same, so somebody just needs to download it on a Mac and check.
Need a specific check in getMic for MacOS, and then get the mic that has SoundFlower's name.
Rest should be the same

I don't have a Mac to test this, so somebody should confirm to see if this works.

jmealo commented 4 years ago

@nicolae-stroncea I'll try this on my Macbook and report back shortly.

nicolae-stroncea commented 4 years ago

@jmealo Not sure if you already found it, but this tutorial seemed useful to me. It helps avoid some potential pitfalls of the setup, specifically that if you don't set multi-output, your Mac won't play any sound at all since all of it will be routed only to SoundFlower. It also explains how to select SoundFlower as an input device, which is what we need

jmealo commented 4 years ago

I'll still test Soundflower, but, I found this: https://stackoverflow.com/questions/27604207/applescript-check-if-computer-is-playing-any-sound#27608712

When I play a YouTube video in Chrome:

pmset -g | grep coreaudiod
sleep                1 (sleep prevented by sharingd, Google Chrome, coreaudiod, useractivityd)

When I paused the video coreaudiod stopped preventing sleep and no longer appeared in the output.

I fired up Zoom, with no meeting there was no output, upon starting a new meeting:

 hibernatefile        /var/vm/sleepimage
 disksleep            0
 sleep                1 (sleep prevented by sharingd, coreaudiod, coreaudiod)
 displaysleep         2 (display sleep prevented by zoom.us)

As far as false positives go: assuming a browser extension, you can differentiate between listening to music/watching a video.

If you poll this at regular intervals, you don't have to worry about notifications much. It seems like video conferencing will prevent the display from sleeping. I can test with something that uses WebRTC and verify.

jmealo commented 4 years ago

I'm providing the output of some pmset commands that should provide information helpful for time/activity tracking:

While playing a Youtube video in Chrome:

2020-06-13 14:16:26 -0400 
Assertion status system-wide:
   BackgroundTask                 0
   ApplePushServiceTask           0
   UserIsActive                   1
   PreventUserIdleDisplaySleep    0
   PreventSystemSleep             0
   ExternalMedia                  0
   PreventUserIdleSystemSleep     1
   NetworkClientActive            0
Listed by owning process:
   pid 434(sharingd): [0x0000377400018c33] 00:00:40 PreventUserIdleSystemSleep named: "Handoff"  
   pid 626(Google Chrome): [0x000036c100018c27] 00:03:39 NoIdleSleepAssertion named: "Playing audio"  
   pid 273(mds_stores): [0x0000379c000b8c46] 00:00:00 BackgroundTask named: "com.apple.metadata.mds_stores.power"  
   pid 198(coreaudiod): [0x0000366f000180cb] 00:05:01 PreventUserIdleSystemSleep named: "com.apple.audio.AppleHDAEngineOutput:1B,0,1,1:0.context.preventuseridlesleep"  
    Created for PID: 742. 
   pid 431(useractivityd): [0x0000379a00018c45] 00:00:01 PreventUserIdleSystemSleep named: "BTLEAdvertisement"  
    Timeout will fire in 58 secs Action=TimeoutActionTurnOff
   pid 151(hidd): [0x0000365400098c0a] 00:00:00 UserIsActive named: "com.apple.iohideventsystem.queue.tickle serviceID:100000363 name:AppleEmbeddedKeyboa product:Apple Internal Keyb eventType:3"  
    Timeout will fire in 120 secs Action=TimeoutActionRelease
No kernel assertions.
Idle sleep preventers: IODisplayWrangler

While in a Zoom meeting (it looks like the developers forgot to provide the correct value for the activity):

% pmset -g assertions
2020-06-13 14:18:49 -0400 
Assertion status system-wide:
   BackgroundTask                 0
   ApplePushServiceTask           0
   UserIsActive                   1
   PreventUserIdleDisplaySleep    1
   PreventSystemSleep             0
   ExternalMedia                  0
   InternalPreventDisplaySleep    1
   PreventUserIdleSystemSleep     1
   NetworkClientActive            0
Listed by owning process:
   pid 26724(zoom.us): [0x0000381e00058c76] 00:00:12 NoDisplaySleepAssertion named: "Describe Activity Type"  
   pid 434(sharingd): [0x0000377400018c33] 00:03:02 PreventUserIdleSystemSleep named: "Handoff"  
   pid 106(powerd): [0x0000381600108002] 00:00:20 InternalPreventDisplaySleep named: "com.apple.powermanagement.delayDisplayOff"  
    Timeout will fire in 100 secs Action=TimeoutActionTurnOff
   pid 431(useractivityd): [0x0000382700018c78] 00:00:03 PreventUserIdleSystemSleep named: "BTLEAdvertisement"  
    Timeout will fire in 56 secs Action=TimeoutActionTurnOff
   pid 384(nsurlsessiond): [0x0000382800018c7a] 00:00:02 PreventUserIdleSystemSleep named: "NSURLSessionTask ADC0E368-B668-4A09-B48C-B1B11C78F152"  
    Timeout will fire in 10798 secs Action=TimeoutActionTurnOff
   pid 384(nsurlsessiond): [0x0000382800018c7b] 00:00:02 PreventUserIdleSystemSleep named: "NSURLSessionTask B2ED8888-9B0E-4A54-9F6F-207CFA4B82A2"  
    Timeout will fire in 10798 secs Action=TimeoutActionTurnOff
   pid 198(coreaudiod): [0x0000381f00018c5c] 00:00:11 PreventUserIdleSystemSleep named: "com.apple.audio.AppleHDAEngineOutput:1B,0,1,1:0.context.preventuseridlesleep"  
    Created for PID: 26724. 
   pid 198(coreaudiod): [0x0000381e00018c58] 00:00:12 PreventUserIdleSystemSleep named: "com.apple.audio.AppleHDAEngineInput:1B,0,1,0:1.context.preventuseridlesleep"  
    Created for PID: 26724. 
   pid 151(hidd): [0x0000365400098c0a] 00:00:00 UserIsActive named: "com.apple.iohideventsystem.queue.tickle serviceID:100000363 name:AppleEmbeddedKeyboa product:Apple Internal Keyb eventType:3"  
    Timeout will fire in 120 secs Action=TimeoutActionRelease
No kernel assertions.
Idle sleep preventers: IODisplayWrangler
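For anyone who wants to consume this from a watcher, the "Listed by owning process" lines could be parsed along these lines. This is a sketch based only on the two dumps above: the regex, the `Assertion` type and the "name mentions audio" heuristic are my assumptions, and the format may differ between macOS versions.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
#   pid 626(Google Chrome): [0x...] 00:03:39 NoIdleSleepAssertion named: "Playing audio"
ASSERTION_RE = re.compile(
    r'^\s*pid (?P<pid>\d+)\((?P<process>.+?)\): \[0x[0-9a-fA-F]+\] '
    r'(?P<age>\d{2}:\d{2}:\d{2}) (?P<kind>\w+) named: "(?P<name>.*)"'
)


class Assertion(NamedTuple):
    pid: int
    process: str
    kind: str
    name: str


def parse_assertions(output: str) -> List[Assertion]:
    """Extract one Assertion per 'pid N(process): ...' line; other lines are skipped."""
    found = []
    for line in output.splitlines():
        m = ASSERTION_RE.match(line)
        if m:
            found.append(Assertion(int(m["pid"]), m["process"], m["kind"], m["name"]))
    return found


def audio_processes(assertions) -> set:
    """Processes holding an assertion whose name mentions audio (heuristic)."""
    return {a.process for a in assertions if "audio" in a.name.lower()}
```

Fed the YouTube dump above, `audio_processes` would pick out Google Chrome and coreaudiod.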

nicolae-stroncea commented 4 years ago

Also found this command, which seems to draw inspiration from the same source: if [[ "$(pmset -g | grep ' sleep')" == *"coreaudiod"* ]]; then echo audio is playing; else echo no audio playing; fi

It doesn't have the same level of detail, but can give a quick, cheap check if audio is playing
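The same check could be done from Python, which is where a watcher would need it. A sketch, assuming the ' sleep' line always has the shape shown in the outputs earlier in this thread; the function names are made up, and `read_pmset` only works on macOS.

```python
import re
import subprocess


def sleep_preventers(pmset_output: str) -> list:
    """Return the names listed after 'sleep prevented by' on the ' sleep' line."""
    for line in pmset_output.splitlines():
        m = re.match(r"\s*sleep\s+\d+\s+\(sleep prevented by (.+)\)", line)
        if m:
            return [name.strip() for name in m.group(1).split(",")]
    return []


def audio_is_playing(pmset_output: str) -> bool:
    """Mirror the shell one-liner: is coreaudiod among the sleep preventers?"""
    return "coreaudiod" in sleep_preventers(pmset_output)


def read_pmset() -> str:
    """Run `pmset -g` (macOS only) and return its text output."""
    return subprocess.run(
        ["pmset", "-g"], capture_output=True, text=True, check=True
    ).stdout
```

A watcher would then poll `audio_is_playing(read_pmset())` at whatever interval it uses for heartbeats.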

jmealo commented 4 years ago

@nicolae-stroncea: you can do all sorts of activity tracking beyond what you set out to do on OSX with pmset -g assertions, you can see whether the user clicks, scrolls, touches, multi-touches, types, etc... (it logs whatever resets the idle user timeout, as well as a count down, you can infer a great deal from this). Additionally, we get verbose logging of what's keeping the system from sleeping, which includes playing audio/video or using the webcam.

I wasn't able to get your Python to run, is it Python 2? I think it's a dead-end (but good idea! especially without having access to the hardware) given what I'm able to do by tailing the power telemetry from OSX.

Using pmset is low-overhead, can run as an unprivileged user, and doesn't require a third-party kernel extension, so it seems like the right way to approach what you set out to do (and then some!). It honestly seems like a bit of an oversight from a privacy perspective shrug.

nicolae-stroncea commented 4 years ago

@jmealo that's pretty neat! I imagine there's a lot of nice aw-watcher possibilities lying in there.

The script is Python3, but it would need some customizing for Mac to get it working with soundcard:

Once you install SoundFlower, you would need to query all of the microphones, and find the name that MacOS uses for SoundFlower: sc.all_microphones(). Iterate through them, then get the name of each microphone by doing the_mic.name, to find what name SoundFlower goes by.
Once you get the name by looking through the mics, you can get the microphone by name: mic = sc.get_microphone('name_of_soundflower_input')

I agree that since pmset ... is lower overhead, it would be preferred.

I looked for a similar command that could be useful on Linux, and found: pacmd list-sink-inputs (again dependent on PulseAudio, and I don't think there is a lot of fragmentation on this front). You can find out whether any sound is playing by doing: pacmd list-sink-inputs | grep -w state | grep RUNNING. pacmd list-sink-inputs also returns info on the application that is playing, which is useful:

    index: 173
    driver: <protocol-native.c>
    flags: START_CORKED 
    state: RUNNING
    sink: 1 <alsa_output.pci-0000_00_1f.3.analog-stereo>
    volume: front-left: 52016 /  79% / -6.02 dB,   front-right: 52016 /  79% / -6.02 dB
            balance 0.00
    muted: no
    current latency: 89.25 ms
    requested latency: 75.01 ms
    sample spec: float32le 2ch 44100Hz
    channel map: front-left,front-right
                 Stereo
    resample method: copy
    module: 10
    client: 17 <Firefox>
    properties:
        media.name = "AudioStream"
        application.name = "Firefox"
        native-protocol.peer = "UNIX socket client"
        native-protocol.version = "33"
        application.process.id = "5675"
        application.process.user = "nicolae"
        application.process.host = "nicolae"
        application.process.binary = "firefox"
        application.language = "en_US.UTF-8"
        window.x11.display = ":0"
        application.icon_name = "firefox"
        module-stream-restore.id = "sink-input-by-application-name:Firefox"

There are a couple of quirks that I haven't figured out. If I mute an application (but let it keep running), it still shows up with state: RUNNING and muted: no, and I'm not sure why.
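If it helps, the output could be turned into something a watcher can use with a small parser like the sketch below. It is based only on the sample above: the function names are made up, only single-word keys are picked up (multi-word keys such as "current latency" are ignored), and because of the mute quirk the muted check may not be reliable.

```python
import re


def parse_sink_inputs(output: str) -> list:
    """Split `pacmd list-sink-inputs` text into one dict per sink input."""
    inputs = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("index:"):
            # Each sink input starts with an "index:" line.
            current = {"index": line.split(":", 1)[1].strip()}
            inputs.append(current)
        elif current is not None:
            # Accept both "key: value" and "key = value" lines.
            m = re.match(r"^([\w.\-]+)\s*(?::|=)\s*(.*)$", line)
            if m:
                current[m.group(1)] = m.group(2).strip().strip('"')
    return inputs


def playing_apps(inputs) -> list:
    """Application names of sink inputs that are RUNNING and not muted."""
    return [
        i.get("application.name", "unknown")
        for i in inputs
        if i.get("state") == "RUNNING" and i.get("muted") == "no"
    ]
```

For the Firefox stream above this would give `["Firefox"]`.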

Wasn't able to find any similar command for Windows that we would be able to trigger directly from Python, but I'm not too familiar with developing on the platform. Worst case, soundcard could still be used for the cases where a reliable low-overhead platform-dependent command is not found.

jmealo commented 4 years ago

@jmealo that's pretty neat! I imagine there's a lot of nice aw-watcher possibilities lying in there.

The script is Python3, but it would need some customizing for Mac to get it working with soundcard:

Once you install SoundFlower, you would need to query all of the microphones, and find the name that MacOS uses for SoundFlower: sc.all_microphones(). Iterate through them, then get the name of each microphone by doing the_mic.name, to find what name SoundFlower goes by.

Once you get the name by looking through the mics, you can get the microphone by name: mic = sc.get_microphone('name_of_soundflower_input')

For what it's worth: Soundflower (2ch) or Soundflower (64ch) seem to be the device names.

I agree that since pmset ... is lower overhead, it would be preferred.

I looked for a similar command that could be useful on Linux, and found: pacmd list-sink-inputs (again dependent on PulseAudio, and I don't think there is a lot of fragmentation on this front). You can find out whether any sound is playing by doing: pacmd list-sink-inputs | grep -w state | grep RUNNING. pacmd list-sink-inputs also returns info on the application that is playing, which is useful:
    index: 173
  driver: <protocol-native.c>
  flags: START_CORKED 
  state: RUNNING
  sink: 1 <alsa_output.pci-0000_00_1f.3.analog-stereo>
  volume: front-left: 52016 /  79% / -6.02 dB,   front-right: 52016 /  79% / -6.02 dB
          balance 0.00
  muted: no
  current latency: 89.25 ms
  requested latency: 75.01 ms
  sample spec: float32le 2ch 44100Hz
  channel map: front-left,front-right
               Stereo
  resample method: copy
  module: 10
  client: 17 <Firefox>
  properties:
      media.name = "AudioStream"
      application.name = "Firefox"
      native-protocol.peer = "UNIX socket client"
      native-protocol.version = "33"
      application.process.id = "5675"
      application.process.user = "nicolae"
      application.process.host = "nicolae"
      application.process.binary = "firefox"
      application.language = "en_US.UTF-8"
      window.x11.display = ":0"
      application.icon_name = "firefox"
      module-stream-restore.id = "sink-input-by-application-name:Firefox"
There are a couple of weird quirks that I didn't figure out about this. If I mute an application (but allow it to run), it will still show up with state: RUNNING and muted: no so not sure why this happens.
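For reference, here is a minimal sketch of how that output could be parsed from Python instead of piping through grep. The function names are made up for illustration, and it assumes the key/value layout shown in the sample above. Given the quirk just mentioned, the muted field only reflects what PulseAudio reports, so an application-level mute would still count as playing.

```python
import subprocess
from typing import Dict, List


def parse_sink_inputs(pacmd_output: str) -> List[Dict[str, str]]:
    """Split `pacmd list-sink-inputs` output into one dict per sink input."""
    inputs: List[Dict[str, str]] = []
    for raw_line in pacmd_output.splitlines():
        line = raw_line.strip()
        if line.startswith("index:"):
            inputs.append({})  # every sink input starts with an index line
        if not inputs:
            continue  # skip the "N sink input(s) available." header
        if " = " in line:  # property lines, e.g. application.name = "Firefox"
            key, _, value = line.partition(" = ")
            inputs[-1][key.strip()] = value.strip().strip('"')
        elif ": " in line:  # top-level fields, e.g. state: RUNNING
            key, _, value = line.partition(": ")
            inputs[-1].setdefault(key.strip(), value.strip())
    return inputs


def running_applications(pacmd_output: str) -> List[str]:
    """Names of applications with a RUNNING sink input that is not muted."""
    return [
        entry.get("application.name", "unknown")
        for entry in parse_sink_inputs(pacmd_output)
        if entry.get("state") == "RUNNING" and entry.get("muted") == "no"
    ]


def is_audio_playing() -> bool:
    """Run pacmd (PulseAudio only) and report whether any stream is playing."""
    result = subprocess.run(
        ["pacmd", "list-sink-inputs"], capture_output=True, text=True, check=False
    )
    return bool(running_applications(result.stdout))
```

Returning the application names rather than a bare boolean keeps the door open for recording which program was playing.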

What a great find! I was looking to see if systemd had something, but, if pulseaudio can be queried directly that'd be good. I can't imagine that there's not a similar solution on any *nix based OS.

Wasn't able to find any similar command for Windows that we would be able to trigger directly from Python, but I'm not too familiar with developing on the platform. Worst case, soundcard could still be used for the cases where a reliable low-overhead platform-dependent command is not found.

It looks like powercfg is what we're looking for on Windows. I'm hoping that read-only operations don't require elevated permissions; write ones certainly do.

jmealo commented 4 years ago

It looks like the output of powercfg -requests looks something like this (found on Microsoft answers for troubleshooting sleep issues):

SYSTEM:
[DRIVER] Cirrus Logic High Definition Audio (HDAUDIO\FUNC_01&VEN_ ...)
An audio stream is currently in use.
[PROCESS] \Device\HarddiskVolume2\Program Files (x86)\Windows Media Player\wmplayer.exe

Here's the documentation that I found for the command so far: https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/powercfg-command-line-options

jmealo commented 4 years ago

I just tested on Windows:

No video/audio playing:

Microsoft Windows [Version 10.0.18363.900]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Windows\system32>powercfg -requests
DISPLAY:
None.

SYSTEM:
None.

AWAYMODE:
None.

EXECUTION:
None.

PERFBOOST:
[DRIVER] Legacy Kernel Caller
Power Manager

ACTIVELOCKSCREEN:
None.

Video playing:

C:\Windows\system32>powercfg -requests
DISPLAY:
[PROCESS] \Device\HarddiskVolume7\Program Files (x86)\Google\Chrome\Application\chrome.exe
Video Wake Lock

SYSTEM:
[DRIVER] NVIDIA High Definition Audio (HDAUDIO\FUNC_01&VEN_10DE&DEV_0072&SUBSYS_38423967&REV_1001\5&34bd84db&0&0001)
An audio stream is currently in use.

AWAYMODE:
None.

EXECUTION:
[PROCESS] \Device\HarddiskVolume7\Program Files (x86)\Google\Chrome\Application\chrome.exe
Playing audio

PERFBOOST:
None.

ACTIVELOCKSCREEN:
None.

Audio playing:

C:\Windows\system32>powercfg -requests
DISPLAY:
None.

SYSTEM:
[DRIVER] NVIDIA High Definition Audio (HDAUDIO\FUNC_01&VEN_10DE&DEV_0072&SUBSYS_38423967&REV_1001\5&34bd84db&0&0001)
An audio stream is currently in use.

AWAYMODE:
None.

EXECUTION:
[PROCESS] \Device\HarddiskVolume7\Program Files (x86)\Google\Chrome\Application\chrome.exe
Playing audio

PERFBOOST:
None.

ACTIVELOCKSCREEN:
None.
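A rough sketch of how the three outputs above could be told apart from Python. The function names are hypothetical, the parsing only assumes the section layout shown above (an upper-case header ending in a colon, then either None. or request lines), and actually running powercfg -requests may need sufficient privileges.

```python
import subprocess
from typing import Dict, List


def parse_powercfg_requests(output: str) -> Dict[str, List[str]]:
    """Map each section (DISPLAY, SYSTEM, ...) to its request lines."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith(":") and line[:-1].isupper():
            current = line[:-1]  # section header such as "SYSTEM:"
            sections[current] = []
        elif current is not None and line != "None.":
            sections[current].append(line)
    return sections


def audio_stream_in_use(output: str) -> bool:
    """True when the SYSTEM section reports an active audio stream."""
    system = parse_powercfg_requests(output).get("SYSTEM", [])
    return any("audio stream is currently in use" in line.lower() for line in system)


def display_requested(output: str) -> bool:
    """True when any process holds a DISPLAY request (e.g. a video wake lock)."""
    return bool(parse_powercfg_requests(output).get("DISPLAY"))


def query_powercfg() -> str:
    """Run the command itself (Windows only)."""
    result = subprocess.run(
        ["powercfg", "-requests"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

With the samples above, the idle output gives False for both checks, the audio-only output gives True for audio_stream_in_use only, and the video output gives True for both.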

nicolae-stroncea commented 4 years ago

@jmealo nice find! I just tested it, and it works well. Unfortunately, it requires administrative privileges. I had to run powershell as an administrator to get it to work.

Here's the output I got when playing a youtube video for it:

DISPLAY:
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe

SYSTEM:
[DRIVER] Realtek Audio (INTELAUDIO\FUNC_01&VEN_10EC&DEV_0298&SUBSYS_1028087C&REV_1001\4&2223f159&2&0001)
An audio stream is currently in use.

AWAYMODE:
None.

EXECUTION:
None.

PERFBOOST:
None.

ACTIVELOCKSCREEN:
None.

nicolae-stroncea commented 4 years ago

Were you able to somehow do it without privileged access?

jmealo commented 4 years ago

I had to run powershell as an administrator to get it to work.

:( Same here, I used an elevated cmd. I wonder what it uses under the hood, and whether there's an alternative way to get log entries for this from Windows that doesn't require elevated permissions.

nicolae-stroncea commented 4 years ago

I was able to use this code on Windows to detect sound. Runs without any privileges.

nicolae-stroncea commented 4 years ago

This seems to be the meat of the code:

IMMDeviceEnumerator enumerator = (IMMDeviceEnumerator)(new MMDeviceEnumerator());
IMMDevice speakers = enumerator.GetDefaultAudioEndpoint(EDataFlow.eRender, ERole.eMultimedia);
IAudioMeterInformation meter = (IAudioMeterInformation)speakers.Activate(typeof(IAudioMeterInformation).GUID, 0, IntPtr.Zero);
float value = meter.GetPeakValue();

This seems to just be instantiating a couple of objects and then getting a peak sample value for the audio stream.

Measure-Command {[Foo.Bar]::IsWindowsPlayingSound()} returns this:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 491
Ticks             : 4910539
TotalDays         : 5.68349421296296E-06
TotalHours        : 0.000136403861111111
TotalMinutes      : 0.00818423166666667
TotalSeconds      : 0.4910539
TotalMilliseconds : 491.0539

EDIT: This is on an i7-8750H CPU

nicolae-stroncea commented 4 years ago

The accepted stackoverflow answer uses Add-Type, which seems to compile the C# on the fly from within Powershell. So we would likely be able to run this command directly from a Python script through Powershell. Somebody else also made the same version of the script in C++ (https://github.com/smourier/IsWindowsPlayingSound), so a second option would be to just execute the compiled C++ program from Python. I think a good approach might be to just try out all 3 options (SoundCard vs C# with Powershell vs the C++ program) and see which one is lightest on resources; my bet is on C++.

jmealo commented 4 years ago

@nicolae-stroncea: It looks like UAC eliminated any way for non-administrators to use powercfg (the registry entries that used to work in XP don't appear to have any effect on Windows 10), so, if you went that route you'd need to request UAC permissions (or run as service?). Just for fun, you could try this: https://github.com/rootm0s/WinPwnage

nicolae-stroncea commented 4 years ago

So I put together a quick script which uses platform-specific commands to check for the audio. At the moment you can run it with python media_watcher.py and it will output True/False, depending on if anything is playing

https://github.com/nicolae-stroncea/aw_audio_detector/blob/master/media_watcher.py

For Windows:

Checked it on Windows and it worked well for all cases (muted, unmuted, etc).

For Linux:

If audio is not playing: audible is False
If audio is playing, sound on: audible is True
If audio is playing, sound muted from system: audible is False
If audio is playing, muted from application: audible is True

The last one is wrong, but it is not possible to get that information unless the application provides an API for it. The only place where that's possible is with browsers, by checking the audible property.

For Mac:

If audio is not playing: audible is False
If audio is playing, sound on: audible is True
If audio is playing, sound muted from system: Not sure. If the result is wrong, it can be fixed by checking for a system mute by running osascript -e 'output muted of (get volume settings)' from Python. Relevant Link
If audio is playing, muted from application: Not sure

I may have missed something so any testing of these would be appreciated

While this doesn't detect video, it can detect audio within and outside the browser

jmealo commented 4 years ago

@nicolae-stroncea: Should the Windows path use %SystemRoot%\system32 / %WINDIR%\system32 or something similar to resolve where system32 is?
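Something along these lines is what I have in mind. It's just a sketch: system32_dir is a made-up helper name, and the C:\Windows fallback is my own assumption for when neither variable is set.

```python
import ntpath
import os
from typing import Mapping, Optional


def system32_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the System32 directory from SystemRoot / WINDIR."""
    env = os.environ if env is None else env
    # Assumed fallback: the conventional install location.
    root = env.get("SystemRoot") or env.get("WINDIR") or r"C:\Windows"
    # ntpath always joins with Windows separators, whatever the host OS.
    return ntpath.join(root, "System32")
```

That way the script would keep working on machines where Windows isn't installed on C:.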

I tested on OSX. The script returns true if the system is muted (I used the mute keyboard key to make sure it was a system-level mute, rather than volume: 0). It also returns true if I mute a YouTube video in Chrome. osascript -e 'output muted of (get volume settings)' returns true / false as expected for muted system-level audio.
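A small sketch of how the osascript check could be folded into the result. The function names are hypothetical; the only external call is the osascript command quoted above, which prints true or false.

```python
import subprocess


def parse_osascript_bool(stdout: str) -> bool:
    """osascript prints 'true' or 'false' followed by a newline."""
    return stdout.strip().lower() == "true"


def is_system_muted() -> bool:
    """Ask macOS whether system output is muted (macOS only)."""
    result = subprocess.run(
        ["osascript", "-e", "output muted of (get volume settings)"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_osascript_bool(result.stdout)


def audible(audio_playing: bool, system_muted: bool) -> bool:
    """Audio only counts as audible when it is playing and not system-muted."""
    return audio_playing and not system_muted
```

This only covers the system-level mute. The muted YouTube tab in Chrome would still come out as audible.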

I did some preliminary digging and found this: https://github.com/kyleneideck/BackgroundMusic/commit/944fc112128e2e9513fea73473c347b1e5bd64f0 (this is an example where support was added for Facetime, suggesting that workarounds are required per-application). There are also source files for various media players. It doesn't appear that there's an easy way to enumerate the application-level volumes of all running applications at the OS level yet.

placeybordeaux commented 4 years ago

As a user I'd personally prefer shipping something like

(youtube|netflix|hulu|etc) is active && no keyboard/mouse movement for up to 1h30m -> notAFK

Would shipping that before the "is audio playing" detection be viable? I think the "is audio playing" detection is an excellent idea, but this issue has been open since Jan 6th 2019.
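To make the rule concrete, a minimal sketch of what I mean. The names are made up, matching on the window title is an assumption, and so is the 3 minute regular AFK timeout; the 1h30m cap is the one from the rule above.

```python
import re

# Hypothetical site list; in practice this would be user-configurable.
MEDIA_PATTERN = re.compile(r"youtube|netflix|hulu", re.IGNORECASE)
AFK_TIMEOUT_SECONDS = 3 * 60  # assumed regular AFK timeout
MAX_MEDIA_IDLE_SECONDS = 90 * 60  # the "up to 1h30m" cap


def classify(window_title: str, idle_seconds: float) -> str:
    """Return 'not-afk' or 'afk' for the current window and idle time."""
    if idle_seconds < AFK_TIMEOUT_SECONDS:
        return "not-afk"  # recent input always counts as present
    if MEDIA_PATTERN.search(window_title) and idle_seconds <= MAX_MEDIA_IDLE_SECONDS:
        return "not-afk"  # idle, but on a known media site and under the cap
    return "afk"
```

The cap is what keeps a forgotten YouTube tab from being counted as activity all night.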

tmladek commented 3 years ago

I'm sure I'm not the first to come up with this, but what about doing away with complex heuristics and just having a switch "Consider time spent in fullscreen applications as not-afk"?

When I'm watching video, I'm most likely going to be doing so in fullscreen, and I'd like to count this as not AFK.
- I can't really think of a single instance of turning on a fullscreen application and then walking away from my computer - and if that happens, I think I would be happy with erroneously registering this as not AFK.
If I'm not watching video in fullscreen, then it's probably because I'm doing something else at the same time, hence AFK watcher will work as expected.

ErikBjare commented 3 years ago

Assuming Full screen -> active is a heuristic too, and I can see many ways for it to fail (you watch a movie/play a game, but pause/leave the device for a while to take a break).

I'm not even sure if it's feasible to detect if an application is full screen in a reliable and cross-platform manner.
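For the sake of discussion, here is roughly what a check could look like on one platform only: X11 with an EWMH-compliant window manager, using the xprop command-line tool. The function names are made up, the output formats matched below are assumed rather than verified across window managers, and this says nothing about Wayland, Windows or macOS.

```python
import re
import subprocess
from typing import Optional


def parse_active_window_id(xprop_root_output: str) -> Optional[str]:
    """Extract the hex window id from `xprop -root _NET_ACTIVE_WINDOW` output."""
    match = re.search(r"window id # (0x[0-9a-fA-F]+)", xprop_root_output)
    return match.group(1) if match else None


def is_fullscreen_state(xprop_state_output: str) -> bool:
    """True when the _NET_WM_STATE property lists the fullscreen atom."""
    return "_NET_WM_STATE_FULLSCREEN" in xprop_state_output


def active_window_is_fullscreen() -> bool:
    """Query xprop twice: once for the active window, once for its state."""
    root = subprocess.run(
        ["xprop", "-root", "_NET_ACTIVE_WINDOW"],
        capture_output=True, text=True, check=False,
    )
    window_id = parse_active_window_id(root.stdout)
    if window_id is None or int(window_id, 16) == 0:
        return False  # no active window reported
    state = subprocess.run(
        ["xprop", "-id", window_id, "_NET_WM_STATE"],
        capture_output=True, text=True, check=False,
    )
    return is_fullscreen_state(state.stdout)
```

Each other platform would need its own equivalent, which is the cross-platform part I'm unsure about.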

tmladek commented 3 years ago

Yup, but it's a simple one, and as I've said above, I can't really think of an instance where it would fail for me (or at least, fail worse than what we've got now), and if it does, I think I would be fine with the result.

you watch a movie/play a game, but pause/leave the device for a while to take a break

That would result in a couple minutes of wrongly tagged time, as opposed to literal hours.

Additionally, pausing an online game does vaguely match what I would consider "not-AFK" time - a fullscreen app running directly implies that attention is meant to be paid to the computer. I understand this may be a point of contention, but that's why I'm appealing to the simplicity of the heuristic. It's easy to read, since it doesn't rely on complex OS machinery relating to audio, or combinations of window titles and somewhat arbitrary (1hr30) idleness thresholds, etc.

If there's an artifact, it's likely going to be small(er) and easily recognized. And if it's an issue, well, it's a single switch after all. It can even be false by default.

I'm not even sure if it's feasible to detect if an application is fullscreen in a reliable and cross-platform manner.

Well, this thread seems to be considering metrics which so far do not seem feasible to detect even on any single operating system! :) So I didn't think this would be that much of a hurdle.

The bottom line is: This is a somewhat major issue for an activity tracking software, and has been unsolved for more than 2 years. I do appreciate its difficulty and complexity, and I don't claim that fullscreen-tracking is the final solution, but! I think it's a feasible partial solution, and if it were present, I would have simply toggled it on and gone on with my day. "Don't let perfect be the enemy of the good", yadda yadda.

ErikBjare commented 3 years ago

but it's a simple one

I don't think it's any simpler than most of the other things already suggested (but it's a good addition still!).

That would result in a couple minutes of wrongly tagged time, as opposed to literal hours.

Everyone's usage is different. I often leave my computer with a game running for hours. I also sometimes fall asleep to a video playing. Personally, I'm not inclined to implement & maintain a feature I won't have any use of myself.

Well, this thread seems to be considering metrics which so far do not seem feasible to detect even on any single operating system! :) So I didn't think this would be that much of a hurdle.

It does! But those hurdles are exactly why it hasn't been implemented. Although I disagree that the other proposed solutions aren't feasible to detect on a single OS, and I think it's about as feasible to check if audio is playing vs if a window is fullscreen (but hard to speculate without seeing example code for the latter).

The bottom line is: This is a somewhat major issue for an activity tracking software, and has been unsolved for more than 2 years. I do appreciate its difficulty and complexity, and I don't claim that fullscreen-tracking is the final solution, but! I think it's a feasible partial solution, and if it were present, I would have simply toggled it on and gone on with my day. "Don't let perfect be the enemy of the good", yadda yadda.

100%. However, the 'good' solution that's the most likely to get implemented anytime soon is using the audible attribute reported by aw-watcher-web (https://github.com/ActivityWatch/aw-webui/pull/85). It will only work when you use your browser for watching videos (which happens to be the case for me most of the time) and is obviously not perfect, but "Don't let perfect be the enemy of the good", yadda yadda ;)

ErikBjare commented 3 years ago

I just merged https://github.com/ActivityWatch/aw-webui/pull/262 which implements the "audible-as-active" feature. It makes it so that if your browser is the active app, and the active browser tab is audible (playing sound), then it will not count that time as AFK (and therefore make it show on your Activity view).
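As a simplified illustration of the rule (this is not the actual aw-webui code, which works on queries over event buckets), assume each sample already carries the AFK flag, the active app and the audible flag of the active tab. The browser list and field names are hypothetical.

```python
from typing import Dict, List

# Hypothetical set of app names that count as "the browser".
BROWSERS = {"firefox", "chrome"}


def apply_audible_as_active(samples: List[Dict]) -> List[Dict]:
    """Clear the AFK flag wherever the browser is active and its tab is audible."""
    result = []
    for sample in samples:
        browser_active = sample["app"].lower() in BROWSERS
        # Still AFK unless both conditions hold: browser in front, tab playing sound.
        afk = sample["afk"] and not (browser_active and sample["audible"])
        result.append({**sample, "afk": afk})
    return result
```

Note that an audible tab in a background browser does not change anything here, since the browser has to be the active app.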

It requires that you're running the web watcher for your browser.

This vastly improves the situation when you watch a video in your web browser, but it is not a complete solution, so I'll leave the issue open for now.

archiif commented 3 years ago

Will this be available in the nightly build soon? It seems that the aw-webui module is still pinned to the commit from last December.

johan-bjareholt commented 3 years ago

@archiif I just updated aw-server and aw-webui in the main activitywatch repo, so it should hopefully work. The recent aw-webui change does not work on aw-server-rust yet, though.

johan-bjareholt commented 3 years ago

@archiif I have fixed the integration tests now too, so the nightly builds are working again.

luckydonald commented 3 years ago

Is that optional? I use the Chrome plugin, and I have music (youtube video though) running pretty much all the time no matter if I'm in front of it or not.

johan-bjareholt commented 3 years ago

@luckydonald Yes it's optional, you can turn it off in the settings.

k8ieone commented 3 years ago

Lots of programs use MPRIS for sending playback info to the system (at least on Linux, I'm not sure about other platforms) so that it can be shown in various places in the system. Just throwing it out there as a possible way to cheaply and universally check if media is playing.

Example from GNOME (media playing in Firefox): [screenshot]
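In code, one way to try the MPRIS idea is to shell out to playerctl, a separate command-line MPRIS client that would have to be installed. This is only a sketch: the function names are made up, and it assumes playerctl prints one status word (Playing, Paused or Stopped) per player.

```python
import subprocess


def any_player_playing(status_output: str) -> bool:
    """True when at least one line of playerctl status output is 'Playing'."""
    return any(line.strip() == "Playing" for line in status_output.splitlines())


def is_media_playing() -> bool:
    """Ask every MPRIS player for its status (Linux with playerctl installed)."""
    result = subprocess.run(
        ["playerctl", "--all-players", "status"],
        capture_output=True,
        text=True,
        check=False,  # playerctl exits non-zero when no players are found
    )
    return any_player_playing(result.stdout)
```

Talking to D-Bus directly would avoid the extra dependency, but would take more code than fits in a quick example.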

ahmednofal commented 3 years ago

I just merged ActivityWatch/aw-webui#262 which implements the "audible-as-active" feature. It makes it so that if your browser is the active app, and the active browser tab is audible (playing sound), then it will not count that time as AFK (and therefore make it show on your Activity view).

It requires that you're running the web watcher for your browser.

This vastly improves the situation when you watch a video in your web browser, but it is not a complete solution, so I'll leave the issue open for now.

What about Zoom or conference call apps? I am afraid I find Zoom being AFKed most of the time, and it actually counts towards my productivity hours 😞

corradio commented 3 years ago

What about Zoom or conference call apps? I am afraid I find Zoom being AFKed most of the time, and it actually counts towards my productivity hours 😞

I have the same issue - very happy to contribute if someone can help shape the solution. What about having an exclusion regexp for the afk watcher?

marco-coraggio commented 2 years ago

Just wanted to say I had the same issue with conferencing apps like Zoom, Teams or Skype. Is there any solution to that? Thank you.

ActivityWatch / activitywatch

More accurately record time spent consuming video media #261
