DigitalSlideArchive / digital_slide_archive

The official deployment of the Digital Slide Archive and HistomicsTK.
https://digitalslidearchive.github.io
Apache License 2.0

Annotation time report #214

Closed: ds2268 closed this issue 1 year ago

ds2268 commented 2 years ago

Is it possible to obtain per-user statistics on how much time was spent labeling a particular slide? One option would be to diff the timestamps of the first and last annotation elements in a particular annotation file, but I don't know whether the exact time at which each annotation was added is stored anywhere. Any ideas?

manthey commented 2 years ago

You can extract times from the object IDs used for annotations and annotation elements. These are Mongo ObjectIds, which have the creation time encoded within them (you'd have to check the Mongo docs for the exact format). Alternatively, you could use the https://github.com/DigitalSlideArchive/annotation-tracker plugin to log user actions.
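
For anyone wanting to try this, here is a minimal sketch. The API URL, credentials, and annotation ID are placeholders, and it assumes each element in the annotation response carries its Mongo ObjectId in the id field:

```python
# Rough sketch: estimate time spent on one annotation by diffing the
# creation times encoded in its element ObjectIds. The API URL,
# credentials, and annotation ID below are placeholders.
from bson.objectid import ObjectId
import girder_client

gc = girder_client.GirderClient(apiUrl='http://localhost:8080/api/v1')
gc.authenticate('admin', 'password')

annotation = gc.get('annotation/5f0000000000000000000000')
elements = annotation['annotation']['elements']

# The first four bytes of a Mongo ObjectId encode its creation time;
# bson exposes this as generation_time (a timezone-aware UTC datetime).
times = sorted(ObjectId(el['id']).generation_time for el in elements)
print('first element:', times[0])
print('last element:', times[-1])
print('elapsed:', times[-1] - times[0])
```

The same generation_time trick works on the annotation document's own _id if you only need a creation time for the whole annotation.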

ds2268 commented 2 years ago

Thanks! I will take a look at Mongo. What would you suggest, given that I already have DSA deployed and in active use? Can I add the annotation tracker to the existing DSA deployment? From the docs it seems that it deploys a new HistomicsUI instance -- or is it just an additional Girder plugin, like the ones for analysis? Would this tracker slow down the annotation process for the pathologists labeling thousands of cells? They already need to start new annotation files as it is, to reduce the lag after labeling a few hundred of them.

manthey commented 2 years ago

You can just add the plugin. If you are using the docker-compose deployment, you can add it as part of the provisioning step. Otherwise, pip install the plugin inside the Girder container, run girder build, and then restart the container.

The tracker shouldn't add much overhead -- it tracks discrete events and sends them to the server only periodically (no more than once every 10 seconds, I think). You would probably need to analyze which events are stored to determine whether you can easily get the information you want out of the tracking data. Obviously, it could be customized to do so.
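
If the tracker's event log turns out to be awkward to mine, the ObjectId timestamps alone can give a rough estimate. Below is a sketch of a session-gap heuristic (not part of DSA or the tracker) that reads element creation times straight from Mongo; it assumes direct database access, the default girder database layout with an annotationelement collection keyed by annotationId, and an illustrative 10-minute idle cutoff:

```python
# Rough sketch: estimate active labeling time for one annotation by
# bucketing element creation times into sessions. Assumes direct access
# to DSA's Mongo database with the default layout (database 'girder',
# collection 'annotationelement' keyed by 'annotationId'); the
# connection URI, annotation ID, and 10-minute idle cutoff are
# illustrative choices, not part of DSA itself.
from datetime import timedelta

from bson.objectid import ObjectId
from pymongo import MongoClient

IDLE_CUTOFF = timedelta(minutes=10)

db = MongoClient('mongodb://localhost:27017')['girder']
annotation_id = ObjectId('5f0000000000000000000000')

# _id sorts by creation time, so this yields elements in insertion order.
cursor = db['annotationelement'].find(
    {'annotationId': annotation_id}, {'_id': 1}).sort('_id', 1)
times = [doc['_id'].generation_time for doc in cursor]

# Sum gaps between consecutive elements, ignoring long idle periods.
active = sum(
    ((b - a) for a, b in zip(times, times[1:]) if b - a < IDLE_CUTOFF),
    timedelta())
print('elements:', len(times), 'estimated active time:', active)
```

Per-user figures could then be built by grouping annotations by the creatorId field on the annotation documents.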

manthey commented 2 years ago

@ds2268 You mentioned that things were slowing down after a few hundred annotation elements. Can you pull the latest Docker images and see if the performance has improved for you?

ds2268 commented 2 years ago

@manthey I have forks of the DSA, HistomicsUI, and slicer_cli_web repos due to some minor changes on my side. I have now merged them with upstream so they are up to date. What is the proper way to redeploy with the new Docker images without disturbing the existing deployment (e.g., I have custom slicer_cli_web tasks)?

To start it, I used:

./deploy.sh -j 8 --assetstore PATH --cache 16384 -d PATH --logs PATH --port 9090 --user USER --password PASS --cli start

and then devops/build.sh for HistomicsUI.

By the way -- does setting Girder workers > 1 have any effect in real life?

Should I just do a pull and then restart with deploy.sh?

The pathologists will be doing major labeling again in June, so I will be able to test it in real life.

ds2268 commented 2 years ago

I tried it now and, for me, it doesn't improve the performance bottlenecks, although I should hopefully have the latest versions.

I ran ./deploy.sh ... remove to clean out the old images and then ./deploy.sh ... start to start the new ones. I also changed the pinned versions to "latest" in deploy_docker.py.

I also updated my local versions of HistomicsUI and slicer_cli_web to the latest upstream.

I suggest we have a call where I can show you the performance bottlenecks in practice.

ds2268 commented 2 years ago

Ah -- I have now manually cleaned all the old Docker images and redeployed the DSA, and the bug seems to be fixed. It's really much more responsive! Next week we will start real-world usage again on a larger scale and will report back if there are any bottlenecks left. Thanks!

manthey commented 1 year ago

I'm closing this issue. There is an optional annotation-tracker plugin that covers some of the initial request.