STRRL / dejavu

With Dejavu, you can have a perfect memory by capturing and organizing your visual recordings efficiently.

Improving archiver space efficiency #3

Open AsterNighT opened 1 year ago

AsterNighT commented 1 year ago

The idea of the project is quite interesting. I haven't really tested it for long, but it seems promising. The greatest drawback right now seems to be the archiver's storage consumption: it takes only about 3 minutes to produce 100 MB of screenshots.

The screenshots seem to contain a lot of duplication. It would be good to have a filter before an image is ever archived. A first idea is to check for duplicate images with hashes. It is also possible to "rank" the images based on the text extracted from them, but that would require careful research and design.
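
A minimal sketch of the hash idea in Rust, assuming the `image` crate (the 8x8 average hash and the distance threshold are illustrative choices, not anything dejavu ships):

```rust
use image::{imageops::FilterType, DynamicImage, GrayImage};

/// 64-bit average hash (aHash): shrink to 8x8 grayscale, then set one
/// bit per pixel depending on whether it is brighter than the mean.
fn average_hash(img: &DynamicImage) -> u64 {
    let small: GrayImage = img.resize_exact(8, 8, FilterType::Triangle).to_luma8();
    let mean = small.pixels().map(|p| u32::from(p.0[0])).sum::<u32>() / 64;
    small.pixels().enumerate().fold(0u64, |hash, (i, p)| {
        if u32::from(p.0[0]) > mean { hash | (1u64 << i) } else { hash }
    })
}

/// Hamming distance between two hashes; a small value (say <= 5)
/// suggests near-duplicate screenshots.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

The archiver could keep the hash of the last stored frame and skip any new capture whose distance falls below the threshold.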

STRRL commented 1 year ago

Most mainstream video codecs (H.265 and others) reduce storage by encoding only the differences between frames. So I think this goal could be achieved by a future video-based archiver, which is also on the ROADMAP.
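
Such an archiver pass could be as small as shelling out to ffmpeg over the existing screenshot files. A rough sketch in Rust (the screenshot path, frame rate, and CRF value are placeholder assumptions):

```rust
use std::process::Command;

// Turn numbered screenshots into an H.265 video; the codec stores
// only inter-frame differences, which is where the savings come from.
fn encode_archive() -> std::io::Result<std::process::ExitStatus> {
    Command::new("ffmpeg")
        .args([
            "-framerate", "0.5",          // one screenshot every 2 s
            "-i", "screenshots/%05d.png", // hypothetical capture files
            "-c:v", "libx265",            // H.265 encoder
            "-crf", "28",                 // x265's default quality level
            "archive.mp4",
        ])
        .status()
}

fn main() -> std::io::Result<()> {
    let status = encode_archive()?;
    println!("ffmpeg exited with {status}");
    Ok(())
}
```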

What do you think about it?

STRRL commented 1 year ago

But there are a lot of tuning options in the video encoding process, so the early implementation is probably not going to be the best one (most likely)... 🫣

STRRL commented 1 year ago

I just tried it on my Linux machine:

STRRL commented 1 year ago

With the default H.265 encoding, the video takes about 4 MiB for every 10 minutes.

STRRL commented 1 year ago

So it would take about 200 MiB for 8 hours of daily usage (4 MiB per 10 minutes = 24 MiB per hour, × 8 ≈ 192 MiB), and about 1.4 GiB for a week.

STRRL commented 1 year ago

I think it's close enough to the performance of Rewind on macOS.

STRRL commented 1 year ago

The file size of the video always depends on the content, so it would increase with more complex content, but I think it would NOT take more than 10x the space.

I think that's good enough for now. 🤩

What do you think about it? @AsterNighT

AsterNighT commented 1 year ago

That makes sense. Actually, I hadn't heard of Rewind before. The way Dejavu runs now uses about 15% of my CPU time (laptop, 6800H; mostly tesseract, I suppose). And I think we would need something like a live streaming encoder. Not sure how much extra CPU that would take.

AsterNighT commented 1 year ago

I tried a few seemingly viable ways of doing screen capture and video encoding:

  1. Call ffmpeg directly. It works, and the overhead is minimal. ffmpeg itself should be cross-platform, but the arguments are not, and it does not provide an interface for processing the frames (though see the stdin-based sketch after this list).
  2. Capture screens and feed them to https://github.com/ralfbiedert/openh264-rs. This does not seem to support frame-by-frame encoding (or it is supported through the raw APIs; the documentation is limited). The documentation claims it is cross-platform. Haven't tried it personally.
  3. https://github.com/astraw/vpx-encode gives an example of encoding with libvpx. From the code, it supports frame-by-frame encoding. But it builds on neither my Windows nor my Linux machine.
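
One possible workaround for the frame-processing limitation in option 1 is to capture frames ourselves and pipe raw RGB into an ffmpeg child process over stdin. A rough sketch (the frame size and `capture_frame` are placeholders for whatever capture API dejavu already uses):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Hypothetical frame size; a real capturer would report this.
const W: u32 = 1920;
const H: u32 = 1080;

// Placeholder frame source; must yield W * H * 3 bytes of packed RGB.
fn capture_frame() -> Vec<u8> {
    vec![0u8; (W * H * 3) as usize]
}

fn main() -> std::io::Result<()> {
    let size = format!("{}x{}", W, H);
    // ffmpeg reads raw RGB frames from stdin at 0.5 fps and encodes
    // them with libx265, so every frame can be inspected or dropped
    // before it is handed to the encoder.
    let mut ffmpeg = Command::new("ffmpeg")
        .args([
            "-f", "rawvideo",
            "-pix_fmt", "rgb24",
            "-s", size.as_str(),
            "-r", "0.5",
            "-i", "-", // "-" means: read input from stdin
            "-c:v", "libx265",
            "out.mp4",
        ])
        .stdin(Stdio::piped())
        .spawn()?;

    {
        let stdin = ffmpeg.stdin.as_mut().expect("piped stdin");
        for _ in 0..10 {
            stdin.write_all(&capture_frame())?;
        }
    }
    drop(ffmpeg.stdin.take()); // close stdin so ffmpeg finalizes the file
    ffmpeg.wait()?;
    Ok(())
}
```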

STRRL commented 1 year ago

I dove into the details of what Rewind does:

A first idea is to check for duplicate images with hashes. It is also possible to "rank" the images based on the text extracted from them, but that would require careful research and design.

It really does the same thing: when there isn't much change in the content, it drops some pictures and falls back to about 1 image per 20 seconds; when there are lots of changes on the screen, it uses the full 0.5 fps.
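
A tiny sketch of that adaptive strategy (the 0.1 change threshold is a made-up placeholder; `change` would come from whatever similarity measure ends up being used):

```rust
use std::time::Duration;

/// Pick the next capture delay from a 0.0..=1.0 change score.
fn next_capture_delay(change: f64) -> Duration {
    if change < 0.1 {
        Duration::from_secs(20) // mostly static: ~1 frame per 20 s
    } else {
        Duration::from_millis(2000) // busy screen: 0.5 fps
    }
}
```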

STRRL commented 1 year ago

There are lots of algorithms for image similarity detection... I have been lost in them.

Maybe I would make a simple one (histogram comparison) and a heavy one (OpenCV), and make it extensible.
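
A minimal sketch of what "extensible" could look like: a small trait with the cheap histogram detector as the first implementation (the trait name and normalization are my own assumptions; an OpenCV-backed detector would implement the same trait):

```rust
use image::GrayImage;

/// Pluggable similarity interface; implementors return a score in
/// 0.0..=1.0, where 1.0 means identical.
trait SimilarityDetector {
    fn similarity(&self, a: &GrayImage, b: &GrayImage) -> f64;
}

/// Grayscale histogram intersection: cheap, but blind to layout
/// changes (assumes both frames have the same dimensions).
struct HistogramDetector;

impl SimilarityDetector for HistogramDetector {
    fn similarity(&self, a: &GrayImage, b: &GrayImage) -> f64 {
        let hist = |img: &GrayImage| {
            let mut h = [0u64; 256];
            for p in img.pixels() {
                h[p.0[0] as usize] += 1;
            }
            h
        };
        let (ha, hb) = (hist(a), hist(b));
        let overlap: u64 = ha.iter().zip(hb.iter()).map(|(x, y)| *x.min(y)).sum();
        overlap as f64 / (u64::from(a.width()) * u64::from(a.height())) as f64
    }
}
```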

AsterNighT commented 1 year ago

I'm not sure, but would manually detecting image similarity outperform just encoding with a video codec? It would be more tunable, indeed. But if the goal is to compress text, wouldn't it be more accessible to use a compression algorithm than to manually detect text similarity and deduplicate?

AsterNighT commented 1 year ago

There are lots of algorithms for image similarity detection... I have been lost in them.

Maybe I would make a simple one (histogram comparison) and a heavy one (OpenCV), and make it extensible.

I don't think simple algorithms like histograms would be very effective. Consider this: you are reading a very long markdown article on, say, GitHub. There will be loads of text, and clearly you would like that text to be recorded. But the histogram of the article stays almost the same as you scroll (after all, it is only text; in that sense the frames are "similar").

Or rather, maybe the filtering should be done after tesseract, not before, since it is the extracted text that gets searched, not the image itself.

I'm thinking of something like "retain the word set of the most recently captured X screenshots and calculate the similarity between the current picture's text and that set". I've never done such a thing before, so I'm not sure if it works.
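
A rough sketch of that idea, assuming tesseract's text output is already in hand (the function shape and any threshold are hypothetical):

```rust
use std::collections::HashSet;

/// Fraction of words in the current OCR text that were NOT seen in
/// the word set accumulated from the last X screenshots.
fn text_novelty(recent_words: &HashSet<String>, current_text: &str) -> f64 {
    let current: HashSet<&str> = current_text.split_whitespace().collect();
    if current.is_empty() {
        return 0.0;
    }
    let unseen = current
        .iter()
        .filter(|w| !recent_words.contains(*w))
        .count();
    unseen as f64 / current.len() as f64
}
```

If the novelty stays below some threshold, the current screenshot adds little searchable text and could be dropped instead of archived.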