jeffward01 opened 1 year ago
Graybyte values are saved. The thumbnails of the found duplicates are not saved between multiple scans. This is by design as these thumbnails can take a lot of space. But these thumbnails shouldn't affect scan speed as they're generated after scan is done.
Let me do some testing to verify, because in my experience, if I repeat the above steps it takes days to complete a scan.
Perhaps I am mixing up some settings and 're-scanning' the database, so that the database is dumped and then rescanned.
I will test and verify this, then report back to you on my findings either way 🙌
The thumbnails of the found duplicates are not saved between multiple scans.
For example, I have at least 159,142 video files in my library haha. So this means each time I run a scan, it will need to generate 159,142 * thumbnailCount thumbnails each time. Roughly how many KB is each thumbnail? I'm just trying to estimate how large the cache could grow. Honestly tho, it can't be worse than JetBrains cache size 🙃
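As a back-of-the-envelope check on how large that cache could grow, here is a quick sketch. The file count (159,142) comes from the comment above; the thumbnails-per-file and per-thumbnail size are purely illustrative assumptions, not measured VDF values:

```python
# Rough thumbnail-cache size estimate.
# file_count is from the comment above; the other two numbers
# are hypothetical placeholders for illustration only.
file_count = 159_142
thumbnails_per_file = 3   # assumed setting
kb_per_thumbnail = 10     # assumed average thumbnail size

total_kb = file_count * thumbnails_per_file * kb_per_thumbnail
print(f"~{total_kb / 1024 / 1024:.1f} GB")  # → ~4.6 GB
```

Even with small thumbnails, the total scales linearly with both the library size and the thumbnail count per file, which is presumably why the thumbnails are not persisted by default.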
I have not tried the latest versions of VDF ..., but in principle it has worked like this so far:
Thank you @Maltragor for explaining that, that makes a lot of sense how it is “averaged out” to an even ratio like in the example you gave.
If I made a pull request, do you think it would be helpful if the algorithm had a "memory" and made some sort of adjustment so that it does not "re-scan" entries?
The adjustment would be to refactor how it selects where the scans take place, essentially by pre-setting slots.
For example, if you have (3) thumbnails, the thumbnails would be at these positions:
• 5% mark
• 50% mark
• 75% mark

4 thumbnails:
• 5% mark
• 33% mark
• 50% mark
• 75% mark

2 thumbnails:
• 5% mark
• 50% mark

1 thumbnail:
• 5% mark
As an example ^^.
I don’t see why it needs to re-pick the marks "by ratio" in an "even way" for each number of thumbnails. If it had pre-determined slots like in my example, it could be a lot faster.
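The pre-set-slot idea above can be sketched as a simple lookup table. The percentages are the ones from the example; the function and its name are hypothetical, not VDF code. Note that the lower counts are subsets of the higher ones, so changing the thumbnail count would only require generating thumbnails for the marks not already cached:

```python
# Sketch of the "pre-set slots" idea: each thumbnail count maps to a
# fixed, reusable set of positions, so a thumbnail taken at a given
# mark never needs to be regenerated when the count changes.
FIXED_SLOTS = {
    1: [0.05],
    2: [0.05, 0.50],
    3: [0.05, 0.50, 0.75],
    4: [0.05, 0.33, 0.50, 0.75],
}

def thumbnail_timestamps(duration_seconds: float, count: int) -> list[float]:
    """Return the timestamps (in seconds) where thumbnails are taken."""
    return [duration_seconds * p for p in FIXED_SLOTS[count]]

# A 60-minute movie with 3 thumbnails:
print(thumbnail_timestamps(3600, 3))  # → [180.0, 1800.0, 2700.0]
```

Going from 3 to 4 thumbnails here would only add the 33% mark; the 5%, 50%, and 75% thumbnails could be reused from the cache.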
Questions:
1.) If I made a PR for this, would it be something that would be accepted, or does it logically break something?
2.) Given the example of (2) 60-minute movies that are identical, but where each movie has a DIFFERENT, exactly-10-second intro: in the current algorithm, would a duplicate be detected? My assumption is no, because it would not see matching gray pixels in the first 10 seconds. Is this correct?
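As a quick check on where the marks would actually land in that scenario, here is a sketch. The slot percentages are taken from the earlier example; the durations are illustrative assumptions, and this doesn't settle what VDF's real sampling does, only where these particular marks fall:

```python
# Question-2 scenario: two movies of the same total length, each
# starting with a *different* 10-second intro followed by identical
# content. Where do the example marks land?
DURATION = 3610.0   # assumed: 60 min of shared content + a 10 s intro
INTRO_END = 10.0    # the differing intros occupy the first 10 s

for pct in (0.05, 0.50, 0.75):
    t = DURATION * pct
    where = ("inside the differing intro" if t < INTRO_END
             else "in the shared content")
    print(f"{pct:.0%} mark at {t:.1f}s falls {where}")
```

In this particular layout none of the marks sample the first 10 seconds (the earliest, 5%, lands around the 3-minute point), so the differing intros would not be sampled at all; the answer to question 2 ultimately depends on how the real algorithm picks its frames.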
Issue
Each time a 'scan' occurs and the number of thumbnails changes, the thumbnails are not 'stored' or 'remembered'. This results in very long file-scan times.
Please consider the following scenario
Scenario
Action 1
Action 2
Action 3 (this is the important step)
Action 3 (this is the important step) Alternate version
Expected behavior
Action 3
Action 3 - Alternate Version
Question
Does this functionality exist?
Context --> Why I suggest this feature