bakape / hydron

media tagger and organizer
GNU Lesser General Public License v3.0

Watching directories for new imports #23

Closed Chiiruno closed 6 years ago

Chiiruno commented 6 years ago

Probably just run a hash check and if new/altered, import the image and remove the replaced image, if applicable.

bakape commented 6 years ago

What?

Chiiruno commented 6 years ago

Being able to have access to new or altered images without having to reimport the folder each time would be nice.

bakape commented 6 years ago

But images in the database are immutable. Only tags mutate.

Chiiruno commented 6 years ago

Yes, but when I change or add something in /home/okina/Pictures, I want hydron to reflect that either immediately or on every start of hydron, without having to import the entire folder each time.

bakape commented 6 years ago

Importing an image performs a copy. It is in no way tied to your personal image storage directories.

Chiiruno commented 6 years ago

Yes, I know. I want hydron to find out if the local copy differs from the original copy, and if it does, replace the local with the (new) "original".

bakape commented 6 years ago

And how do you think that would ever be possible without you turning your personal folders over to VCS?

Chiiruno commented 6 years ago

Keep a database of hashes which you already do, and check if the hash is the same. If not, replace/add to the local copy.

bakape commented 6 years ago

No, you don't understand. I can check whether an image is already in the database just fine, but how can I check whether image A with hash B was previously image C with hash D?
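A minimal sketch of the distinction being drawn here, assuming the database can be modelled as a set of content hashes. The `imported` map and `hashOf` helper are hypothetical stand-ins, not hydron's actual code. A content hash answers "have I seen these exact bytes?", but an edited file produces an unrelated hash that carries no link back to the file it replaced.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// hashOf returns the hex-encoded SHA1 digest of data.
func hashOf(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical stand-in for the database: a set of known content hashes.
	imported := map[string]bool{}

	original := []byte("pixels of image C")
	imported[hashOf(original)] = true

	// Identical bytes are recognised as already imported.
	fmt.Println("same bytes known:", imported[hashOf(original)])

	// Edited bytes hash to a different value, so the lookup misses and
	// nothing in the set says which older entry this file replaced.
	edited := []byte("pixels of image C, cropped")
	fmt.Println("edited bytes known:", imported[hashOf(edited)])
}
```

Tracking "A was previously C" would need extra information, such as the source path, recorded at import time.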

Chiiruno commented 6 years ago

I can't say I know. All I know is that being able to not have to import every time I start up hydron would be nice, since I have a large image folder.

bakape commented 6 years ago

"being able to not have to import every time I start up hydron would be nice"

And you don't have to. Import is a one time thing.

Chiiruno commented 6 years ago

I don't have to, sure. But if I want the two images I just added to my Pictures folder, I have to re-run it, in addition to fetch_tags, which takes a while too.

bakape commented 6 years ago

Then just add them separately with the -f flag. Otherwise you will have to rehash every file in the directory to check for matches.

Chiiruno commented 6 years ago

Having to keep a separate directory for newly saved images just so I can do that is silly, waaay too much maintenance. There has to be a smart way to, at the very least, find new images and fetch their tags without reimporting.

bakape commented 6 years ago

Import already does that. The problem is it still has to read and hash every file in your directory tree to check for matches.

Chiiruno commented 6 years ago

How about this then, a possible middle ground between performance and ease of use: a slow import and fetch that's always running, so it doesn't bring the system to a halt or otherwise take up too much CPU. Or, a slightly faster import and fetch that happens every X amount of time. That way, we don't have to run import to update for new/altered images and fetch_tags for tags, since both will either always be running or be run at a set interval.

bakape commented 6 years ago

http://man7.org/linux/man-pages/man7/inotify.7.html and https://github.com/fsnotify/fsnotify. Want to have a jab at this?

Chiiruno commented 6 years ago

Sure, but it might take me a while to get to.

bakape commented 6 years ago

It's fine. I don't consider this core functionality anyway.

Chiiruno commented 6 years ago

@bakape Could you give me collaborator permissions on this repo too so I can make branches here like meguca? I changed up my local stuff to make branches on meguca directly, so I'd like to do it on hydron too for whenever I get to this.

bakape commented 6 years ago

Done.

Chiiruno commented 6 years ago

Watching directories with hydron may be a problem not only for hydron itself, but also for the entire OS. Read the bottom of https://github.com/fsnotify/fsnotify under "How many files can be watched at once?"

There are OS-specific limits as to how many watches can be created:

Linux: /proc/sys/fs/inotify/max_user_watches contains the limit, reaching this limit results in a "no space left on device" error.

BSD / OSX: sysctl variables "kern.maxfiles" and "kern.maxfilesperproc", reaching these limits results in a "too many open files" error.

Chiiruno commented 6 years ago

So, just importing each time you want to add new files and trying our hardest to optimize and even skip suspected already imported files may be for the best. Thoughts?

Chiiruno commented 6 years ago

Also I know you said it was dumb, but BLAKE2 might be a good way for faster and more unique hashes, if you ever want to consider that. https://research.kudelskisecurity.com/2017/03/06/why-replace-sha-1-with-blake2/ https://godoc.org/golang.org/x/crypto/blake2b

bakape commented 6 years ago

I have reduced memory usage for already imported files with d5ca369e8fd53e7dbd414fd5b4f738cf7530ec8a.

BLAKE2

Not an option. Whatever slow gain would be offset by the overhead of still needing SHA1 hashes and storing an extra hash per image.

Basically, don't rescan your image folders all the fucking time. That is not how hydron was intended to be used.

Chiiruno commented 6 years ago

Not an option. Whatever slow gain would be offset by the overhead of still needing SHA1 hashes and storing an extra hash per image.

Now it makes sense why you think it's dumb. Does this mean you can't put BLAKE2 hashes into the DB the same way you can SHA1? Why would you need an extra hash? I'm well aware it would require rewriting the thumbnailer and other stuff.

bakape commented 6 years ago

Because we still need to generate SHA1, because external services use SHA1. Same with MD5. So currently we generate SHA1+MD5. With BLAKE2 we would need to generate BLAKE2+SHA1+MD5 and store the BLAKE2 hash as well. At the same time no external service I know of uses BLAKE2, so it's not reusable.

Chiiruno commented 6 years ago

Okay, thank you for explaining this to me. BLAKE2 isn't an optimization option, at least not for the foreseeable future.

Chiiruno commented 6 years ago

Since you closed #32, should we close this one too? AFAIK using fsnotify isn't an option because of the file limit.