algoo / preview-generator

generates previews of files with cache management
https://pypi.org/project/preview-generator/
MIT License
228 stars 50 forks source link

[METATICKET] Rethinking preview_generator behavior. #159

Open inkhey opened 4 years ago

inkhey commented 4 years ago

Current preview_generator code have many limitations:

Architecture

Features

So, we need to think about a refactor of preview_generator to solve all or part of theses problems.

inkhey commented 4 years ago

@JulienPalard Did you have any other suggestion about future preview_generator ?

JulienPalard commented 4 years ago

no, LGTM.

inkhey commented 4 years ago

Some thought about mimetype issue:

JulienPalard commented 4 years ago

Best approach to guess mimetype seems to rely on XDG xml database

I don't trust the filename. Also relying on the filename won't work in cases we want to expose a "file-like" API in the future, like:

with open("some-file.jpg", "rb") as f:
    manager.get_jpg_preview(f)

why not relying on file content instead, à la libmagic?

inkhey commented 4 years ago

Best approach to guess mimetype seems to rely on XDG xml database

I don't trust the filename.

see https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html#idm45070737710448 Not sure we should follow exactly same pattern but i do think the approach is nice to handle most case.

Keep in mind that XDG xml database means:

As XDG cover most of the needed feature (magic, filename/extension -> mimetype), i do think it should be good enough for a first approach as the only mecanism to get mimetype. We may think about adding python magic somewhere if we see some drawbacks.

Also relying on the filename won't work in cases we want to expose a "file-like" API in the future, like:

with open("some-file.jpg", "rb") as f:
    manager.get_jpg_preview(f)

why not relying on file content instead, à la libmagic?

Nice idea, we can support this using XDG too, we should probably tweak https://pyxdg.readthedocs.io/en/latest/_modules/xdg/Mime.html#get_type2 method to be able to match this case too using https://pyxdg.readthedocs.io/en/latest/mime.html#xdg.Mime.get_type_by_data.

Another possibility is to have multiple different approach for mimetype guesser:

a523 commented 2 years ago

I'll say what I think:

Shouldn't we automatically detect all builders are in the project and load them all. Should we allow the user to customize which builders are enabled or disabled, like the django installed apps? Of course We can provide the option of the default enabled builders. This way there are no errors from unwanted builders, and it has the advantage that if there are multiple builders working on the same mime-type of file. The user would be free to choose which builder to enable, without requiring us to modify the library code itself.