Psycojoker / prosopopee

a static website generator to make beautiful customizable pictures galleries that tell a story
http://prosopopee.readthedocs.org
GNU General Public License v3.0
318 stars 55 forks source link

autogen not working with my files #130

Open crypto512 opened 3 years ago

crypto512 commented 3 years ago

I've started using prosopopee today and I love the result but I've found some issues (this one and space in filename/dir issue).

autogen is generating random sequence so i've tried to understand why. Regarding EXIF my issue was that the tag for the Date is DateTimeOriginal and not DateTime. It also seems that exif[tag] is not a date when printed.

I've also used the new function getexif instead of _getexif.

Why using last modification instead of creation if there is no EXIF ? Maybe sometimes the filename is better in that case.

as I'm not a python programmer dont assume the code is done the right way but it works:

--- a/prosopopee/autogen.py
+++ b/prosopopee/autogen.py
@@ -36,14 +36,13 @@ types = ("*.JPG", "*.jpg", "*.JPEG", "*.jpeg", "*.png", "*.PNG")

 def get_exif(filename):
-    exif = Image.open(filename)._getexif()
+    exif = Image.open(filename).getexif()
     if exif is not None:
-        for tag in exif:
-            decoded = TAGS.get(tag, tag)
-            if decoded == "DateTime":
-                return exif[tag]
+        ctime = exif.get(36867)
+        if ctime is not None:
+            return ctime

-    return strftime("%Y:%m:%d %H:%M:00", gmtime(os.path.getmtime(filename)))
+    return strftime("%Y:%m:%d %H:%M:00", gmtime(os.path.getctime(filename)))

 def build_template(folder, force):

any advice ?

QSchulz commented 3 years ago

Hi,

The _getexif() function predates the getexif() one. Though, it's been in Pillow since release 6.0.0 (c.f. introducing commit: https://github.com/python-pillow/Pillow/commit/d5db62be7b71b2d3b5ed9e0e6e4181c5a93085f9). We could probably use the getexif() function then. We'll probably want to force Pillow to be >=6.0.0 in requirements.txt then. Which IMO is anyway not a bad idea :)

It also seems that exif[tag] is not a date when printed.

It'd help us if you could tell us exactly what this returns for you instead of "not a date".

For the EXIF tag being DateTimeOriginal for you, we probably don't want to replace the DateTime we have here as EXIF are widely different between cameras. Though we could definitely add support for DateTimeOriginal, so that if one of (DateTimeOriginal, DateTime) is found, its value is taken. Without thinking too much about it, I think it would make sense.

I'm also surprised there would be a need for checking if the value of an EXIF tag is not None since in my mind, the tag would just not exist? Did something make you add this check? Pretty sure by the way that your exif.get(tag) has the same behavior as exif[tag] as that's what Python does for dicts.

For ctime over mtime, no hard feeling on this.

Note, I'm just a contributor.

crypto512 commented 3 years ago

I've checked what is returned in the current code and it seems fine.

For the EXIF if I read correctly, DateTimeOriginal seems to be the one to choose first: https://www.media.mit.edu/pia/Research/deepview/exif.html

Regarding file time after checking with my files, it seems to be counterproductive to use creation time or modification time and maybe file name ordering will give better results (as they are normally sorted alphabetically by the camera)...

As said I'm not a python programmer so take my code like that :)

So my current patch to fix it (it also fix my space issue in file path):

diff --git a/prosopopee/autogen.py b/prosopopee/autogen.py
index 8353ae7..6a481b3 100644
--- a/prosopopee/autogen.py
+++ b/prosopopee/autogen.py
@@ -1,6 +1,7 @@
 import logging
 import os
 import sys
+import ntpath
 from time import gmtime, strftime
 from glob import glob
 from jinja2 import Template
@@ -35,15 +36,17 @@ sections:
 types = ("*.JPG", "*.jpg", "*.JPEG", "*.jpeg", "*.png", "*.PNG")

-def get_exif(filename):
-    exif = Image.open(filename)._getexif()
+def get_sort_key(filename):
+    key = ntpath.basename(filename);
+    exif = Image.open(filename).getexif()
     if exif is not None:
-        for tag in exif:
-            decoded = TAGS.get(tag, tag)
-            if decoded == "DateTime":
-                return exif[tag]
+        # DateTimeOriginal, DateTimeDigitized, DateTime(DateTimeModified)
+        ctime = exif.get(0x9003, exif.get(0x9004, exif.get(0x0132)))
+        if ctime is not None:
+            key = ctime

-    return strftime("%Y:%m:%d %H:%M:00", gmtime(os.path.getmtime(filename)))
+    logging.info("Use key='%s' for '%s'", key, filename)
+    return key

 def build_template(folder, force):
@@ -74,7 +77,7 @@ def build_template(folder, force):
         title=gallery_settings["title"],
         date=gallery_settings["date"],
         cover=gallery_settings["cover"],
-        files=sorted(files_grabbed, key=get_exif),
+        files=sorted(files_grabbed, key=get_sort_key),
     )
     settings = open(Path(".").joinpath(folder, "settings.yaml").abspath(), "w")
     settings.write(msg)
diff --git a/prosopopee/prosopopee.py b/prosopopee/prosopopee.py
index 1b8fadd..9322052 100644
--- a/prosopopee/prosopopee.py
+++ b/prosopopee/prosopopee.py
@@ -384,8 +384,12 @@ class Image:

     @property
     def ratio(self):
-        command = "gm identify -format %w,%h " + self.base_dir.joinpath(self.name)
-        out = subprocess.check_output(command.split())
+        command = "gm identify -format %w,%h"
+        command_list = command.split()
+        # target is Type path.Path, encodes to bytes, decodes to str, which we can append to the
+        # list disgusting, I know. But it works
+        command_list.append(self.base_dir.joinpath(self.name).encode().decode())
+        out = subprocess.check_output(command_list)
         width, height = out.decode("utf-8").split(",")
         return float(width) / int(height)
QSchulz commented 3 years ago

Disagree on filename being used for sorting, ctime/mtime is fine IMO. Sorting on filename works when you have only one camera for one album, otherwise it won't make much sense, or that you don't rename your pictures. Although... it's very unlikely that the camera internal clocks are synchronized anyway so for pictures of the same event more or less at the same time won't be ordered.

Not sure that your Image.open() actually needs an escaped path, I tested manually with test 123/test 345.png file in a dummy Python script on Linux and it seems to work just fine. Since you're importing ntpath, I guess you're on Windows? Is this actually an issue? If it is, can you try to just put Path(filename) instead of filename in Image.open()?

Re-read some "articles" and it seems you're right, we should probably use DateTimeOriginal since this is meant to be a read-only EXIF value, for the moment the picture was taken from the camera POV. DateTime is modified by some picture editing software (e.g. https://mail.gnome.org/archives/f-spot-list/2005-August/msg00081.html) and we probably don't care when the picture was edited. C.f. https://feedback.photoshop.com/conversations/lightroom-classic/date-time-digitized-and-date-time-differ-from-date-modified-and-date-created/5f5f45ba4b561a3d425c6f77 and https://www.quora.com/What-is-the-difference-between-Date-and-Time-Original-and-Date-and-Time-in-the-EXIF-data-output for other explanations.

Good find for the ratio() function, but a better approach IMO would probably be the following:

     @property
     def ratio(self):
-        command = "gm identify -format %w,%h " + self.base_dir.joinpath(self.name)
+       command = "gm identify -format %w,%h '{0}'".format(self.base_dir.joinpath(self.name))

Can you try? This "escapes" the filename with single quotes when passed to the shell used to run gm identify.

crypto512 commented 3 years ago

Disagree on filename being used for sorting, ctime/mtime is fine IMO. Sorting on filename works when you have only one camera for one album, otherwise it won't make much sense, or that you don't rename your pictures. Although... it's very unlikely that the camera internal clocks are synchronized anyway so for pictures of the same event more or less at the same time won't be ordered.

I agree, It works fine if you have only one camera AND no EXIF data. This is not a clock sync issue just that most camera name the file XYZ00001 XYZ00002 ... As the main case is EXIF data i've removed it.

Not sure that your Image.open() actually needs an escaped path, I tested manually with test 123/test 345.png file in a dummy Python script on Linux and it seems to work just fine. Since you're importing ntpath, I guess you're on Windows? Is this actually an issue? If it is, can you try to just put Path(filename) instead of filename in Image.open()?

npath is there just to retrieve the basename not for the open. If there is no sort per filename there is no need for that.

Re-read some "articles" and it seems you're right, we should probably use DateTimeOriginal since this is meant to be a read-only EXIF value, for the moment the picture was taken from the camera POV. DateTime is modified by some picture editing software (e.g. https://mail.gnome.org/archives/f-spot-list/2005-August/msg00081.html) and we probably don't care when the picture was edited. C.f. https://feedback.photoshop.com/conversations/lightroom-classic/date-time-digitized-and-date-time-differ-from-date-modified-and-date-created/5f5f45ba4b561a3d425c6f77 and https://www.quora.com/What-is-the-difference-between-Date-and-Time-Original-and-Date-and-Time-in-the-EXIF-data-output for other explanations.

Good find for the ratio() function, but a better approach IMO would probably be the following:

     @property
     def ratio(self):
-        command = "gm identify -format %w,%h " + self.base_dir.joinpath(self.name)
+       command = "gm identify -format %w,%h '{0}'".format(self.base_dir.joinpath(self.name))

I've copy pasted previous call like this in the file. From my understanding the issue is with the python split string that will not respect any quote.

I've found an additional issue where the link from main page to gallery is not followed by a "/" so my static site is not working.

Thanks for your feedback.

So new patch:

diff --git a/prosopopee/autogen.py b/prosopopee/autogen.py
index 8353ae7..6dbe9dc 100644
--- a/prosopopee/autogen.py
+++ b/prosopopee/autogen.py
@@ -36,12 +36,12 @@ types = ("*.JPG", "*.jpg", "*.JPEG", "*.jpeg", "*.png", "*.PNG")

 def get_exif(filename):
-    exif = Image.open(filename)._getexif()
+    exif = Image.open(filename).getexif()
     if exif is not None:
-        for tag in exif:
-            decoded = TAGS.get(tag, tag)
-            if decoded == "DateTime":
-                return exif[tag]
+        # DateTimeOriginal, DateTimeDigitized, DateTime(DateTimeModified)
+        ctime = exif.get(0x9003, exif.get(0x9004, exif.get(0x0132)))
+        if ctime is not None:
+            return ctime

     return strftime("%Y:%m:%d %H:%M:00", gmtime(os.path.getmtime(filename)))

diff --git a/prosopopee/prosopopee.py b/prosopopee/prosopopee.py
index 1b8fadd..2f3ed8e 100644
--- a/prosopopee/prosopopee.py
+++ b/prosopopee/prosopopee.py
@@ -384,8 +384,12 @@ class Image:

     @property
     def ratio(self):
-        command = "gm identify -format %w,%h " + self.base_dir.joinpath(self.name)
-        out = subprocess.check_output(command.split())
+        command = "gm identify -format %w,%h"
+        command_list = command.split()
+        # target is Type path.Path, encodes to bytes, decodes to str, which we can append to the
+        # list disgusting, I know. But it works
+        command_list.append(self.base_dir.joinpath(self.name).encode().decode())
+        out = subprocess.check_output(command_list)
         width, height = out.decode("utf-8").split(",")
         return float(width) / int(height)

@@ -609,7 +613,7 @@ def create_cover(gallery_name, gallery_settings, gallery_path):

     gallery_cover = {
         "title": gallery_settings["title"],
-        "link": gallery_name,
+        "link": gallery_name + "/",
         "sub_title": gallery_settings.get("sub_title", ""),
         "date": gallery_settings.get("date", ""),
         "tags": gallery_settings.get("tags", ""),
QSchulz commented 3 years ago

The missing / is probably just an issue with your webserver. Nobody had that issue til now (I use Caddy, dead simple, works OOTB) and I use prosopopee for a few years already. Maybe it's related to a theme you're using? Can you let us know which one you're using?

Indeed, forgot about the split. Instead of encode().decode() one can use str() on a pathlib.Path object. Can you try this:

     @property
     def ratio(self):
-        command = "gm identify -format %w,%h " + self.base_dir.joinpath(self.name)
-        out = subprocess.check_output(command.split())
+        command = "gm identify -format %w,%h"
+        command_list = command.split()
+        command_list.append(str(self.base_dir.joinpath(self.name)))
+        out = subprocess.check_output(command_list)
         width, height = out.decode("utf-8").split(",")
         return float(width) / int(height)

The clock sync issue happens if you have EXIF data and 2+ cameras. Anyway, nothing we can do about it.

Can you create a commit per change (one for DateTime, and one for ratio in autogen) and send a Pull Request? The maintainer will have a look when they can.

Thanks for the debugging session 👍

crypto512 commented 3 years ago

ok thanks. will do.

regarding the link this is the default theme with autogen (so very basic). If I upload to pCloud as public site the main menu link is not working.

QSchulz commented 3 years ago

This issue can be closed :)