cebtenzzre / tumblr-utils

A fork of tumblr-utils with Python 3 support, bug fixes, and lots of features I found useful.
GNU General Public License v3.0

index.html contains a title but is missing the actual index #9

Closed magzanilla closed 1 year ago

magzanilla commented 1 year ago

I have recently been trying out fresh installs of tumblr-utils (the latest version, June 2023) with just the API key added, checking for any issues so I could help others who wanted to back up their Tumblr.

In the latest version, I can confirm that the "no module named 'fcntl'" error appears.

(I use Windows)

With Python 3.10, the error doesn't appear when just commenting out "import fcntl" in util.py. So at least there's that.

fcntl was intended as a replacement for the deprecated imghdr, as your comment on the May 16, 2023 commit says: https://github.com/Cebtenzzre/tumblr-utils/commit/f89c663d4214d128349102b7b6353f935ae908b6

On Python 3.11, since imghdr is deprecated as a fallback and fcntl is not available, this would cause some issues.

As-is, it ends up skipping image downloads while backing up and just links to the Tumblr-hosted images, similar to using the -k option, except without actually passing it. (This happened to someone I was trying to help, who uses macOS and first tried with Python 3.11.)

The error in question (not mine): without commenting out "import fcntl" image

with commenting out "import fcntl" image

Still, even with commenting out "import fcntl" so that bug doesn't appear, the index page will be blank besides the tumblr title and description.

image image

... Though this also happens on fresh installs (with just API key added) of an older tumblr-utils I had downloaded before. So it doesn't seem related as initially thought.

The same blank index page happens for the people I am trying to help with their fresh installs, including on macOS: at least 2 other people.

Meanwhile, my old install (not fresh install) on a different hard drive, using January 2022 tumblr-utils, does not have either issue and can make a proper index page. (the same blog, on same day) image

I have been trying to figure out why the blank index page happens, but have no idea. When the script runs, these fresh installs (of January 2022, and the latest of June 2023) don't flash the message "building index page" before finishing, unlike the old one I use.

I am reporting both because when I tried to "fix" one issue, the other popped up right after.

magzanilla commented 1 year ago

Tried to see what differences there were between the fresh install of January 2022 tumblr-utils and the old one, for the Index build bug.

(since the blank index page bug happens in both january 2022 and june 2023 fresh install, thought might as well compare to the most similar. Same Windows machine, same version.)

image

so tumblr_backup.py is identical.

image

util.py also identical

cebtenzzre commented 1 year ago

Could you tell me which steps I can follow to reproduce the missing images issue and the broken index.html issue? Tell me which version you are seeing each issue on - with the actual commit SHA if you know it. I've never noticed the images issue on Linux, and I can't reproduce the index.html issue with the cebtenzzre branch on Arch Linux with Python 3.11.3 or Windows 10 with Python 3.11.4.

And could you please open a separate issue for the SSL CERTIFICATE_VERIFY_FAILED errors? I don't see those on my end, I'd like to help you fix those first if I can.

I'd also like to know:

edit: commit 91348e771c5fccce99e45171f13591fec96b21af should fix the fcntl issue, btw.

magzanilla commented 1 year ago

Missing images backup issue with the SSL CERTIFICATE_VERIFY_FAILED errors, happened on:

Mac OS (was not told which specific) Python 3.11.4 this version of tumblr-utils (July 4, 2023) https://github.com/Cebtenzzre/tumblr-utils/commit/e5d80ffa9d6c60d44b1e119599a64d648f7ccd34

Will have to give more specification of the missing image backup bug later, as it was not on my machine (and the person is not tech savvy).

I assume that should go in the separate issue report, since it's the macOS-specific version of the error?

--

Can give details on the blank index page bug, though, as was able to reproduce it on my Windows machine as well after it came to my attention that it was happening.

It happens on macOS (1 user) and Windows (2 users). All 3 had the blank index page bug.

In my case:

image

https://github.com/Cebtenzzre/tumblr-utils/commit/e5d80ffa9d6c60d44b1e119599a64d648f7ccd34 : Downloaded the new tumblr-util at the time, unzipped it, put it on my default hard drive, added my API key into tumblr_backup.py, and commented out "import fcntl".

https://github.com/Cebtenzzre/tumblr-utils/commit/6ba4c144c874a5a2096d4767ec8bddd91ddc78ae: Unzipped the tumblr-utils version that I usually use, but onto my default hard drive instead of my external hard drive where it works, and added my API key.

The one that works and has no issues also uses the same version as above on the external hard drive, same API key, same windows machine.

Ran default command "py tumblr_backup.py (blog name here)" for all of them first.

My default hard drive had not run tumblr-utils before this, whereas the external hard drive had done so before and still runs well (very confusing).

Edit: linked the same commit for all at first.

cebtenzzre commented 1 year ago

This thread is somewhat confusing with several different issues across several different operating systems. I have a feeling the missing images bug and the SSL CERTIFICATE_VERIFY_FAILED issue are related, now that I know it's on macOS it makes a little more sense. I will track the index.html issue here.

Btw, the macOS user shouldn't be commenting out the import fcntl line, it's actually necessary on that platform. It's just not available on Windows.
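For readers following along, the platform difference described above can be handled with a guarded import rather than by editing the source per machine. The block below is only a minimal sketch of that general pattern; it is not the change made in commit 91348e771c5fccce99e45171f13591fec96b21af, and `try_lock` is a hypothetical helper invented for illustration (the thread does not say what tumblr-utils uses fcntl for).

```python
import tempfile

try:
    import fcntl  # POSIX-only module: present on Linux and macOS, absent on Windows
except ImportError:
    fcntl = None  # Windows: fall back to "feature unavailable"

HAVE_FCNTL = fcntl is not None


def try_lock(fileobj):
    """Hypothetical helper: take an exclusive advisory lock when fcntl exists.

    Returns True if a lock was taken, False if fcntl is unavailable.
    """
    if fcntl is None:
        return False
    # LOCK_NB makes the call fail fast instead of blocking if already locked.
    fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return True


# Small demo on a throwaway file.
with tempfile.TemporaryFile() as f:
    locked = try_lock(f)
print(f'fcntl available: {HAVE_FCNTL}, lock taken: {locked}')
```

With a guard like this, the same file runs unmodified on Windows and on macOS, so nobody needs to comment out the import by hand.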

magzanilla commented 1 year ago

image

Of course, changing to external hard drive (where there are no bugs) to check Python information, shows the same thing.

No custom Python install shenanigans there, as far as I am aware?

Genuinely quite confused.

Not sure what difference is going on to be able to recreate what has been happening for others.

magzanilla commented 1 year ago

This thread is somewhat confusing with several different issues across several different operating systems. I have a feeling the missing images bug and the SSL CERTIFICATE_VERIFY_FAILED issue are related, now that I know it's on macOS it makes a little more sense. I will track the index.html issue here.

Btw, the macOS user shouldn't be commenting out the import fcntl line, it's actually necessary on that platform. It's just not available on Windows.

Noted.

Edit: (though the issue was that on Mac OS there was a missing module and it fell back to using imghdr just like on Windows. but that's a topic for when I open that issue separately later)

cebtenzzre commented 1 year ago

That combination of python, pip, and urllib3 versions immediately threw an exception when I tested it. Do you also have requests installed? I'd like to know the output of:

python -m pip show requests | findstr /c:Version

magzanilla commented 1 year ago

That combination of python, pip, and urllib3 versions immediately threw an exception when I tested it. Do you also have requests installed? I'd like to know the output of:

python -m pip show requests | findstr /c:Version

output: image

also, in case image

cebtenzzre commented 1 year ago

Oh, if you're using py instead of python to run tumblr_backup.py, then you'll have to rerun those commands with py instead:

py --version
py -m pip --version
py -m pip show urllib3 | findstr /c:Version
py -m pip show requests | findstr /c:Version

Sorry about that, I develop on Linux and I'm not really familiar with the 'py' launcher.
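One way to sidestep the python/py mismatch is to ask the interpreter itself which package versions it sees, since the answer then always refers to whichever interpreter actually runs the script. The block below is a sketch using the standard-library `importlib.metadata` module; `package_version` is a name made up for this example and is not part of tumblr-utils.

```python
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter and the packages asked about in this thread.
print('python', sys.version.split()[0])
for pkg in ('pip', 'urllib3', 'requests'):
    print(pkg, package_version(pkg) or 'not installed')
```

Saved as a file and run with the same launcher as the backup (for example `py -3.10 versions.py`, where the file name is arbitrary), it reports on exactly the install that tumblr_backup.py would use.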

magzanilla commented 1 year ago

image

result

cebtenzzre commented 1 year ago

I haven't been able to reproduce the issue on Windows 10 with Python 3.11.4, pip 23.1.2, and no urllib3 or requests installed. I used -O to save to three different hard drives - one SSD and two spinning HDDs.

I'd like you to add two debugging lines to tumblr_backup.py, like this:

diff --git a/tumblr_backup.py b/tumblr_backup.py
index d9fb4ea..58e19e8 100755
--- a/tumblr_backup.py
+++ b/tumblr_backup.py
@@ -595,6 +595,7 @@ class Index:
                 idx.write('<p><a href={}>Tag index</a></p>\n'.format(
                     urlpathjoin(tag_index_dir, dir_index)
                 ))
+            print(f'writing {len(self.index)} years to index')
             for year in sorted(self.index.keys(), reverse=options.reverse_index):
                 self.save_year(idx, archives, index_dir, year)
             idx.write('<footer><p>Generated on %s by <a href=https://github.com/'
@@ -680,6 +681,7 @@ class Indices:
     def build_index(self):
         filter_ = join('*', dir_index) if options.dirs else '*' + post_ext
         for post in (LocalPost(f) for f in glob(path_to(post_dir, filter_))):
+            print(f'adding post {post.ident} to index')
             self.main_index.add_post(post)
             if options.tag_index:
                 for tag, name in post.tags:

And then run tumblr_backup.py on the hard drive that causes the problem. I'd like to know the exact command you are running, and the full console output from the failed run, as well as a copy of the broken index.html that it generates. An example command you could use is: py tumblr_backup.py -n 10 just-art

magzanilla commented 1 year ago

(first with default py 3.11, second with python 3.10) image

py tumblr_backup.py -n 10 just-art

image py -3.10 tumblr_backup.py -n 10 mustlovegarlic

image

image

all content of just-art index.html (using python 3.11):

<!DOCTYPE html>

<meta charset=utf-8>
<title>Art, just art</title>
<link rel=stylesheet href=backup.css>

<body class=index>

<header>
<h1>Art, just art</h1>
<p class=subtitle>A blog for share inspiration and promote artists.</p>
</header>
<footer><p>Generated on 7/14/2023 8:23:51 PM by <a href=https://github.com/bbolli/tumblr-utils>tumblr-utils</a>.</p></footer>

all content of mustlovegarlic index.html (using python 3.10):

<!DOCTYPE html>

<meta charset=utf-8>
<title>Must Love Garlic| Good Food &amp; RV Living</title>
<link rel=stylesheet href=backup.css>

<body class=index>

<header>
<h1>Must Love Garlic| Good Food &amp; RV Living</h1>
<p class=subtitle><p>Welcome to Must Love Garlic: a food and travel blog. Here you'll find easy, step-by-step comforting recipes and sprinkles of travel inspiration: must-love-garlic.com</p></p>
</header>
<footer><p>Generated on 7/14/2023 8:13:41 PM by <a href=https://github.com/bbolli/tumblr-utils>tumblr-utils</a>.</p></footer>

edit: formatting issues

cebtenzzre commented 1 year ago

Sorry for the delay. I'd like to get to the bottom of this.

Assuming you've kept those two added lines, could you try applying this change as well?

diff --git a/tumblr_backup.py b/tumblr_backup.py
index fa4af47..1609a17 100755
--- a/tumblr_backup.py
+++ b/tumblr_backup.py
@@ -680,7 +680,17 @@ class Indices:

     def build_index(self):
         filter_ = join('*', dir_index) if options.dirs else '*' + post_ext
-        for post in (LocalPost(f) for f in glob(path_to(post_dir, filter_))):
+        glob_path = path_to(post_dir, filter_)
+        print(f'filter_={filter_!r} glob_path={glob_path!r}')
+        glob_res = glob(glob_path)
+        try:
+            with os.scandir(os.path.dirname(glob_path)) as it:
+                for e in it:
+                    print(f'dir listing: have {e.name!r}')
+        except OSError as e:
+            print(f'scandir failed with {e!r}')
+        print(f'glob_res={glob_res}')
+        for post in (LocalPost(f) for f in glob_res):
             print(f'adding post {post.ident} to index')
             self.main_index.add_post(post)
             if options.tag_index:

The output looks like this on my functioning install:

$ ./tumblr_backup.py -n 10 just-art -O /tmp/just-art
just-art: Stopping backup: Reached limit of 10 posts                            
filter_='*.html' glob_path='/tmp/just-art/posts/*.html'                         
dir listing: have '721966095332933632.html'
dir listing: have '185562966917.html'
dir listing: have '180787801522.html'
dir listing: have '185367500862.html'
dir listing: have '719436193408892928.html'
dir listing: have '719974155632738304.html'
dir listing: have '185359556132.html'
dir listing: have '185342871967.html'
dir listing: have '182755962637.html'
dir listing: have '185433472537.html'
glob_res=['/tmp/just-art/posts/721966095332933632.html', '/tmp/just-art/posts/185562966917.html', '/tmp/just-art/posts/180787801522.html', '/tmp/just-art/posts/185367500862.html', '/tmp/just-art/posts/719436193408892928.html', '/tmp/just-art/posts/719974155632738304.html', '/tmp/just-art/posts/185359556132.html', '/tmp/just-art/posts/185342871967.html', '/tmp/just-art/posts/182755962637.html', '/tmp/just-art/posts/185433472537.html']
adding post 721966095332933632 to index
adding post 185562966917 to index
adding post 180787801522 to index
adding post 185367500862 to index
adding post 719436193408892928 to index
adding post 719974155632738304 to index
adding post 185359556132 to index
adding post 185342871967 to index
adding post 182755962637 to index
adding post 185433472537 to index
writing 3 years to index
just-art: 10 posts backed up

magzanilla commented 1 year ago

py -3.10 tumblr_backup.py -n 15 mustlovegarlic

image

mustlovegarlic: Stopping backup: Reached limit of 15 posts
filter_='*.html' glob_path='C:\\Users\\crist\\Downloads\\[04] Programs\\[01] Apps\\[01] To Do Archiving\\tumblr-utils-7-2023\\mustlovegarlic\\posts\\*.html'
dir listing: have '701086876415787008.html'
dir listing: have '701539364955684864.html'
dir listing: have '703121534940807168.html'
dir listing: have '703121751113138176.html'
dir listing: have '704079125106851840.html'
dir listing: have '706172122872102912.html'
dir listing: have '706615090021105664.html'
dir listing: have '707436522077650944.html'
dir listing: have '707885362415058944.html'
dir listing: have '708476421443567616.html'
dir listing: have '708520969779675136.html'
dir listing: have '708695782566641664.html'
dir listing: have '709449757472522240.html'
dir listing: have '709511138464989184.html'
dir listing: have '709782865557258240.html'
glob_res=[]
writing 0 years to index
mustlovegarlic: 15 posts backed up

edit: added copy pasted output text, as well

cebtenzzre commented 1 year ago

Oh... it's the square brackets! I can fix that for you. glob() is a bad way to do this kind of thing, I'll replace that with scandir so it's more robust.
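To spell out the diagnosis: glob() treats `[...]` in any path component as a character class, so a folder named `[01] Apps` is read as "a folder named `0 Apps` or `1 Apps`", which does not exist, and the result is an empty list even though the directory listing shows the files. The block below is a self-contained sketch that reproduces this in a temporary directory and shows two ways around it: `glob.escape()` on the directory part, and an `os.scandir()` listing. The function names are made up for the example and this is not the actual patch applied to tumblr_backup.py.

```python
import glob
import os
import tempfile


def list_posts_glob(post_dir, pattern='*.html'):
    # Naive version: brackets in post_dir are interpreted as a character class.
    return sorted(glob.glob(os.path.join(post_dir, pattern)))


def list_posts_glob_escaped(post_dir, pattern='*.html'):
    # glob.escape() neutralises '[', '*' and '?' in the directory part only.
    return sorted(glob.glob(os.path.join(glob.escape(post_dir), pattern)))


def list_posts_scandir(post_dir, ext='.html'):
    # No pattern matching on the directory at all, so special characters are harmless.
    with os.scandir(post_dir) as it:
        return sorted(e.path for e in it if e.is_file() and e.name.endswith(ext))


with tempfile.TemporaryDirectory() as root:
    # Mimic the reporter's layout: a parent folder with square brackets in its name.
    post_dir = os.path.join(root, '[01] Apps', 'posts')
    os.makedirs(post_dir)
    for name in ('1.html', '2.html'):
        open(os.path.join(post_dir, name), 'w').close()

    naive = list_posts_glob(post_dir)
    escaped = [os.path.basename(p) for p in list_posts_glob_escaped(post_dir)]
    scanned = [os.path.basename(p) for p in list_posts_scandir(post_dir)]

print('naive glob:   ', naive)
print('escaped glob: ', escaped)
print('scandir:      ', scanned)
```

The naive call finds nothing, matching the `glob_res=[]` line in the output above, while the other two both list the post files.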

magzanilla commented 1 year ago

Can confirm, it fixed the Index bug, no other additional cause.

image

Thank you for the persistence in trying to help.

(note: The Mac OS bug ticket opening highly depends on the availability of the Mac OS user ... otherwise might try to replicate from the information and come back - but that would take a long while)