Closed magzanilla closed 1 year ago
Tried to see what differences there were between the fresh install of January 2022 tumblr-utils and the old one, for the Index build bug.
(since the blank index page bug happens in both january 2022 and june 2023 fresh install, thought might as well compare to the most similar. Same Windows machine, same version.)
so tumblr_backup.py is identical.
util.py also identical
Could you tell me which steps I can follow to reproduce the missing images issue and the broken index.html issue? Tell me which version you are seeing each issue on, with the actual commit SHA if you know it. I've never noticed the images issue on Linux, and I can't reproduce the index.html issue with the cebtenzzre branch on Arch Linux with Python 3.11.3 or Windows 10 with Python 3.11.4.
And could you please open a separate issue for the SSL CERTIFICATE_VERIFY_FAILED errors? I don't see those on my end, I'd like to help you fix those first if I can.
I'd also like to know:
python --version
python -m pip show pip | findstr /c:Version
python -m pip show urllib3 | findstr /c:Version
edit: commit 91348e771c5fccce99e45171f13591fec96b21af should fix the fcntl issue, btw.
Missing images backup issue with the SSL CERTIFICATE_VERIFY_FAILED errors, happened on:
Mac OS (was not told which specific version), Python 3.11.4, this version of tumblr-utils (July 4, 2023): https://github.com/Cebtenzzre/tumblr-utils/commit/e5d80ffa9d6c60d44b1e119599a64d648f7ccd34
Will have to give more specification of the missing image backup bug later, as it was not on my machine (and the person is not tech savvy).
And would assume that belongs in the separate issue report, since it's the Mac OS specific version of the error?
--
Can give details on the blank index page bug, though, as was able to reproduce it on my Windows machine as well after it came to my attention that it was happening.
It happens on Mac OS (1 user) and Windows (2 users). All 3 had the blank index page bug.
In my case:
https://github.com/Cebtenzzre/tumblr-utils/commit/e5d80ffa9d6c60d44b1e119599a64d648f7ccd34 : Downloaded the new tumblr-utils at the time, unzipped it, put it on my default hard drive, added my API key into tumblr_backup.py, and commented out "import fcntl".
https://github.com/Cebtenzzre/tumblr-utils/commit/6ba4c144c874a5a2096d4767ec8bddd91ddc78ae : Unzipped the tumblr-utils version that I usually use, but onto my default hard drive instead of the external hard drive where it works, and added my API key.
The one that works and has no issues also uses the same version as above on the external hard drive, same API key, same windows machine.
Ran default command "py tumblr_backup.py (blog name here)" for all of them first.
My default hard drive had not run tumblr-utils before this, whereas the external hard drive had done so before and still runs well (very confusing).
Edit: linked the same commit for all at first.
This thread is somewhat confusing, with several different issues across several different operating systems. I have a feeling the missing images bug and the SSL CERTIFICATE_VERIFY_FAILED issue are related; now that I know it's on macOS, it makes a little more sense. I will track the index.html issue here.
Btw, the macOS user shouldn't be commenting out the import fcntl line; it's actually necessary on that platform. It's just not available on Windows.
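For readers following along, a module that exists on macOS and Linux but not on Windows is commonly handled with a guarded import. The following is only a minimal sketch of that general pattern, not the code from the commit mentioned above. The helper name lock_file and the use of fcntl.flock are assumptions made for illustration; the thread does not say what the tool uses fcntl for.

```python
import tempfile

try:
    import fcntl  # POSIX only: present on macOS and Linux
except ImportError:
    fcntl = None  # Windows: the module does not exist


def lock_file(fileobj):
    """Take an exclusive advisory lock where fcntl exists (hypothetical helper).

    Returns True if a lock was taken, False if locking was skipped.
    """
    if fcntl is None:
        return False
    fcntl.flock(fileobj.fileno(), fcntl.LOCK_EX)
    return True


with tempfile.TemporaryFile() as f:
    locked = lock_file(f)
    print('locked' if locked else 'locking unavailable on this platform')
```

With a guard like this, nobody has to edit the source by hand: the import is simply skipped on Windows and kept on macOS.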
Of course, changing to external hard drive (where there are no bugs) to check Python information, shows the same thing.
No custom python install shenanigans there far as am aware ?
Genuinely quite confused.
Not sure what difference is going on to be able to recreate what has been happening for others.
Noted.
Edit: (though the issue was that on Mac OS there was a missing module and it fell back to using imghdr just like on Windows. but that's a topic for when I open that issue separately later)
That combination of python, pip, and urllib3 versions immediately threw an exception when I tested it. Do you also have requests installed? I'd like to know the output of:
python -m pip show requests | findstr /c:Version
output:
also, in case
Oh, if you're using py instead of python to run tumblr_backup.py, then you'll have to rerun those commands with py instead:
py --version
py -m pip --version
py -m pip show urllib3 | findstr /c:Version
py -m pip show requests | findstr /c:Version
Sorry about that, I develop on Linux and I'm not really familiar with the 'py' launcher.
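As an alternative to the four commands above, a short script run with the same launcher reports what that particular interpreter sees. This is a sketch using only the standard library's importlib.metadata (Python 3.8+); the function name report_versions is an assumption for illustration.

```python
import sys
from importlib import metadata


def report_versions(packages=('pip', 'urllib3', 'requests')):
    """Return {name: version or None} for the interpreter running this script."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed for this interpreter
    return found


print('python', sys.version.split()[0])
for name, version in report_versions().items():
    print(name, version or 'not installed')
```

Saved under any file name and started the same way as tumblr_backup.py (py, py -3.10, or python), it avoids mixing up the versions of two different Python installs.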
result
I haven't been able to reproduce the issue on Windows 10 with Python 3.11.4, pip 23.1.2, and no urllib3 or requests installed. I used -O to save to three different hard drives - one SSD and two spinning HDDs.
I'd like you to add two debugging lines to tumblr_backup.py, like this:
diff --git a/tumblr_backup.py b/tumblr_backup.py
index d9fb4ea..58e19e8 100755
--- a/tumblr_backup.py
+++ b/tumblr_backup.py
@@ -595,6 +595,7 @@ class Index:
                 idx.write('<p><a href={}>Tag index</a></p>\n'.format(
                     urlpathjoin(tag_index_dir, dir_index)
                 ))
+            print(f'writing {len(self.index)} years to index')
             for year in sorted(self.index.keys(), reverse=options.reverse_index):
                 self.save_year(idx, archives, index_dir, year)
             idx.write('<footer><p>Generated on %s by <a href=https://github.com/'
@@ -680,6 +681,7 @@ class Indices:
     def build_index(self):
         filter_ = join('*', dir_index) if options.dirs else '*' + post_ext
         for post in (LocalPost(f) for f in glob(path_to(post_dir, filter_))):
+            print(f'adding post {post.ident} to index')
             self.main_index.add_post(post)
             if options.tag_index:
                 for tag, name in post.tags:
And then run tumblr_backup.py on the hard drive that causes the problem. I'd like to know the exact command you are running, and the full console output from the failed run, as well as a copy of the broken index.html that it generates. An example command you could use is: py tumblr_backup.py -n 10 just-art
(first with default py 3.11, second with python 3.10)
py tumblr_backup.py -n 10 just-art
py -3.10 tumblr_backup.py -n 10 mustlovegarlic
all content of just-art index.html (using python 3.11):
<!DOCTYPE html>
<meta charset=utf-8>
<title>Art, just art</title>
<link rel=stylesheet href=backup.css>
<body class=index>
<header>
<h1>Art, just art</h1>
<p class=subtitle>A blog for share inspiration and promote artists.</p>
</header>
<footer><p>Generated on 7/14/2023 8:23:51 PM by <a href=https://github.com/bbolli/tumblr-utils>tumblr-utils</a>.</p></footer>
all content of mustlovegarlic index.html (using python 3.10):
<!DOCTYPE html>
<meta charset=utf-8>
<title>Must Love Garlic| Good Food & RV Living</title>
<link rel=stylesheet href=backup.css>
<body class=index>
<header>
<h1>Must Love Garlic| Good Food & RV Living</h1>
<p class=subtitle><p>Welcome to Must Love Garlic: a food and travel blog. Here you'll find easy, step-by-step comforting recipes and sprinkles of travel inspiration: must-love-garlic.com</p></p>
</header>
<footer><p>Generated on 7/14/2023 8:13:41 PM by <a href=https://github.com/bbolli/tumblr-utils>tumblr-utils</a>.</p></footer>
edit: formatting issues
Sorry for the delay. I'd like to get to the bottom of this.
Assuming you've kept those two added lines, could you try applying this change as well?
diff --git a/tumblr_backup.py b/tumblr_backup.py
index fa4af47..1609a17 100755
--- a/tumblr_backup.py
+++ b/tumblr_backup.py
@@ -680,7 +680,17 @@ class Indices:
     def build_index(self):
         filter_ = join('*', dir_index) if options.dirs else '*' + post_ext
-        for post in (LocalPost(f) for f in glob(path_to(post_dir, filter_))):
+        glob_path = path_to(post_dir, filter_)
+        print(f'filter_={filter_!r} glob_path={glob_path!r}')
+        glob_res = glob(glob_path)
+        try:
+            with os.scandir(os.path.dirname(glob_path)) as it:
+                for e in it:
+                    print(f'dir listing: have {e.name!r}')
+        except OSError as e:
+            print(f'scandir failed with {e!r}')
+        print(f'glob_res={glob_res}')
+        for post in (LocalPost(f) for f in glob_res):
             print(f'adding post {post.ident} to index')
             self.main_index.add_post(post)
             if options.tag_index:
The output looks like this on my functioning install:
$ ./tumblr_backup.py -n 10 just-art -O /tmp/just-art
just-art: Stopping backup: Reached limit of 10 posts
filter_='*.html' glob_path='/tmp/just-art/posts/*.html'
dir listing: have '721966095332933632.html'
dir listing: have '185562966917.html'
dir listing: have '180787801522.html'
dir listing: have '185367500862.html'
dir listing: have '719436193408892928.html'
dir listing: have '719974155632738304.html'
dir listing: have '185359556132.html'
dir listing: have '185342871967.html'
dir listing: have '182755962637.html'
dir listing: have '185433472537.html'
glob_res=['/tmp/just-art/posts/721966095332933632.html', '/tmp/just-art/posts/185562966917.html', '/tmp/just-art/posts/180787801522.html', '/tmp/just-art/posts/185367500862.html', '/tmp/just-art/posts/719436193408892928.html', '/tmp/just-art/posts/719974155632738304.html', '/tmp/just-art/posts/185359556132.html', '/tmp/just-art/posts/185342871967.html', '/tmp/just-art/posts/182755962637.html', '/tmp/just-art/posts/185433472537.html']
adding post 721966095332933632 to index
adding post 185562966917 to index
adding post 180787801522 to index
adding post 185367500862 to index
adding post 719436193408892928 to index
adding post 719974155632738304 to index
adding post 185359556132 to index
adding post 185342871967 to index
adding post 182755962637 to index
adding post 185433472537 to index
writing 3 years to index
just-art: 10 posts backed up
py -3.10 tumblr_backup.py -n 15 mustlovegarlic
mustlovegarlic: Stopping backup: Reached limit of 15 posts
filter_='*.html' glob_path='C:\\Users\\crist\\Downloads\\[04] Programs\\[01] Apps\\[01] To Do Archiving\\tumblr-utils-7-2023\\mustlovegarlic\\posts\\*.html'
dir listing: have '701086876415787008.html'
dir listing: have '701539364955684864.html'
dir listing: have '703121534940807168.html'
dir listing: have '703121751113138176.html'
dir listing: have '704079125106851840.html'
dir listing: have '706172122872102912.html'
dir listing: have '706615090021105664.html'
dir listing: have '707436522077650944.html'
dir listing: have '707885362415058944.html'
dir listing: have '708476421443567616.html'
dir listing: have '708520969779675136.html'
dir listing: have '708695782566641664.html'
dir listing: have '709449757472522240.html'
dir listing: have '709511138464989184.html'
dir listing: have '709782865557258240.html'
glob_res=[]
writing 0 years to index
mustlovegarlic: 15 posts backed up
edit: added copy pasted output text, as well
Oh... it's the square brackets! I can fix that for you. glob() is a bad way to do this kind of thing; I'll replace it with scandir so it's more robust.
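For anyone landing here with the same symptom: glob() reads [04] in a path as a character class (matching a single 0 or 4), so a directory literally named "[04] Programs" never matches. The sketch below reproduces that in a temporary directory and shows two ways around it, glob.escape() and os.scandir(). It is an illustration under assumed names (list_posts is hypothetical), not necessarily how the actual fix was written.

```python
import glob
import os
import tempfile


def list_posts(post_dir, ext='.html'):
    """List files ending in ext via os.scandir; the path is never a pattern."""
    with os.scandir(post_dir) as it:
        return sorted(e.path for e in it if e.is_file() and e.name.endswith(ext))


with tempfile.TemporaryDirectory() as tmp:
    # A directory name containing square brackets, as in the failing log.
    post_dir = os.path.join(tmp, '[04] Programs', 'posts')
    os.makedirs(post_dir)
    open(os.path.join(post_dir, '123.html'), 'w').close()

    # Brackets in the directory part are treated as a character class.
    naive = glob.glob(os.path.join(post_dir, '*.html'))
    # Escaping the directory part makes the brackets literal again.
    escaped = glob.glob(os.path.join(glob.escape(post_dir), '*.html'))
    # scandir takes the directory as a plain path and filters names itself.
    scanned = list_posts(post_dir)

print(len(naive), len(escaped), len(scanned))  # 0 1 1
```

The scandir variant has the advantage that no part of the user's path is ever interpreted as a pattern, so other special characters (*, ?) in folder names are harmless too.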
Can confirm, it fixed the Index bug, no other additional cause.
Thank you for the persistence in trying to help.
(note: The Mac OS bug ticket opening highly depends on the availability of the Mac OS user ... otherwise might try to replicate from the information and come back - but that would take a long while)
Have recently been trying out fresh installs of tumblr-utils (the latest version, June 2023) with just the API key added, checking for any issues in order to help others who want to back up their Tumblr.
In the latest version, can confirm that the "no module named 'fcntl'" error appears.
(I use Windows)
With Python 3.10, the error doesn't appear when just commenting out "import fcntl" in util.py. So at least there's that.
fcntl was intended as a replacement for the deprecated imghdr, as your comment here for the May 16 2023 commit says: https://github.com/Cebtenzzre/tumblr-utils/commit/f89c663d4214d128349102b7b6353f935ae908b6
For Python 3.11, since imghdr is deprecated as a fallback and fcntl is not available, it would cause some issues.
It ends up skipping image downloads as-is while backing up, and just links to tumblr hosted images, similar to when using the -k command... except not actually using the command. (This ends up happening to someone I was trying to help, who uses Mac OS and first tried with Python 3.11)
The error in question (not mine): without commenting out "import fcntl"
with commenting out "import fcntl"
Still, even with commenting out "import fcntl" so that bug doesn't appear, the index page will be blank besides the tumblr title and description.
... Though this also happens on fresh installs (with just API key added) of an older tumblr-utils I had downloaded before. So it doesn't seem related as initially thought.
The same blank index page happens for those I am trying to help with their fresh installs, including on MacOS (at least 2 other people).
Meanwhile, my old install (not fresh install) on a different hard drive, using January 2022 tumblr-utils, does not have either issue and can make a proper index page. (the same blog, on same day)
And have been trying to figure out why the blank index page thing happens, but have no idea. When the script runs, these fresh installs (of January 2022, and the latest of June 2023) don't flash the message "building index page" before finishing, unlike the old one I use.
Am reporting both because when tried to "fix" one issue, the other popped up right after.