n4mwd closed this issue 4 years ago.
URLs are sometimes sanitized by converting problem characters into percent-escapes (a % followed by the character's two-digit hex code). So:
'Joe"s files' becomes 'Joe%22s files',
'what about this?' becomes 'what about this%3F', and
'five %' becomes 'five %25'.
This could work here too, and it would also turn Yahoo names into legal Windows filenames.
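A minimal sketch of that scheme, assuming only the characters Windows rejects in filenames (plus ASCII control characters) need escaping. The % sign itself is also escaped so the mapping can be undone. escape_name and unescape_name are hypothetical helper names, not part of yahoo.py.

```python
from urllib.parse import unquote

# Characters Windows forbids in file names, plus '%' itself so the
# escaping stays reversible (a literal '%' becomes '%25').
ESCAPE_CHARS = set('\\/:*?"<>|%')


def escape_name(name):
    """Percent-escape only the characters that are illegal on Windows."""
    out = []
    for ch in name:
        if ch in ESCAPE_CHARS or ord(ch) < 32:
            # Two-digit uppercase hex, e.g. '"' -> '%22'
            out.append('%{:02X}'.format(ord(ch)))
        else:
            out.append(ch)
    return ''.join(out)


def unescape_name(name):
    """Reverse escape_name() to recover the original Yahoo name."""
    return unquote(name)
```

Spaces and ordinary punctuation pass through untouched, so 'what about this?' stays readable as 'what about this%3F'.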
This would work until you hit something like "This .. Is ... a ... really ... long ... filename", which could exceed the maximum filename length (MAXFILESIZE). In that case, I would just use the first 20 chars plus a 4-digit hex count. So "Little tommy...[500 chars] ... photo 1" becomes "Little tommy...[8 chars] 0000" and "Little tommy...[500 chars] ... photo 2" becomes "Little tommy...[8 chars] 0001".
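The truncation rule can be sketched as a small closure that keeps a running counter. The cut-off max_len stands in for MAXFILESIZE; its default value here is an arbitrary assumption, and make_short_namer is a hypothetical name. It is meant to be applied to the photo title before any extension is appended.

```python
import itertools

MAX_PREFIX = 20  # keep the first 20 characters, as proposed above


def make_short_namer(max_len=100):
    """Return a function that shortens over-long names.

    Names of at most max_len characters are returned unchanged; longer ones
    become the first MAX_PREFIX characters, a space, and a 4-digit hex counter
    that increases with every shortened name.
    """
    counter = itertools.count()

    def shorten(name):
        if len(name) <= max_len:
            return name
        return '{} {:04X}'.format(name[:MAX_PREFIX], next(counter))

    return shorten
```

Four hex digits cover 65,536 shortened names; past that the counter simply grows to five digits rather than failing.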
There might be a URL encoder in Python that could be repurposed. Just guessing.
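Python's standard library does have one: urllib.parse.quote in Python 3. A minimal sketch of repurposing it, assuming spaces should stay readable; quote_name is a hypothetical wrapper. The default safe='/' has to be overridden, otherwise a slash would pass through unescaped.

```python
from urllib.parse import quote, unquote


def quote_name(name):
    """Percent-encode a Yahoo name so it is a legal Windows filename.

    safe=' ' keeps spaces as-is; everything outside letters, digits and
    '_.-~' (including '/', '\\', '"', '?' and '%') is escaped.
    """
    return quote(name, safe=' ')


# unquote() reverses the encoding when the original name is needed again.
```

The traceback in this thread shows the script running on Python 2.7, where the same function is urllib.quote. Note that quote escapes more than Windows requires (for example '!' becomes %21 and non-ASCII letters become UTF-8 byte escapes), so names end up less readable.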
I'm using Windows, so I had to implement a fix for the photos in one of the groups I was downloading.
The following sanitizing worked for the three groups I tried it on:
import re
fname = re.sub(r'[\\/*?:"<>|-]', "", fname)  # strip Windows-illegal characters (including backslash) and hyphens
You need to apply it in the mkchdir section as well. Hope this helps.
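A sketch of the same idea as one reusable helper, so the identical rule covers both photo filenames and the directory names created in mkchdir. safe_name and mkchdir_safe are hypothetical names; the real mkchdir in yahoo.py is not shown in this thread, so the signature used here is an assumption. Besides the nine characters Windows rejects, it also drops ASCII control characters and trailing dots or spaces, which Windows does not allow at the end of a name.

```python
import os
import re

# Characters Windows rejects in file and directory names, plus ASCII control chars.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_name(name):
    """Remove Windows-illegal characters and trailing dots/spaces."""
    cleaned = _ILLEGAL.sub('', name).rstrip(' .')
    return cleaned or '_'  # hypothetical fallback if nothing is left


def mkchdir_safe(dirname):
    """Hypothetical stand-in for mkchdir: create the sanitized directory
    if it does not exist yet, change into it, and return its name."""
    path = safe_name(dirname)
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    return path
```

For the photo from the traceback, safe_name('Mae"s Bulkhead!!!') gives 'Maes Bulkhead!!!', which Windows accepts.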
I am having this same problem with several groups. I have a small group that I keep testing to see if there has been a fix, but none so far.
I'm new to this whole GitHub thing, so I don't know how to help fix the problem in the original, but if you look at my branch you'll see I added the fix for fname and mkchdir. I tested it on the group I was having problems with, and it now works fine because the problem characters are deleted.
@flyintheointment Thanks for the offer, but I'm going for a more robust solution.
2019-10-24 22:16:47 Eastern Daylight Time 842 INFO:archive_photos Fetching photo 'Mae"s Bulkhead!!!' (141/320)
Traceback (most recent call last):
  File "C:\Python27\Scripts\yahoo.py", line 667, in
    archive_photos(yga)
  File "C:\Python27\Scripts\yahoo.py", line 293, in archive_photos
    with open(fname, 'wb') as f:
IOError: [Errno 22] invalid mode ('wb') or filename: u'2071862851-Mae"s Bulkhead!!!.jpg'
The double quote inside the filename seems to be what trips it up. I hit the same error on another photo with a question mark in its name. The error above occurred after about an hour of processing, because the files are large.
Also, as an enhancement, it would be nice if the script could detect whether a photo has already been downloaded and skip it.
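A minimal sketch of the skip check, assuming a non-empty file at the target path means the photo was already fetched. needs_download and save_photo are hypothetical helpers; how yahoo.py obtains the photo bytes is not shown in this thread. Writing to a temporary name and renaming afterwards means an interrupted run never leaves a truncated file that the check would wrongly skip.

```python
import os


def needs_download(fname):
    """Return True unless fname already exists as a non-empty file.

    Assumption: a zero-byte file is treated as a failed earlier download
    and fetched again.
    """
    return not (os.path.isfile(fname) and os.path.getsize(fname) > 0)


def save_photo(fname, data):
    """Write data to fname.part first, then rename it into place."""
    tmp = fname + '.part'
    with open(tmp, 'wb') as f:
        f.write(data)
    os.replace(tmp, fname)  # overwrites any stale file at fname
```

In the photo loop this would be used as `if not needs_download(fname): continue` before fetching.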