PabloCastellano / pablog-scripts

Misc personal scripts

MBOX file size limit? #2

Open ThinkMaize opened 8 years ago

ThinkMaize commented 8 years ago

I am trying to use this script to extract all of the attachments from an MBOX file, and I keep receiving an error. I can't figure out what is triggering it, so I was wondering whether my MBOX file might be too large. It's about 17 GB. Here's my command prompt output (your script is mbox.py):

C:\Users\Alex\Desktop\Email Attachments>python mbox.py mail.mbox
Extract attachments from mbox files
Copyright (C) 2012 Pablo Castellano
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under certain conditions.

Attachment found! Extracting down_arrow.png (2799 bytes)

Attachment found! Extracting profilephoto.png (1286 bytes)

Attachment found! Extracting windows.png (1692 bytes)

Attachment found! Extracting keyhole.png (4422 bytes)

Attachment found! Extracting google_logo.png (12199 bytes)

Attachment found! Extracting 2015_04_23_10:48.csv (3660265 bytes)

Attachment found! Extracting 2015_04_23_11:18.csv (3660737 bytes)

Attachment found! Extracting 2015_04_23_11:48.csv (3661510 bytes)

Attachment found!
Traceback (most recent call last):
  File "mbox.py", line 164, in <module>
    extract_attachment(payl)
  File "mbox.py", line 77, in extract_attachment
    content = base64.decodestring(content)
  File "C:\Python27\lib\base64.py", line 321, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

C:\Users\Alex\Desktop\Email Attachments>

PabloCastellano commented 8 years ago

Hey!

It doesn't look to me like the file size is the issue. Have you tried these tricks? https://stackoverflow.com/questions/4923509/python-decode-strings

Or maybe could you share the string? (pastebin or similar)

ThinkMaize commented 8 years ago

I think it might be the ":" in the filename. Looks like it's allowed in gmail, but not on Windows.

hut8 commented 8 years ago

I don't think it's an issue with the filename at all, as ":" works just fine in the file above the one that caused the error, and ":" (while not technically allowed on Windows, I guess) actually seems to work based on a quick test I just did -- I can't create a file with ":" in explorer, but it works fine elsewhere.

Base64 padding is a sequence of zero, one, or two = characters at the end of a Base64 string. They can actually be omitted because they carry no data; they just make it "easier" to break the Base64-encoded string into byte-sized chunks (bad pun :disappointed:). But a2b_base64() seems to require them, and it throws an error when they are missing.

Putting this right above the call to decode might fix it:

content += '=' * (-len(content) % 4)
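
To make the one-liner above a bit more concrete, here is a minimal sketch of the same idea written for Python 3 (the traceback is from Python 2.7, where base64.decodestring was still available). The helper name decode_base64_lenient is made up for illustration and is not part of mbox.py. It also strips whitespace before counting, on the assumption that the attachment payload is wrapped across lines as email bodies usually are; otherwise the line breaks would throw off the length check.

```python
import base64


def decode_base64_lenient(content):
    """Decode base64 text, restoring '=' padding that was dropped.

    Hypothetical helper for illustration; not taken from mbox.py.
    """
    if isinstance(content, str):
        content = content.encode("ascii")
    # Email payloads are wrapped across lines, so drop all whitespace
    # before counting characters.
    content = b"".join(content.split())
    # Pad up to the next multiple of 4 characters (adds 0 to 3 '=').
    content += b"=" * (-len(content) % 4)
    return base64.b64decode(content)


if __name__ == "__main__":
    # "aGVsbG8=" is the base64 encoding of b"hello"; the '=' is omitted here.
    print(decode_base64_lenient("aGVsbG8"))
```

One caveat: padding only helps when the = characters were merely left off. If the data itself is truncated (for example, a length that is one more than a multiple of 4), no amount of padding makes it valid, and the decoder may still raise binascii.Error.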