jordiyeh closed this issue 2 years ago.
Alright, so this is actually two errors. One was a typo (the one raised during exception handling), and the other was wkhtmltopdf saying that it needs an input file for some reason. I'll fix the typo first, then see if I can track down why the wk issue happened.
(Edit: You are also a version behind, although that's not the cause of either issue)
To clarify, you have confirmed that extract_msg --pdf test.msg is enough to trigger that error? Does it happen on specific files or on all of them? What operating system are you using, and does the issue happen on a different operating system (if you can test that)?
So far I have not been able to reproduce the wk error myself, despite using the exact code you posted and the same version of wkhtmltopdf listed in the traceback.
Edit: Here is a list of everything I have tried in order to get it to fail, using version 0.36.3 of extract-msg (wkPath was sometimes omitted, which used the older copy of wkhtmltopdf on the PATH, but that also produced no error):
>>> with extract_msg.openMsg('test.msg') as msg:
...     msg.save(pdf = True, wkOptions = ['-O', 'Portrait'], wkPath = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
>>> with extract_msg.openMsg('test.msg') as msg:
...     msg.save(pdf = True, wkPath = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
extract_msg --wk-path "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" --pdf test.msg
extract_msg --wk-path "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" --pdf --wk-options "+O Portrait" test.msg
It happens on a couple of different .msg files. I am using macOS. The problem in my case seems to be in the following lines:
process = subprocess.Popen([wkPath, *parsedWkOptions, '-', '-'], shell = True, stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
# finish.
output = process.communicate(self.getSaveHtmlBody(**kwargs))
wkPath resolves correctly to '/usr/local/bin/wkhtmltopdf', and parsedWkOptions = ['+O', 'Portrait'].
_input contains the HTML of the msg.
<bound method Popen._check_timeout of <Popen: returncode: 1 args: ['/usr/local/bin/wkhtmltopdf', '+O', 'Portrait',...>>
I am checking what could cause the return code of 1.
parsedWkOptions = ['+O', 'Portrait']
That would cause a problem: that should be -O, not +O. The command line for extract_msg substitutes + with - in the wk options because of issues with argparse (or at least it should; something may be going wrong with your copy).
But basically, the error is saying that something is wrong with how the input and output are listed, and I'm guessing the +O might be the reason why.
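A minimal sketch of the kind of + to - substitution described above. The helper name plus_to_dash is hypothetical and this is not extract-msg's actual implementation; it assumes the CLI receives the options as one string such as "+O Portrait", splits it shell-style, and swaps a leading + for -.

```python
import shlex


def plus_to_dash(raw_options: str) -> list[str]:
    """Hypothetical helper: split a CLI option string and turn a leading '+' into '-'.

    The '+' form works around argparse issues on the extract_msg command line
    (per the comment above); wkhtmltopdf itself expects '-O'.
    """
    tokens = shlex.split(raw_options)
    return ["-" + tok[1:] if tok.startswith("+") else tok for tok in tokens]


print(plus_to_dash("+O Portrait"))  # ['-O', 'Portrait']
```

If a copy skips this step, the list that reaches wkhtmltopdf keeps the literal '+O', which matches the parsedWkOptions value reported above.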
I used -O, but changed it to +O.
The following change fixed the issue for me.
Instead of the original line 397 in message_base.py:
process = subprocess.Popen([wkPath, *parsedWkOptions, '-', '-'], shell = True, stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
I used
process = subprocess.Popen(' '.join([wkPath, *parsedWkOptions, '-', '-']), shell = True, stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
Does it work for you?
I'll have to test it, but I believe this will cause errors if the path has spaces in it because of how Popen works.
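To illustrate the concern about spaces: once the list is joined into a single string, the shell re-splits it on whitespace. This sketch uses Python's shlex.split to mimic POSIX shell tokenizing; the install path "/Applications/wk tools/wkhtmltopdf" is made up for the example.

```python
import shlex

# Made-up install path containing a space, for illustration only.
args = ["/Applications/wk tools/wkhtmltopdf", "-O", "Portrait", "-", "-"]

# Naive join: a POSIX shell would see the path as two separate words.
naive = " ".join(args)
print(shlex.split(naive))
# ['/Applications/wk', 'tools/wkhtmltopdf', '-O', 'Portrait', '-', '-']
```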
If you try with no options, does the original code work?
If I remove the options, I get the same issue.
What about adding quotes around wkPath?
wkPath = '"' + findWk(kwargs.get('wkPath')) + '"'
The only thing I found about the difference between passing a string and a list to popen is:
"Note: If the cmd argument to popen2 functions is a string, the command is executed through /bin/sh. If it is a list, the command is directly executed." Ref: https://docs.python.org/3/library/subprocess.html#replacing-os-popen-os-popen2-os-popen3
Try this version of the line and see how it goes. It tested as working on a Windows system, but I still need to test it in a Linux environment to make sure it won't break there (given that Linux is the environment most commonly used for extract-msg). I'm also going to have to adjust things to either disallow bytes in the options or decode them: the list format allows a mix of bytes and strings, while join does not.
process = subprocess.Popen(' '.join(f'"{x}"' if ' ' in x and x[0] != '"' else x for x in [wkPath, *parsedWkOptions, '-', '-']), shell = True, stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
This handles the fact that the path, and sometimes other options, may need to be quoted.
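For comparison, Python 3.8+ provides shlex.join, which quotes every item as needed for a POSIX shell (it is not meant for Windows cmd.exe). The manual_quote function below is a hypothetical restatement of the proposed one-liner's quoting rule, and "Portrait&" is a contrived argument containing a character that a space-only check does not cover; an unquoted & would be read by the shell as a control operator.

```python
import shlex


def manual_quote(args):
    # Same rule as the proposed line: wrap in double quotes only if there is a space.
    return " ".join(f'"{x}"' if " " in x and x[0] != '"' else x for x in args)


# Made-up path and a contrived option value, for illustration only.
args = ["/Applications/wk tools/wkhtmltopdf", "-O", "Portrait&", "-", "-"]

print(manual_quote(args))
# "/Applications/wk tools/wkhtmltopdf" -O Portrait& - -
print(shlex.join(args))
# '/Applications/wk tools/wkhtmltopdf' -O 'Portrait&' - -
```

shlex.join round-trips through shlex.split, so the shell sees exactly the original list; the trade-off is that it targets POSIX shells only.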
Thanks for the clarification on why join will not work with a mix of bytes and strings.
The code handling the quoted path works for me now in a macOS environment.
Since it works on Mac, I'll start full testing and see how the command deals with mixed bytes and strings. Better still would be finding out why I can't manage to replicate the issue.
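A quick illustration of the bytes-versus-strings point: str.join raises TypeError as soon as it meets a bytes item, so options would need decoding before joining. to_str_args is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
def to_str_args(args, encoding="utf-8"):
    """Hypothetical helper: decode any bytes items so the list can be joined as text."""
    return [a.decode(encoding) if isinstance(a, bytes) else a for a in args]


mixed = ["/usr/local/bin/wkhtmltopdf", b"-O", "Portrait", "-", "-"]

try:
    " ".join(mixed)
except TypeError as exc:
    # join refuses to mix bytes and str items.
    print("join failed:", exc)

print(" ".join(to_str_args(mixed)))
# /usr/local/bin/wkhtmltopdf -O Portrait - -
```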
The issue seems to have something to do with your environment specifically, as with no options I have been completely unable to replicate it on Windows or Linux. Can you tell me whether the following command gives you loading output followed by a bunch of gibberish (meaning it did not error) or the wkhtmltopdf help text (which looks like the beginning of your traceback, with all the options listed):
wkhtmltopdf - - < /dev/null
This command should simulate what extract-msg runs when you give it no options.
Also, just to be 100% clear: it gives the exact same traceback with the original line if you call the function like this?
msg.save(pdf = True)
Given the information you have provided, your environment may have issues with a lot of modules, since list arguments to Popen are standard. You should probably test to make sure they work at all. The best way I can think of to do this is to make a Python file, then use Popen to run it as a subprocess and check the output. So write a script like this:
print('Hello world!')
Assuming that script is in the current working directory of your interpreter, the output of the following code should be either "Hello world!" or, as I suspect may happen, the start of the Python interpreter's output when given no arguments.
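import subprocess
import sys
from subprocess import PIPE
# Assuming your small test script is "my_file.py" in the current working directory.
# Passing a list runs the interpreter directly with the script path as its argument.
a = subprocess.Popen([sys.executable, 'my_file.py'], stdout = PIPE, stdin = PIPE, stderr = PIPE)
# communicate() returns (stdout, stderr) as bytes; decode both and print them.
print(''.join(x.decode('utf-8') if isinstance(x, bytes) else x for x in a.communicate(b'')))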
If your output looks like the interpreter's output rather than "Hello world!", that means list arguments are completely failing on your system, which should not be happening. Frankly, the fact that I can't replicate this, nor find anyone else having this kind of problem, suggests that it is not a bug in my code but rather something in your environment 🤷‍♀️
Edit: Also, looking at what you mentioned about the difference between string and list, it looks like that applies to the os.popen2 function and not to subprocess.Popen. Looking at the code for Popen, it does not seem to change the behavior except that it converts the list to a single string internally. And after checking the docs, it is the shell argument (which is set to True) that determines whether /bin/sh is used. Given that the code that didn't work and the code that did work both use it, I suspect that is not what is causing the problem. You can test whether setting shell = False on the original code allows it to work.
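One detail from the subprocess documentation that may be relevant: on POSIX, when shell=True and args is a sequence, only the first item is used as the command string; the remaining items become arguments to /bin/sh itself ($0, $1, ...), not to the command. If that applies here, wkhtmltopdf would have been started with no arguments at all, which would be consistent with it printing its help text and exiting with code 1, though that is an inference rather than something confirmed in this thread. A minimal sketch with echo, assuming a POSIX system:

```python
import subprocess
import sys


def compare_shell_modes():
    """Run the same echo command as a list and as a string, both with shell=True (POSIX only)."""
    # List form: the shell runs just "echo"; "hello" and "world" become $0 and $1.
    as_list = subprocess.run(["echo", "hello", "world"], shell=True,
                             capture_output=True, text=True)
    # String form: the shell parses the whole command line itself.
    as_string = subprocess.run("echo hello world", shell=True,
                               capture_output=True, text=True)
    return as_list.stdout, as_string.stdout


if sys.platform != "win32":
    print(compare_shell_modes())  # ('\n', 'hello world\n')
```

On Windows, a list is converted to a single command line instead, which may explain why the original code behaved differently across systems.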
I've posted some new code to next-release (I had to adjust the subprocess code because it apparently had a security vulnerability). If you could install from that branch with the command below and see whether that code works for you or fails, that would be great. If it still fails, I'll just swap to the string parsing once I finish it (I've confirmed that works) and call it a day.
pip install "git+https://github.com/TeamMsgExtractor/msg-extractor@next-release"
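This thread does not show exactly what the next-release change was. A common way to avoid both the quoting problems and shell-injection risks is to drop shell=True and pass the argument list straight to the program. The sketch below assumes that approach; run_wk is a hypothetical name, and wk_path and options stand in for extract-msg's wkPath and parsedWkOptions.

```python
import subprocess


def run_wk(wk_path, options, html: bytes) -> bytes:
    """Hypothetical sketch: run wkhtmltopdf reading HTML on stdin and writing the PDF to stdout.

    No shell is involved, so paths with spaces need no quoting and
    characters like '&' are passed through literally.
    """
    result = subprocess.run(
        [wk_path, *options, "-", "-"],  # '-' '-' = read from stdin, write to stdout
        input=html,
        capture_output=True,
    )
    # Design choice of this sketch: treat any non-zero exit code as a failure.
    if result.returncode != 0:
        raise RuntimeError(
            f"wkhtmltopdf exited with {result.returncode}: "
            f"{result.stderr.decode('utf-8', 'replace')}"
        )
    return result.stdout


# Example (requires wkhtmltopdf on the system):
# pdf = run_wk("/usr/local/bin/wkhtmltopdf", ["-O", "Portrait"], b"<p>Hello</p>")
```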
Thanks. 0.36.4 solved the issue for me!
Bug Metadata
Describe the bug: I am following https://github.com/TeamMsgExtractor/msg-extractor/issues/102, and this call results in an error:
msg.save(pdf = True, wkOptions = ['-O', 'Portrait'])
'Message' object has no attribute 'listdir'
What code did you use, or can we use, to reproduce this error? (if applicable)
Is there a message.msg file you want to share to help us reproduce this?
Traceback