dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Support for muscle v5 #572

Open isaacovercast opened 3 weeks ago

isaacovercast commented 3 weeks ago

We should eventually migrate to muscle v5. We are pinned at v3 for now (more details in #477), but at some point we will have to switch. The main difference for us is that v5 has no built-in option for writing its output to stdout. The suggested workaround is this:

# must set -output in muscle5
cmd = ("echo -e '{}' | {} -quiet -align /dev/stdin -output /dev/stdout ; echo {}"
             .format(lclust, ip.bins.muscle, "//\n"))

I tried implementing this in step 3 (persistent_popen_align3), but the call fails with:

---Fatal error---
Cannot create /dev/stdout, errno=13 Permission denied

You can run this call by hand in a terminal and it works fine:

head -n 8 /tmp/ipyrad-test/se-tmpalign/1B_0_chunk_0.ali | muscle -quiet -align /dev/stdin -output /dev/stdout

But when the same command runs inside a subprocess.Popen call, it fails with this permission error. I think the problem is that muscle v3 tests for -out - and then writes to the C-native stdout handle, which evidently behaves differently from opening /dev/stdout. muscle v5 doesn't use the native stdout handle internally: when you pass -output /dev/stdout, it tries to fopen() that path as if it were a regular file, and the open fails inside the Python Popen context.
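The distinction drawn above, between writing to an already-open stdout handle and opening /dev/stdout by path, can be illustrated without muscle. The sketch below is a stand-in under stated assumptions: two tiny Python child programs (not muscle) run with stdout captured through a pipe, as in the Popen call. One writes to its inherited file descriptor 1; the other opens the path /dev/stdout. On Linux, /dev/stdout is normally a link to /proc/self/fd/1, so the second variant performs a fresh open() of whatever fd 1 refers to, with the usual permission checks, while the first never reopens anything. In an ordinary environment both succeed, so this does not reproduce the errno=13 failure; it only shows where the two code paths diverge.

```python
import subprocess
import sys

# Child A writes through the file descriptor it inherited (fd 1), loosely
# analogous to muscle v3 writing to the C stdout handle for "-out -".
WRITE_TO_FD = "import os; os.write(1, b'via fd 1\\n')"

# Child B opens the path /dev/stdout and writes to the new handle, loosely
# analogous to muscle v5 calling fopen() on its -output argument.
OPEN_BY_PATH = (
    "with open('/dev/stdout', 'w') as f:\n"
    "    f.write('via /dev/stdout\\n')\n"
)


def run_child(source):
    """Run a tiny Python child with stdout and stderr captured through pipes."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode, result.stdout.decode(), result.stderr.decode()


for name, source in [("fd 1", WRITE_TO_FD), ("/dev/stdout", OPEN_BY_PATH)]:
    code, out, err = run_child(source)
    # A nonzero code plus a message in err would mirror the muscle v5 failure.
    print(name, code, repr(out), err.strip())
```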

I haven't yet found a workaround for this that isn't ugly.
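One workaround that avoids /dev/stdout entirely, if not an elegant one, is to give muscle v5 real files for both -align and -output. The sketch below assumes it is acceptable to spawn one process per cluster instead of streaming through the persistent bash shell used in persistent_popen_align3. align_via_tempfiles is a hypothetical helper, not ipyrad code, and its flags are copied from the muscle v5 command quoted above rather than checked against the muscle v5 manual.

```python
import os
import subprocess
import tempfile


def align_via_tempfiles(fasta_text, muscle_cmd=("muscle",)):
    """Align one cluster with muscle v5, using real files for input and output.

    Hypothetical helper (not part of ipyrad). `muscle_cmd` is the argv prefix
    used to launch muscle, e.g. ("muscle",) or (path_to_binary,). The flags
    mirror the shell command quoted in the issue.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        in_path = os.path.join(tmpdir, "cluster.fa")
        out_path = os.path.join(tmpdir, "cluster.afa")
        with open(in_path, "w") as handle:
            handle.write(fasta_text)
        # -output must name something muscle v5 can fopen(); a regular file
        # in a private temporary directory sidesteps /dev/stdout entirely.
        subprocess.run(
            list(muscle_cmd) + ["-quiet", "-align", in_path, "-output", out_path],
            check=True,  # raise CalledProcessError if muscle exits nonzero
        )
        # Read the alignment back before the temporary directory is removed.
        with open(out_path) as handle:
            return handle.read()
```

Spawning a process and touching the filesystem for every cluster is likely slower than streaming through one long-lived shell, so for chunks with many small clusters this may cost noticeable time.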

isaacovercast commented 3 weeks ago

Doing this works fine:

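import subprocess as sps

# A persistent bash process that we write commands to and read results from
proc = sps.Popen(
    ["bash"],
    stdin=sps.PIPE,
    stdout=sps.PIPE,
    bufsize=0,
)
# The trailing bare `echo` prints an empty line, used as the end-of-output marker
cmd = "id; echo 'wat' > /dev/stdout; echo\n"
proc.stdin.write(cmd.encode())
for line in iter(proc.stdout.readline, b'\n'):
    print(line.decode(), end="")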

But doing this gives the permission error:

import ipyrad

## Prepare a chunk
chunk = "1A_0_chunk_0.ali"
with open(chunk, 'rb') as infile:
    clusts = infile.read().decode().split("//\n//\n")
clusts = [i for i in clusts if i]

## Call muscle (reusing the bash `proc` from above)
cmd = ("echo -e '{}' | {} -quiet -align /dev/stdin -output /dev/stdout ; echo {}"
       .format(clusts[0], ipyrad.bins.muscle, "//\n"))
proc.stdin.write(cmd.encode())
# `echo //` prints "//" on its own line, so stop reading there
for line in iter(proc.stdout.readline, b'//\n'):
    print(line.decode(), end="")
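The read loops above stop at a fixed sentinel line, such as a blank line, which can end the read early if the command's own output happens to contain that line. A minimal sketch of a more defensive variant, assuming a persistent bash process like the one above: run_in_shell and start_shell are hypothetical helpers (not ipyrad code) that echo a random marker after each command and read until exactly that line comes back.

```python
import subprocess
import uuid


def start_shell():
    """Start a long-lived bash process with piped stdin and stdout."""
    return subprocess.Popen(["bash"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)


def run_in_shell(proc, cmd):
    """Send `cmd` to a persistent bash process and return its stdout lines.

    Hypothetical helper. Instead of treating a blank line as end-of-output,
    it echoes a random marker after the command and reads until that exact
    line comes back, so blank lines in the command's own output are kept.
    """
    marker = "__END_{}__".format(uuid.uuid4().hex)
    proc.stdin.write("{}\necho {}\n".format(cmd, marker).encode())
    proc.stdin.flush()  # needed when stdin is buffered (the default bufsize)
    lines = []
    # b"" means bash exited (EOF), so the loop cannot block forever on that.
    for raw in iter(proc.stdout.readline, b""):
        line = raw.decode()
        if line.rstrip("\n") == marker:
            break
        lines.append(line)
    return lines
```

stderr is not captured here, so error messages such as muscle's "Fatal error" output still go to the terminal rather than into the returned lines.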