brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License

bwameth can't handle stdin input #48

Closed bwlang closed 6 years ago

bwlang commented 6 years ago

I'll work on this later today... sorry I missed it earlier!

bwlang commented 6 years ago

After testing, I discovered that bwameth cannot read its input from - or /dev/stdin. I think this might be caused by islice... @brentp any suggestions on how to approach this?

brentp commented 6 years ago

When you read the lines to sniff, keep them; then you can itertools.chain them with the filehandle.
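A minimal sketch of the sniff-then-chain idea suggested here. The helper name `sniff_and_restore` and the default of five lines are assumptions for illustration, not code taken from bwameth.py:

```python
import io
from itertools import chain, islice


def sniff_and_restore(handle, n=5):
    """Read the first n lines for inspection, then return them together
    with an iterator that replays those lines before the rest of handle.

    Hypothetical helper: it works on any line iterator, including a
    non-seekable stream such as sys.stdin.
    """
    head = list(islice(handle, n))
    # chain yields the saved lines first, then continues from where
    # islice stopped reading the underlying handle.
    return head, chain(head, handle)


# Stand-in for sys.stdin so the example runs without piped input.
fake_stdin = io.StringIO("".join("line%d\n" % i for i in range(8)))
head, lines = sniff_and_restore(fake_stdin)
```

Downstream code can then iterate over `lines` as if nothing had been consumed, which matters for a pipe because it cannot be rewound with `seek(0)`.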

bwlang commented 6 years ago

hi... I think I misdiagnosed this error... but I tried your suggestion anyway.

I think maybe islice does not work with stdin: first_five = list(islice(fq1, 5)) hangs.

I also tried first_five = list(islice(sys.stdin, 5)), with the same hang.

bwlang commented 6 years ago

as a quick follow up... this tiny test script does work (python3 and python2). so that eliminates islice as the source of this problem - still digging.

import sys
from itertools import groupby, repeat, chain, islice

first_five = list(islice(sys.stdin, 5))
sys.stderr.write("first 5 lines = %s\n" % first_five)

python test_islice.py < test_interleaved.fq

first 5 lines = ['@A00336:A00336:H5V7WDMXX:1:1220:10004:10770 1:N:0:CTCAGAAG\n', 'ATTTGTTTGTTAAAAAATTGATATTTTTTTAATGTATTTTTTTTTTTTATTATTGTTTGTTGATTAGATTAAGTATTTATATTTTTTTTTTTTAATAATTT\n', '+\n', ',FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:F:FFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFF,F:F,,FFF\n', '@A00336:A00336:H5V7WDMXX:1:1220:10004:10770 2:N:0:CTCAGAAG\n']

bwlang commented 6 years ago

aha! I think it's failing because the c2t process's stdin (created at https://github.com/brentp/bwa-meth/blob/10621a4c1a8be8fa96de80dd851c52bac70004a2/bwameth.py#L313 ) is not the calling process's stdin. The command being run is:

bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG\tID:-\tSM:-' -t 6 ref.fa.bwameth.c2t '</usr/local/opt/python3/bin/python3.6 ../bwameth.py c2t - NA'

I'm thinking about whether nopen('|bwameth c2t - NA | bwa mem ... ' ) will work?

Another possibility for getting this working on a pipe: call c2t directly from bash and pipe its output to bwameth with a new arg, e.g. bwameth raw -, which would just run bwa.

Let me know if you have a preferred path.
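
For illustration, here is a rough sketch of the general pattern behind both options: the parent hands its own stdin to the first stage and pipes that stage's output into the second. The function name `run_pipeline` is hypothetical, and the two child commands are small Python one-liners standing in for the c2t step and for bwa; none of this is bwameth's actual code.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd, stdin=None):
    """Run `producer | consumer`, giving the producer an explicit stdin.

    Hypothetical helper: producer_cmd stands in for the c2t step and
    consumer_cmd for the aligner. Returns the consumer's stdout as bytes.
    """
    producer = subprocess.Popen(producer_cmd, stdin=stdin,
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy of the pipe so the producer sees a broken
    # pipe if the consumer exits early.
    producer.stdout.close()
    out, _ = consumer.communicate()
    producer.wait()
    return out


# Stand-in stages: upper-case the input, then pass it through unchanged.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
PASSTHROUGH = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read())"]
```

In a real script, calling `run_pipeline(UPPER, PASSTHROUGH, stdin=sys.stdin)` would forward whatever was piped into the parent process.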