For streaming intersections of moderate-sized files (say, >5000 features), the following hangs::

    z = a.intersect(b, stream=True).intersect(c, stream=True)
    len(z)

The schematic below shows what's happening with stdin/stdout and pipes. The above command hangs when trying to write to the stdin of the second process, marked below as ^^^^^^.
Despite a forced flush of stdout of command (1) and stdin of command (2) in helpers.call_bedtools, as well as a forced flush of stdout of command (2) in the IntervalIterator, this still blocks.
In the Popen command, setting bufsize=1 or bufsize=0 doesn't help. Docs for Popen.communicate() say that it'll block for large input.
Various Stack Overflow answers for similar problems suggest running each call in a separate thread; however, in initial tests this made interactive work in IPython a little crazy.
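A minimal sketch of the thread-per-pipe idea, using only the standard library. This is not pybedtools code: the child process is a stand-in that copies stdin to stdout (a hypothetical substitute for a bedtools call), and ``feed`` and ``run_streaming`` are names made up for the illustration. A helper thread writes to the child's stdin while the main thread drains its stdout, so neither pipe buffer can fill up and stall the other side.

```python
import subprocess
import sys
import threading

# Stand-in for a bedtools command (assumption): copy stdin to stdout unchanged.
ECHO = [sys.executable, "-c",
        "import sys, shutil; shutil.copyfileobj(sys.stdin, sys.stdout)"]


def feed(lines, pipe):
    """Write every line to the child's stdin, then close it to signal EOF."""
    for line in lines:
        pipe.write(line)
    pipe.close()


def run_streaming(lines):
    """Push lines through the child while reading its output concurrently."""
    proc = subprocess.Popen(ECHO, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    writer = threading.Thread(target=feed, args=(lines, proc.stdin))
    writer.start()
    # The main thread reads while the writer thread writes.
    out = list(proc.stdout)
    writer.join()
    proc.stdout.close()
    proc.wait()
    return out


# About 1 MB of BED-like lines, well beyond a typical OS pipe buffer.
features = ["chr1\t%d\t%d\n" % (i, i + 100) for i in range(50000)]
result = run_streaming(features)
```

Writing all of ``features`` to ``proc.stdin`` from the main thread before reading ``proc.stdout`` would be expected to stall once the pipe buffers fill, which is the same kind of hang described above.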
My guess is that workarounds like "rendering" a streaming BedTool to disk will be needed for the near-to-mid-future, since fixes to this will be difficult.
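The "render to disk" workaround can be sketched with the standard library as follows. Again the child command is a stand-in that copies stdin to stdout, and ``two_stage`` is a hypothetical helper, not part of pybedtools. The first command runs to completion with its output going to a temporary file, and only then does the second command start, reading from that file, so no process ever waits on a full pipe. In pybedtools itself this presumably corresponds to saving the intermediate streaming BedTool to a file (for example with ``saveas()``) before the second ``intersect``, though that mapping is an assumption here.

```python
import subprocess
import sys
import tempfile

# Stand-in for a bedtools command (assumption): copy stdin to stdout unchanged.
ECHO = [sys.executable, "-c",
        "import sys, shutil; shutil.copyfileobj(sys.stdin, sys.stdout)"]


def two_stage(lines):
    """Run two commands back to back, with the intermediate result on disk."""
    with tempfile.TemporaryFile("w+") as src, tempfile.TemporaryFile("w+") as mid:
        # Put the input on disk and rewind so the child reads from the start.
        src.writelines(lines)
        src.seek(0)
        # Stage 1: finishes completely, writing its output to the temp file.
        subprocess.run(ECHO, stdin=src, stdout=mid, check=True)
        # Stage 2: reads the rendered intermediate file instead of a live pipe.
        mid.seek(0)
        done = subprocess.run(ECHO, stdin=mid, stdout=subprocess.PIPE,
                              text=True, check=True)
    return done.stdout.splitlines(keepends=True)


features = ["chr1\t%d\t%d\n" % (i, i + 100) for i in range(50000)]
rendered = two_stage(features)
```

The cost is extra disk I/O and losing the memory savings of streaming for the intermediate result, which is why this is a workaround rather than a fix.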