Closed joachimwolff closed 3 years ago
Probably pyBigWig needs to be able to seek in the stream to update the header or so. A gzip stream is not seekable during write. It might work with a plain tar file as that one should be seekable.
You could also try a zip file with compression=ZIP_STORED (https://docs.python.org/3/library/zipfile.html). This should store the bigWig files uncompressed (they have their own compression, so compressing them again makes little sense). The advantage of a zip file is that it has a file/dir index, so listing its files should be faster than with tar, where you have to read through the whole archive to find all file names.
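A minimal sketch of the ZIP_STORED suggestion, assuming the bigWig files have already been written to disk. The function names `store_files_in_zip` and `list_zip_members` are made up for illustration; only the standard-library `zipfile` calls are real.

```python
import zipfile
from pathlib import Path


def store_files_in_zip(zip_path, file_paths):
    """Add existing files to a zip archive without recompressing them."""
    # ZIP_STORED writes each member as-is (no deflate pass).
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in file_paths:
            # arcname keeps only the file name, not the directory part.
            zf.write(str(path), arcname=Path(path).name)


def list_zip_members(zip_path):
    """Return member names, taken from the archive's central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

Listing goes through the zip index, so it does not need to scan the member data the way a tar listing does.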
Thanks for your reply.
> Probably pyBigWig needs to be able to seek in the stream to update the header or so. A gzip stream is not seekable during write. It might work with a plain tar file as that one should be seekable.
I am writing to a BytesIO object; the compression and the tar itself only matter in the line tar.addfile(tarinfo=tar_info, fileobj=file). However, the error happens earlier. It seems to me that the problem is that pyBigWig expects a real file, and that is exactly what I am trying to avoid. I don't want to write thousands of bigWig files to disk, merge them into a tar, and then delete the bigWigs from disk again.
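For reference, here is a sketch of the in-memory pattern that tar.addfile supports when the payload is already available as bytes (as with text files). It is not the code from this issue; the helper names `add_bytes_to_tar` and `write_tar_gz` are hypothetical.

```python
import io
import tarfile


def add_bytes_to_tar(tar, name, payload):
    """Add an in-memory bytes payload to an open tarfile as member `name`."""
    info = tarfile.TarInfo(name=name)
    # The size must be set explicitly; addfile reads exactly this many bytes.
    info.size = len(payload)
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(payload))


def write_tar_gz(tar_path, members):
    """Write a dict of {member name: bytes} to a gzip-compressed tar archive."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for name, payload in members.items():
            add_bytes_to_tar(tar, name, payload)
```

This only works if the writer can produce bytes in memory, which is the step that fails with pyBigWig.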
The file name passed to open() is passed directly to C in:
fopen(fname, mode);
So it has to be the actual name of a file, as a string, rather than some Pythonic abstraction. Perhaps you can use mmap to come up with a solution to this (maybe there's a memory-mapped path mechanism), but I doubt it'd be a simple process.
I see, thanks @dpryan79.
To document my solution: I create a temporary directory, write the files to it, and add the directory to a tar. Afterward, I delete the temporary directory.
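A rough sketch of this workaround, not the exact code used here. The function `write_via_tempdir` and the `writers` mapping are hypothetical; each writer is assumed to be a callable that takes a real file path as a string and writes one file there.

```python
import tarfile
import tempfile
from pathlib import Path


def write_via_tempdir(tar_path, writers):
    """Write files into a temporary directory, then pack them into a tar.gz.

    `writers` maps a file name to a callable that receives a real filesystem
    path (str) and writes the file there (hypothetical interface).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name, write_file in writers.items():
            write_file(str(Path(tmp_dir) / name))
        with tarfile.open(tar_path, "w:gz") as tar:
            for path in sorted(Path(tmp_dir).iterdir()):
                # Store members under their bare file names.
                tar.add(str(path), arcname=path.name)
    # Leaving the `with` block deletes the temporary directory and its files.
```

With pyBigWig, a writer would presumably call pyBigWig.open(path, "w"), add the header and entries, and close the file. Since that needs a real path, the files still touch the disk briefly, but only the tar.gz remains afterward.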
Hi Devon,
I have to write many bigWig files to disk, and I thought it would be better to write a single tar.gz file instead of many individual bigWigs. My idea is to write the bigWig data to a BytesIO object and save that to the tar.gz. With text files, I implemented a similar solution successfully.
I now get an error that pyBigWig can't open the BytesIO object for writing:
Is it in general not possible to use BytesIO for this? Or am I doing anything obviously wrong? Any help is appreciated.
Best,
Joachim