bcgsc / biobloom

Create Bloom filters for a given reference and then use it to categorize sequences
http://www.bcgsc.ca/platform/bioinfo/software/biobloomtools
GNU General Public License v3.0

intermittent I/O errors when writing large files (patch included) #9

Closed: benvvalk closed this issue 8 years ago

benvvalk commented 8 years ago

I have observed frequent I/O errors when writing large files with BBT. When I read the saved file back, I see errors like:

bin.seed_mp.bf does not match size given by its information file. Size: 25862192416 vs 12977290528 bytes.

It looks like some kind of bug in the C++ library implementation on our HPC cluster (a CentOS 5 cluster using GPFS).

Replacing the C++ I/O calls with the equivalent C I/O calls solved the problem. Unfortunately, GitHub does not allow attaching files with the .patch or .diff extensions, so here is the patch pasted inline:

diff --git a/Common/BloomFilter.cpp b/Common/BloomFilter.cpp
index e459290..6e4fd4d 100644
--- a/Common/BloomFilter.cpp
+++ b/Common/BloomFilter.cpp
@@ -148,16 +148,16 @@ bool BloomFilter::contains(const unsigned char* kmer) const
  */
 void BloomFilter::storeFilter(string const &filterFilePath) const
 {
-       ofstream myFile(filterFilePath.c_str(), ios::out | ios::binary);
+       FILE *out = fopen(filterFilePath.c_str(), "wb");
+       assert(out != NULL);

        cerr << "Storing filter. Filter is " << m_sizeInBytes << "bytes." << endl;

-       assert(myFile);
-       //write out each block
-       myFile.write(reinterpret_cast<char*>(m_filter), m_sizeInBytes);
+       fwrite((const void*)m_filter, sizeof(char), m_sizeInBytes, out);
+       if (ferror(out))
+               perror("Error writing file");

-       myFile.close();
-       assert(myFile);
+       fclose(out);
 }

 unsigned BloomFilter::getHashNum() const

You may copy the above text to a file called io.patch and then apply it in the root BBT directory with:

$ patch -p1 < io.patch

Or I can just push the commit to develop if you like. (I have been using this change for quite a while without any problems.)
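For anyone who wants stricter error handling than the patch provides, here is a minimal sketch of the same C stdio approach that also checks the return values of fwrite and fclose, so a short write is reported instead of only showing up later as a size mismatch. It is not part of the pushed commit: the helper name writeAllBytes and the 1 MiB chunk size are assumptions made for illustration, and nothing here is claimed to address the underlying library bug on the cluster.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of BioBloomTools): writes `size` bytes from
// `data` to `path` and returns true only if every byte was written and the
// file was closed without error.
bool writeAllBytes(const std::string &path, const unsigned char *data,
		std::size_t size)
{
	FILE *out = std::fopen(path.c_str(), "wb");
	if (out == NULL) {
		std::perror("Error opening file");
		return false;
	}

	// Chunk size is an arbitrary choice for this sketch.
	const std::size_t chunkSize = 1024 * 1024;
	std::size_t written = 0;
	bool ok = true;

	while (written < size) {
		std::size_t remaining = size - written;
		std::size_t want = remaining < chunkSize ? remaining : chunkSize;
		// fwrite returns the number of items actually written; with an
		// item size of 1 that is a byte count.
		std::size_t got = std::fwrite(data + written, 1, want, out);
		written += got;
		if (got != want) {
			std::perror("Error writing file");
			ok = false;
			break;
		}
	}

	// fclose flushes any buffered data, so its result has to be checked too.
	if (std::fclose(out) != 0) {
		std::perror("Error closing file");
		ok = false;
	}
	return ok;
}
```

In storeFilter this could be called as writeAllBytes(filterFilePath, m_filter, m_sizeInBytes), assuming m_filter is an unsigned char buffer, with the caller deciding what to do when it returns false.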

JustinChu commented 8 years ago

I think it is safe to push the fix to develop.

benvvalk commented 8 years ago

Okay, I pushed it.