biocpp / biocpp-io

BioC++ Input/Output library
https://biocpp.github.io
BSD 3-Clause "New" or "Revised" License
8 stars 5 forks source link

VCF output runs into assertion if ref().size() > 1 #53

Closed joergi-w closed 2 years ago

joergi-w commented 2 years ago

The problem is that record.ref() only accepts a single nucleotide, although its type is dna5_vector. If I set multiple nucleotides, the program dies with a segmentation fault (release build) or Assertion failed: (buffer_space > 0) in function write_range (debug build).

This is my minimal code example:

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <bio/var_io/writer.hpp>

int main()
{
    using namespace seqan3::literals;
    bio::var_io::header hdr{};
    hdr.file_format = "VCFv4.3";
    hdr.column_labels = {"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "x"};

    bio::var_io::writer writer{std::cout, bio::vcf{}};
    writer.set_header(hdr);
    bio::var_io::default_record<> record{};
    record.chrom() = "chr1";
    record.pos() = 11111;
    record.id() = "test";
    record.ref() = "ATC"_dna5; // <- problem here: "A"_dna5 works
    record.alt() = {"AGC", "A"};
    record.qual() = 1.F;
    record.filter() = {"PASS"};
    record.genotypes() = {};
    record.info() = {};
    writer.push_back(record);

    return 0;
}

My current work-around is changing the type of ref() to std::vector<char>, which works as expected.

h-2 commented 2 years ago

Yeah, this took me quite a while to figure out. The problem is that by default, std::cout does not do any buffering. This is horrible for performance and something you never want unless you are mixing calls to std::cout and printf() (which you also shouldn't be doing!). The correct workaround is to call this somewhere at the beginning of your program

std::ios::sync_with_stdio(false);

This will make std::cout do regular buffering, which improves performance and will silence the assertion.

I will think about adding a buffering step when I see this situation.

h-2 commented 2 years ago

I have now also added better handling of the situation to the library, so you will no longer get the assertion.