blachlylab / dhtslib

D bindings and OOP wrappers for htslib
MIT License
7 stars 1 forks source link

Reference counting implementation is incorrect #107

Closed charlesgregory closed 3 years ago

charlesgregory commented 3 years ago

So I have finally figured out my segfault woes. You cannot shove reference counted structs into an array. I was writing something that grouped SAMRecords together (using chunkBy) and then I converted that group range into an array.

When in an array the SAMRecords had no other references causing their dtors to be called. My solution to avoid the dtor basically a kind of move. Solution for my progam:

foreach (group; bam.allRecords.chunkBy!((a, b) => a.queryName == b.queryName))
{
        // convert SAMRecord Range to array of bam1_t *
        // this is effectively kind of move
    auto recs = group.map!((x){
        auto ptr = x.b;
               // set rec ptr to null to avoid data destruction
        x.b = null;
        return ptr;
    }).array;

        // first loop, no destroy
    foreach (i, b; recs)
    {
        auto rec = SAMRecord(b);

        // do stuff then set rec ptr to null to save data
        rec.b = null;   
    }

        // last loop
    foreach (i, b; recs)
    {
        auto rec = SAMRecord(b);

        // do stuff
        // last loop so let data die
    }

}

Potentially we can mirror what htslib does and set a flag when we don't want the struct to own and destroy the data. This is kinda an edge case but it highlights a weakness to reference counting.

jblachly commented 3 years ago

Not sure I understand -- if you mean a dynamic array, this should be scanned by the GC. why is this a problem (or directly, why would this have no other references ?)

jblachly commented 3 years ago

Can you share a link to the code that segfaults?

charlesgregory commented 3 years ago

I have pushed a test program into the develop branch that uses the htslib test files that demonstrates the issue.

dub build -c chunkby_test
./chunkby
thewilsonator commented 3 years ago

chunkBy gives a range that can be .saved (a ForwardRange if you're looking at the documentation, c.f. an InputRange where this operation is not available), if you need to do multiple passes over a given group, do

foreach (group; bam.allRecords.chunkBy!((a, b) => a.queryName == b.queryName))
{ 
    foreach (g;group.save) {  /* iterate and consume a _copy_ of the _range_ group*/}
    foreach (g;group.save) {  /* iterate and consume a _copy_ of the _range_ group*/}
    // ... however many times you need to
    foreach (g;group) {  /* iterate and consume the _range_ group*/}
    // foreach (g;group) { } // executes no times because `group` has already been iterated
}

assuming bam.allRecords is a SAMRecord[], calling .save is like copying the pointer at the beginning of the outer foreach loop and then iterating using that copy

If you encounter these types of problems, consider posting to the learn forum, chances are you'll get help much faster.

charlesgregory commented 3 years ago

fixed by #108