LocusAggregate.aggregate_samples only populates first LocusAggregate

daveware-nv commented 1 year ago

This line assigns a generator: https://github.com/Illumina/BeadArrayFiles/blob/dc4eb370fa97582db3857680cbf3071cde9a6ec5/module/LocusAggregate.py#L196-L197

Which is then used here: https://github.com/Illumina/BeadArrayFiles/blob/dc4eb370fa97582db3857680cbf3071cde9a6ec5/module/LocusAggregate.py#L200-L201

Here one GenerateLocusAggregate is created for each index in the loci group, however when the first GenerateLocusAggregate is called it consumes buffer, meaning the subsequent GenerateLocusAggregate calls are empty.

Probably the simplest way to fix this is for the generator results to copied into a list to it can be used multiple times:

buffer = list(LocusAggregate.load_buffer(
                samples, loci_group[0], loci_group[-1] - loci_group[0] + 1, normalization_lookups))

jjzieve commented 1 year ago

@daveware-nv Hmm yeah. Your fix suggestion sounds good. Do you want to do a PR? I can review. Maybe inlining it may work as well? i.e.

aggregates = map(GenerateLocusAggregate(
   LocusAggregate.load_buffer(
    samples, loci_group[0], loci_group[-1] - loci_group[0] + 1, normalization_lookups)
    , loci_group[0]), loci_group)

I guess that would be less mem-intensive, but slower runtime since we aren't memoizing the buffer. I don't have a preference, bin_size can be adjusted if mem is a problem.

jjzieve commented 1 year ago

merged https://github.com/Illumina/BeadArrayFiles/pull/32

Illumina / BeadArrayFiles

LocusAggregate.aggregate_samples only populates first LocusAggregate #31