Illumina / BeadArrayFiles

Python library to parse file formats related to Illumina bead arrays
46 stars 33 forks source link

LocusAggregate.aggregate_samples only populates first LocusAggregate #31

Closed daveware-nv closed 1 year ago

daveware-nv commented 1 year ago

This line assigns a generator: https://github.com/Illumina/BeadArrayFiles/blob/dc4eb370fa97582db3857680cbf3071cde9a6ec5/module/LocusAggregate.py#L196-L197

Which is then used here: https://github.com/Illumina/BeadArrayFiles/blob/dc4eb370fa97582db3857680cbf3071cde9a6ec5/module/LocusAggregate.py#L200-L201

Here one GenerateLocusAggregate is created for each index in the loci group, however when the first GenerateLocusAggregate is called it consumes buffer, meaning the subsequent GenerateLocusAggregate calls are empty.

Probably the simplest way to fix this is for the generator results to copied into a list to it can be used multiple times:

buffer = list(LocusAggregate.load_buffer(
                samples, loci_group[0], loci_group[-1] - loci_group[0] + 1, normalization_lookups))
jjzieve commented 1 year ago

@daveware-nv Hmm yeah. Your fix suggestion sounds good. Do you want to do a PR? I can review. Maybe inlining it may work as well? i.e.

aggregates = map(GenerateLocusAggregate(
   LocusAggregate.load_buffer(
    samples, loci_group[0], loci_group[-1] - loci_group[0] + 1, normalization_lookups)
    , loci_group[0]), loci_group)

I guess that would be less mem-intensive, but slower runtime since we aren't memoizing the buffer. I don't have a preference, bin_size can be adjusted if mem is a problem.

jjzieve commented 1 year ago

merged https://github.com/Illumina/BeadArrayFiles/pull/32