Illumina / BeadArrayFiles

Python library to parse file formats related to Illumina bead arrays
45 stars 33 forks source link

Omni5 issue #17

Closed jjzieve closed 4 years ago

jjzieve commented 5 years ago
1teaspoon commented 5 years ago

Hi,

I tried this BeadArrayFiles library, and for some reason xRaw has negative values. Below is the error and the code I was running:

Raw X intensity must be positive integer. Failed value: 2820

manifest = BeadPoolManifest(manifest_file)
gtc = GenotypeCalls(gtc_file)
GenoScores = gtc.get_genotype_scores()
genotypes = gtc.get_genotypes()
xRaw = gtc.get_raw_x_intensities()
yRaw = gtc.get_raw_y_intensities()
norm = gtc.get_normalized_intensities(manifest.normalization_lookups)
with open(outFile, 'wb') as output:
    for i in range(8):
        output.write(struct.pack('<H', 0))
    for (x, y, n, genoScore, geno) in zip(xRaw, yRaw, norm, GenoScores, genotypes):
        (normX, normY) = n
        if geno not in [0,1,2,3]:
            eprint('unexpected geno: ' + str(geno))
            sys.exit(1)
        elif not isinstance(x, int) or x < 0:
            eprint('Raw X intensity must be positive integer.  Failed value: ' + str(x))
            sys.exit(1)
        elif not isinstance(y, int) or y < 0:
            eprint('Raw Y intensity must be positive integer.  Failed value: ' + str(y))
            sys.exit(1)
        elif not isinstance(normX, numbers.Number) or math.isnan(normX) or normX < 0:
            eprint('Norm X intensity must be positive number.  Failed value: ' + str(normX))
            sys.exit(1)
        elif not isinstance(normY, numbers.Number) or math.isnan(normY) or normY < 0:
            eprint('Norm Y intensity must be positive number.  Failed value: ' + str(normY))
            sys.exit(1)
        elif not isinstance(genoScore, numbers.Number):
            eprint('Genotype Score must be number.  Failed value: ' + str(genoScore))
            sys.exit(1)
        if geno == 0:
            newGeno = 3
        else:
            newGeno = geno - 1
        output.write(struct.pack('<HHfffH', x, y, normX, normY, genoScore, newGeno))
KelleyRyanM commented 5 years ago

Hi @1teaspoon , It looks like you are checking the type of the raw intensity against "int". I believe you should be checking the type against "uint16" (https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html). Can you check if changing that type comparison addresses the observed error?

1teaspoon commented 5 years ago

Hi @KelleyRyanM,

Thanks for your reply! I tried using unit16 instead of int, and now the error becomes: Norm X intensity must be positive number. Failed value: nan

manifest = BeadPoolManifest(manifest_file)
gtc = GenotypeCalls(gtc_file)
GenoScores = gtc.get_genotype_scores()
genotypes = gtc.get_genotypes()
xRaw = gtc.get_raw_x_intensities()
yRaw = gtc.get_raw_y_intensities()
norm = gtc.get_normalized_intensities(manifest.normalization_lookups)
with open(outFile, 'wb') as output:
    for i in range(8):
        output.write(struct.pack('<H', 0))
    for (x, y, n, genoScore, geno) in zip(xRaw, yRaw, norm, GenoScores, genotypes):
        (normX, normY) = n
        if geno not in [0,1,2,3]:
            eprint('unexpected geno: ' + str(geno))
            sys.exit(1)
        elif not isinstance(x, np.uint16) or x < 0:
            eprint('Raw X intensity must be positive integer.  Failed value: ' + str(x))
            sys.exit(1)
        elif not isinstance(y, np.uint16) or y < 0:
            eprint('Raw Y intensity must be positive integer.  Failed value: ' + str(y))
            sys.exit(1)
        elif not isinstance(normX, numbers.Number) or math.isnan(normX) or normX < 0:
            eprint('Norm X intensity must be positive number.  Failed value: ' + str(normX))
            sys.exit(1)
        elif not isinstance(normY, numbers.Number) or math.isnan(normY) or normY < 0:
            eprint('Norm Y intensity must be positive number.  Failed value: ' + str(normY))
            sys.exit(1)
        elif not isinstance(genoScore, numbers.Number):
            eprint('Genotype Score must be number.  Failed value: ' + str(genoScore))
            sys.exit(1)
        if geno == 0:
            newGeno = 3
        else:
            newGeno = geno - 1
        output.write(struct.pack('<HHfffH', x, y, normX, normY, genoScore, newGeno))
jjzieve commented 4 years ago

@1teaspoon The normalized intensities can be nan if both the x and y intensities aren't greater than zero (i.e. https://github.com/Illumina/BeadArrayFiles/blob/881bbac491614e91cc7be6c8e9b8fc16cade4937/module/GenotypeCalls.py#L678). This indicates missing intensities, so "normalizing" on them doesn't make sense. That behavior is reproducible in develop as well, so its not related to this branch.