Illumina / BeadArrayFiles

Python library to parse file formats related to Illumina bead arrays
46 stars 33 forks source link

Incorporate fix from ekarlins #6

Closed KelleyRyanM closed 7 years ago

KelleyRyanM commented 7 years ago

Error handling for scale equals zero in normalization

KelleyRyanM commented 7 years ago

I'll still need to run some additional regression, but let me know if you have any comments

ekarlins commented 7 years ago

@KelleyRyanM, conceptually the changes you suggest seem great! I still need to test your code on my use cases, though. One question, do the logr or baf functions use normalized intensities? Would they be effected by this change or do they use similar math?

Thanks! Eric

KelleyRyanM commented 7 years ago

@ekarlins, While the logR and BAF functions are calculated from normalized intensities, it is important to note that the calculation of those values all occur with AutoCall or AutoConvert, and the binary values are then written into the GTC file. In this library, we're merely reading and reporting those values. The changes here would not have any impact on how those values are reported from this library, except that the return type for those functions should now be numpy.float32 instead of the built-in python float type.

ekarlins commented 7 years ago

@KelleyRyanM, sorry it's taken a while to me to get back to this. I'm just checking now to see how these changes work on the gtc files that originally gave the ZeroDivisionError. Which branch should I use to check this? I checked in "develop" and still got the ZeroDivisionError. I then checked in "zero-division-normalization". The function "get_normalized_intensities" ran without error. Is this what you expect the output to look like:

>>> norm = gtc.get_normalized_intensities(manifest.normalization_lookups)
>>> sortNorm = sorted(norm)
>>> sortNorm[:10]
[(0, 0), (0, 0.00074798678), (0, 0.0048690871), (0, 0.0061703799), (0, 0.006283273), (0, 0.01445268), (0, 0.017426901), (0, 0.022675885), (0, 0.025251113), (0, 0.026536332)]
>>> sortNorm[-10:]
[(131.77289, 0), (135.35126, 0), (142.94295, 2.7291596), (149.1169, 9.2937365), (154.33495, 0.092526063), (inf, nan), (inf, nan), (inf, nan), (inf, nan), (inf, nan)]
KelleyRyanM commented 7 years ago

Can you double check that you've synced the latest from develop? After the merge, those two branches should be mostly identical, although there is the possibility the merge didn't go correctly. At least in my local copy, I don't see the exceptions with develop.

ekarlins commented 7 years ago

It looks like there is nothing new in this branch compared to my version:

git checkout remotes/upstream/develop
Previous HEAD position was f42ddae... Standardize on numpy for float and ushort, built-in for other
HEAD is now at 8416d95... Added note on creation of GTC files
[karlinser@cgemsIII BeadArrayFiles]$ git pull upstream develop
From https://github.com/Illumina/BeadArrayFiles
 * branch            develop    -> FETCH_HEAD
Already up-to-date.

Is there something else I should check?

Here's the traceback from that error in the "develop" branch:

>>> norm = gtc.get_normalized_intensities(manifest.normalization_lookups)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "IlluminaBeadArrayFiles.py", line 457, in get_normalized_intensities
    return [normalization_transforms[lookup].normalize_intensities(x_raw, y_raw) for (x_raw, y_raw, lookup) in zip(self.get_raw_x_intensities(), self.get_raw_y_intensities(), normalization_lookups)]  
  File "IlluminaBeadArrayFiles.py", line 517, in normalize_intensities
    xn = tempx3 / self.scale_x
ZeroDivisionError: float division by zero
ekarlins commented 7 years ago

Sorry, you're right. Yesterday I was using the branch from my original fork:

git checkout origin/develop

When I use the remote branch it works the same in develop as in zero-division-normalization.

git checkout remotes/upstream/develop

Is what I posted yesterday the expected result? Do you expect inf as a possible value?

KelleyRyanM commented 7 years ago

Yes, in that case Inf is the expected result. This should be consistent with what you see if the same sample is loaded into GenomeStudio.

ekarlins commented 7 years ago

Great! So on my end this issue seems fixed. Thanks for working on this!