FredHutch / gimap

Genetic Interaction MAPping for dual target CRISPR screens
https://fredhutch.github.io/gimap/

fix cpm calculations #20

Closed kweav closed 6 months ago

kweav commented 6 months ago

Description of the issue

When going through the code, the log2 cpm values are all 0, and only 34641 of the 165860 cpm values match the original code.

Description of the fix

Using apply(counts, 2, function(x) (x/new_data$counts_per_sample)*1e6) means the divisor is not matched to the column being processed: x is one sample's column while new_data$counts_per_sample holds the totals for all samples, so the division started from the first sample count for every column instead of using that column's own total. Compared against the original code, only 34641 of the 165850 values matched between the two repos. Replacing the pre-calculated sums with sum(x) brought them into alignment.
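
A minimal sketch of the difference on a made-up count matrix. The matrix, its row and column names, and the standalone counts_per_sample vector are hypothetical stand-ins, and the sketch assumes new_data$counts_per_sample is a plain numeric vector with one total per sample. Under that assumption R recycles the vector down the rows of each column, so only some entries agree with the per-column calculation.

```r
# Toy count matrix: 4 guide pairs (rows) x 2 samples (columns); values are made up.
counts <- matrix(
  c(10, 30, 20, 40,
    100, 300, 200, 400),
  nrow = 4,
  dimnames = list(paste0("pg", 1:4), c("sampleA", "sampleB"))
)

# Pre-computed library sizes, one per sample
# (hypothetical stand-in for new_data$counts_per_sample).
counts_per_sample <- colSums(counts)

# Old form: x is one column, but the divisor is the whole vector of totals,
# so R recycles it down the rows instead of matching it to the column.
cpm_old <- apply(counts, 2, function(x) (x / counts_per_sample) * 1e6)

# Fixed form: each column is divided by its own total.
cpm_new <- apply(counts, 2, function(x) (x / sum(x)) * 1e6)

# Equivalent vectorised form, shown only as a cross-check.
cpm_ref <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Number of entries where the old and fixed calculations agree.
n_match <- sum(cpm_old == cpm_new)
```

In this toy case only half of the entries agree between cpm_old and cpm_new, while every column of cpm_new sums to 1e6, which is the property a cpm normalisation should have.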

For the log2 cpm calculation, log2(new_data$cpm + 1) leads to all zeros because there is no data in new_data$cpm. Replacing it with new_data$transformed_data$cpm points to the slot where the cpm values are actually stored and fixes this problem.
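
A small sketch of the path difference, using a hypothetical nested list in place of the real dataset object (the toy values are made up; only the slot names transformed_data and cpm come from the description above):

```r
# Hypothetical stand-in for the dataset object: cpm lives under transformed_data.
new_data <- list(
  transformed_data = list(
    cpm = matrix(c(0, 1, 3, 7), nrow = 2)
  )
)

# There is no top-level cpm element, so `$` returns NULL here.
top_level_missing <- is.null(new_data$cpm)

# Pointing at the nested slot picks up the stored cpm values.
log2_cpm <- log2(new_data$transformed_data$cpm + 1)
```

Because `$` on a list returns NULL for a missing element rather than raising an error, a wrong path like this can pass silently, which is why comparing outputs against the original code was needed to catch it.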

These changes were originally made on my qc branch in early Feb and were buried under the rest of the changes there, so I am suggesting them here separately so they can be incorporated now.

Type of change

How Has This Been Tested?

Tested locally by running the original repo's code side by side with the new code and comparing the outputs.