Open lubianat opened 3 months ago
For all protein-coding genes, the results are also weird.
Results:
For biological processes, it runs fine:
ChatGPT comments:
Scenarios for "nan" (not a number):
Division by Zero or Invalid Operations:
In the calculation of odds_ratio, if both the numerator and the denominator were zero, the result would typically be nan, because 0/0 is undefined. However, the code uses max(1.0 * (n - x) * (N - x), 1) in the denominator to avoid division by zero, but other operations could still produce nan.
The np.log10(p_value) operation is undefined for a zero or negative p_value: NumPy returns -inf when p_value is exactly zero and nan when it is negative. Either value can propagate into combined_score; for example, an infinite log term multiplied by zero gives nan.
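A minimal NumPy sketch of how these values arise. The score formula used here, -log10(p_value) * odds_ratio, is an assumption for illustration (the comments above only say that combined_score contains a -np.log10(p_value) term), and combined_score_sketch is a hypothetical helper, not the package's function.

```python
import numpy as np

# np.log10 at the edge of its domain; errstate silences the RuntimeWarnings
with np.errstate(divide="ignore", invalid="ignore"):
    log_zero = np.log10(0.0)    # -inf (divide-by-zero), not nan
    log_neg = np.log10(-1e-3)   # nan (invalid value)

print(log_zero, log_neg)  # -inf nan


def combined_score_sketch(p_value, odds_ratio):
    """Hypothetical score: -log10(p) * odds_ratio (assumed formula, illustration only)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.log10(p_value) * odds_ratio


print(combined_score_sketch(0.0, 5.0))  # inf
print(combined_score_sketch(0.0, 0.0))  # nan, because inf * 0 is undefined
```

So a p-value of exactly 0 can yield inf or nan depending on the other factor.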
Scenarios for "inf" (infinity):
Logarithm of a Very Small p-value:
The -np.log10(p_value) component in combined_score can result in inf if p_value is extremely small. In double-precision floating point, a p_value below roughly 5e-324 underflows to exactly 0.0; np.log10(0.0) is -inf, and the negative sign in -np.log10(p_value) turns it into inf. Any p_value that is still positive gives a finite result (at most about 323.3).
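A short NumPy check of the underflow path: the smallest positive double still gives a finite -log10, while a value that underflows to 0.0 gives inf. The product 1e-200 * 1e-200 is only an illustrative way to reach underflow, not necessarily how the p-values are computed here.

```python
import numpy as np

smallest = np.nextafter(0.0, 1.0)   # smallest positive double, about 4.9e-324
print(-np.log10(smallest))          # roughly 323.3: large but finite

p_underflow = 1e-200 * 1e-200       # exact value 1e-400 is not representable, rounds to 0.0
print(p_underflow == 0.0)           # True

with np.errstate(divide="ignore"):  # silence the divide-by-zero RuntimeWarning
    neg_log_p = -np.log10(p_underflow)
print(neg_log_p)                    # inf
```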
Extremely Large odds_ratio:
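If the term (n - x) * (N - x) in the odds_ratio denominator is zero, the max function replaces it with 1, so odds_ratio equals its numerator and can be very large. However, assuming n, x, and N are integer counts, that term is either 0 or at least 1; it cannot be "very close to zero", so this path alone is unlikely to produce inf.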
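A quick check of the quoted denominator, assuming n, x, and N are non-negative integer counts with x <= n <= N. The numerator is not quoted in the comments, so only the denominator is shown, and odds_ratio_denominator is a hypothetical wrapper around the quoted expression. Because a product of integers is either 0 or at least 1, the max(..., 1) floor keeps the denominator at 1 or above.

```python
def odds_ratio_denominator(n: int, x: int, N: int):
    """Denominator as quoted in the comments: max(1.0 * (n - x) * (N - x), 1)."""
    return max(1.0 * (n - x) * (N - x), 1)


print(odds_ratio_denominator(5, 5, 20000))   # 1 (the product is 0, so the floor applies)
print(odds_ratio_denominator(10, 3, 20000))  # 139979.0

# Exhaustive check over small counts with 0 <= x <= n <= N
min_denominator = min(
    odds_ratio_denominator(n, x, N)
    for N in range(50)
    for n in range(N + 1)
    for x in range(n + 1)
)
print(min_denominator >= 1)  # True
```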
Preventative Measures in Code:
max(..., 1) in the denominator: this avoids division by zero, but when the term inside max is 0 the denominator becomes 1, which can still give large odds_ratio values.
Handling log of zero or negative values: np.log10(p_value) should be guarded so that p_value is never zero or negative, for example by checking p_value > 0 before computing the logarithm.
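One way to apply the p_value > 0 check, sketched with NumPy: validate the whole array and fail loudly instead of letting inf or nan flow into the scores. neg_log10_p is a hypothetical helper name; NaN inputs are rejected too, because nan > 0 evaluates to False.

```python
import numpy as np


def neg_log10_p(p_values):
    """-log10 of p-values; raises instead of returning inf/nan for invalid input."""
    p = np.asarray(p_values, dtype=float)
    invalid = ~(p > 0)  # True for p <= 0 and for NaN (comparisons with NaN are False)
    if invalid.any():
        bad_idx = np.flatnonzero(invalid).tolist()
        raise ValueError(f"p-values must be > 0; invalid entries at indices {bad_idx}")
    return -np.log10(p)


print(neg_log10_p([0.05, 1e-10]))  # about [1.301, 10.0]

try:
    neg_log10_p([0.01, 0.0, float("nan")])
    message = None
except ValueError as err:
    message = str(err)
print(message)  # p-values must be > 0; invalid entries at indices [1, 2]
```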
Not necessarily a bug. Maybe set a floor on the p-value estimate, e.g. a minimum of 1e-100.
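A minimal sketch of that suggestion, assuming the floor is applied to the p-value just before the log transform: with p clamped to at least 1e-100, -log10(p) is capped at 100, so that term of the score stays finite. clamp_p and P_FLOOR are hypothetical names. Note that np.maximum also lifts zero or negative inputs to the floor (hiding genuinely invalid values) and passes NaN through unchanged.

```python
import numpy as np

P_FLOOR = 1e-100  # floor suggested in the comment above


def clamp_p(p_values, floor=P_FLOOR):
    """Return p-values raised to at least `floor`; NaN entries stay NaN."""
    return np.maximum(np.asarray(p_values, dtype=float), floor)


p = clamp_p([0.0, 1e-300, 0.01])
neg_log_p = -np.log10(p)
print(neg_log_p)  # first two entries capped near 100, last is about 2
```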
~43k genes from https://www.genenames.org/cgi-bin/download/custom?col=gd_app_sym&status=Approved&hgnc_dbtag=on&order_by=gd_app_sym_sort&format=text&where=(gd_pub_chrom_map%20not%20like%20%27%25patch%25%27%20and%20gd_pub_chrom_map%20not%20like%20%27%25alternate%20reference%20locus%25%27)&submit=submit