Division issue

davidvi / pypanda

Python implementation of PANDA (Passing Attributes between Networks for Data Assimilation)


Hi! PANDA starts by generating a gene co-expression (correlation) matrix from the expression data. This step works with several kinds of expression data. We prefer to use normalized counts, but TPMs and log2 counts will work too.
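As a rough illustration of that first step, here is a minimal NumPy sketch (not PyPanda's actual code) that computes a gene-by-gene Pearson correlation matrix from a genes × samples array. The function name `coexpression` and its `log2_transform` option are hypothetical; the log2(x + 1) transform is only one assumed way of working with log2 counts.

```python
import numpy as np


def coexpression(expr, log2_transform=False):
    """Gene-by-gene Pearson correlation from a genes x samples matrix.

    expr: 2-D array-like, one row per gene, one column per sample.
    log2_transform: hypothetical option; if True, apply log2(x + 1) first.
    """
    data = np.asarray(expr, dtype=float)
    if log2_transform:
        data = np.log2(data + 1.0)
    # np.corrcoef treats each row as one variable, so the result is genes x genes
    return np.corrcoef(data)


# Made-up counts for three genes across four samples
counts = np.array([
    [10, 20, 30, 40],   # gene A
    [5, 10, 15, 20],    # gene B, proportional to A
    [40, 30, 20, 10],   # gene C, reversed
])
corr = coexpression(counts)
print(np.round(corr, 3))
```

Because `np.corrcoef` treats rows as variables, genes must be rows; transpose first if your matrix has samples as rows.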

The issue you're having can happen if your input data includes genes that show no variation in expression across samples. Such a gene has a standard deviation of zero, and because the Pearson correlation divides by each gene's standard deviation, its correlations are undefined (NaN). In principle, YARN should filter out genes that are not expressed in a certain percentage of samples (depending on the thresholds you're using), so this is unlikely to happen. (A gene could still have the same non-zero count in all samples, but that is rare.) However, if you're building your network on a subset of all samples, one or more genes may simply not be expressed in that subset.
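To see why a non-varying gene breaks the computation, here is a small NumPy sketch with made-up values (not real data). The third gene is zero in every sample, so its correlation with every gene, including itself, comes out as NaN.

```python
import warnings

import numpy as np

expr = np.array([
    [3.0, 5.0, 2.0, 8.0],   # variable gene
    [1.0, 4.0, 6.0, 2.0],   # variable gene
    [0.0, 0.0, 0.0, 0.0],   # gene not expressed in this subset of samples
])

with warnings.catch_warnings():
    # np.corrcoef emits a RuntimeWarning when a row has zero standard
    # deviation (it ends up computing 0 / 0); silence it for this demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    corr = np.corrcoef(expr)

print(expr.std(axis=1))    # third entry is 0.0
print(np.isnan(corr[2]))   # every correlation involving the constant gene is NaN
```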

The easiest option is to filter out these genes before running PANDA. Another workaround is to modify the PyPanda code so that correlations returned as NA (NaN) are set to 0 (this is what we did in the MATLAB code we used to build networks on GTEx data).
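Both workarounds can be sketched in a few lines, assuming the expression data sit in a pandas DataFrame with genes as rows and samples as columns. These are standalone helpers with hypothetical names (`drop_invariant_genes`, `correlation_nan_to_zero`), not patches to the PyPanda source, whose internals aren't shown here.

```python
import warnings

import numpy as np
import pandas as pd


def drop_invariant_genes(expr):
    """Option 1 (hypothetical helper): keep only genes (rows) whose
    standard deviation across samples is greater than zero."""
    return expr.loc[expr.std(axis=1) > 0]


def correlation_nan_to_zero(expr):
    """Option 2 (hypothetical helper): keep all genes, but replace
    undefined (NaN) correlations with 0."""
    with warnings.catch_warnings():
        # Constant rows make np.corrcoef divide 0 by 0 and warn
        warnings.simplefilter("ignore", RuntimeWarning)
        corr = np.corrcoef(expr.to_numpy(dtype=float))
    return np.nan_to_num(corr, nan=0.0)


# Made-up example: geneC is zero in every sample
expr = pd.DataFrame(
    [[3.0, 5.0, 2.0, 8.0],
     [1.0, 4.0, 6.0, 2.0],
     [0.0, 0.0, 0.0, 0.0]],
    index=["geneA", "geneB", "geneC"],
)

filtered = drop_invariant_genes(expr)
print(list(filtered.index))   # ['geneA', 'geneB']

corr = correlation_nan_to_zero(expr)
print(corr[2])                # the geneC row contains only zeros
```

With the NaN-to-0 rule, the constant gene's own diagonal entry also becomes 0 rather than 1; filtering the gene out first avoids that question entirely.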

Division issue #2