Nan value in PIDC results

WWXkenmo commented 3 years ago

Hi Beeline team, I am currently using your neat pipeline, but I have run into a very weird problem in the rankedEdges.csv file of the PIDC results. On my datasets, the edge weights computed by PIDC are all NaN.
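A quick way to confirm the symptom is to count how many edge weights in the file fail to parse as finite numbers. The sketch below is illustrative only: it assumes a tab-separated file with a header containing an `EdgeWeight` column, and the sample rows are made up for the demonstration.

```python
import csv
import io
import math


def count_nan_edges(handle, delimiter="\t", weight_column="EdgeWeight"):
    """Return (number of NaN or empty weights, total number of edges)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    total = 0
    nan_count = 0
    for row in reader:
        total += 1
        raw = (row[weight_column] or "").strip()
        # Treat an empty field the same as an explicit NaN.
        if raw == "" or math.isnan(float(raw)):
            nan_count += 1
    return nan_count, total


# Made-up sample standing in for a rankedEdges.csv file (assumed layout).
sample = io.StringIO(
    "Gene1\tGene2\tEdgeWeight\n"
    "Rps3\tSrgn\tnan\n"
    "GeneA\tGeneB\t0.5\n"
)
nan_edges, all_edges = count_nan_edges(sample)
print(f"{nan_edges} of {all_edges} edges have NaN weights")
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, and adjust `delimiter` or `weight_column` if the layout differs.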

But after I ran a search algorithm I developed myself (it needs to run PIDC repeatedly, so it cannot be applied to large-scale scRNA-seq datasets), I found that simply deleting some of the genes (in my case, the 441st, 865th, and 866th genes) brings the edge weights back to normal.
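The search algorithm itself is not shown in this thread. As a rough illustration of the idea only, a greedy search could add genes one at a time and flag any gene whose addition makes the output contain NaN. The `produces_nan` callable below is hypothetical: it stands in for "write the gene subset to disk, run PIDC, and check the edge weights". The sketch also assumes that an empty gene set counts as clean.

```python
from typing import Callable, List, Sequence, Tuple


def find_offending_genes(
    genes: Sequence[str],
    produces_nan: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Greedy forward search, one check per gene.

    Returns (offending, clean). By construction, every accepted gene was
    checked together with all previously accepted genes, so the final
    clean list passed the check when its last gene was added.
    """
    clean: List[str] = []
    offending: List[str] = []
    for gene in genes:
        candidate = clean + [gene]
        if produces_nan(candidate):
            offending.append(gene)  # adding this gene triggers NaN output
        else:
            clean = candidate
    return offending, clean


# Toy stand-in for a PIDC run: NaN appears whenever a "bad" gene is present.
bad = {"g3", "g5"}
toy_check = lambda subset: any(g in bad for g in subset)

offending, clean = find_offending_genes(
    ["g1", "g2", "g3", "g4", "g5", "g6"], toy_check
)
print(offending)
print(clean)
```

Because this needs one full PIDC run per gene, it scales poorly, which is consistent with the remark that such a search is impractical on large datasets.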

I originally thought that these genes might have some bad statistical characteristics, but unfortunately I didn't find anything special about them (e.g. average expression, variance, coefficient of variation). This happens in most of my datasets, so I think it is really important to figure out, but I have no idea how to solve it.
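For reference, the per-gene statistics mentioned above can be computed along these lines. This is a minimal sketch, not the code used in the thread; it assumes the expression matrix has already been loaded into a dictionary mapping each gene name to its list of per-cell values, and the sample values are invented.

```python
import math
import statistics
from typing import Dict, List


def gene_summary(expression: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute mean, sample variance, and coefficient of variation per gene."""
    summary = {}
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        variance = statistics.variance(values)  # sample variance, needs >= 2 values
        # The coefficient of variation is undefined when the mean is zero.
        cv = math.sqrt(variance) / mean if mean != 0 else math.nan
        summary[gene] = {"mean": mean, "variance": variance, "cv": cv}
    return summary


# Invented values for illustration only.
toy_expression = {
    "GeneA": [1.0, 2.0, 3.0],
    "GeneB": [0.0, 0.0, 0.0],
}
for gene, stats in gene_summary(toy_expression).items():
    print(gene, stats)
```

Comparing these summaries between the genes whose removal fixes the output and the remaining genes is one way to check for outliers.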

So that your team can look into this problem, I have created a repo and uploaded the ExpressionData.csv: https://github.com/WWXkenmo/PIDC_bug

Best, Ken

ktakers commented 3 years ago

Thank you for using BEELINE. I was able to reproduce the NaN error in the PIDC output using your example ExpressionData.txt. In the PIDC output I see the following error message for NaN edges:

Gamma distribution failed for Rps3 and Srgn; used normal instead.
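To see which gene pairs trigger this message, the PIDC output can be scanned with a regular expression. The sketch below assumes the output has been captured as text and that every occurrence follows the exact wording quoted above; the second sample line is invented.

```python
import re
from typing import List, Tuple

# Pattern built from the message quoted in this thread.
GAMMA_FAILURE = re.compile(
    r"Gamma distribution failed for (\S+) and (\S+); used normal instead\."
)


def failed_gamma_pairs(log_text: str) -> List[Tuple[str, str]]:
    """Return every (gene1, gene2) pair named in a gamma-failure message."""
    return GAMMA_FAILURE.findall(log_text)


sample_log = (
    "Gamma distribution failed for Rps3 and Srgn; used normal instead.\n"
    "some unrelated line\n"
)
print(failed_gamma_pairs(sample_log))
```

Counting how often each gene appears in the returned pairs may help narrow down which genes are involved.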

I haven't root caused the error and will continue looking into this.

ktakers commented 3 years ago

I haven't found any issues in the way that BEELINE prepares the input or parses the output from PIDC. I believe the error is related to a poor fit of the input to gamma or normal distributions, but I haven't identified how this results in NaN values in the output from PIDC. I recommend following up with the maintainers of PIDC at https://github.com/Tchanders/NetworkInference.jl for further root causing.

Murali-group / Beeline

Nan value in PIDC results #56