R-Computing-Lab / BGmisc

Functions for extended pedigree analysis
https://r-computing-lab.github.io/BGmisc/
GNU General Public License v3.0

Network-based Data Validation Checks #27

Open smasongarrison opened 4 months ago

smasongarrison commented 4 months ago

Additional data checks we can implement include,

mhunter1 commented 4 months ago

checking if child has more than 2 parents in their network

Find the in-degree and out-degree of each vertex: igraph::degree(..., mode = 'in') and igraph::degree(..., mode = 'out'). With edges pointing from parent to child, in-degree should be at most two. Out-degree is the number of children born to an individual.
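A minimal sketch of the parent-count check, assuming the pedigree is stored as a directed graph with edges pointing from parent to child. The toy edge list and the names `edges`, `n_parents`, and `too_many` are made up for illustration and are not part of BGmisc:

```r
library(igraph)

# Hypothetical toy pedigree: each row is one parent -> child edge.
edges <- data.frame(
  from = c("mom", "dad", "stepdad", "mom", "dad"),
  to   = c("kid1", "kid1", "kid1", "kid2", "kid2")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# In-degree = number of recorded parents; out-degree = number of children.
n_parents  <- degree(g, mode = "in")
n_children <- degree(g, mode = "out")

# Flag anyone with more than two recorded parents.
too_many <- names(n_parents)[n_parents > 2]
too_many
```

Here `kid1` has three incoming edges, so it is the only individual flagged.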

checking if there are cycles in the data (i.e., is a parent's child also that parent's parent? Is it recursive?)

Detect cycles: igraph::is_dag() function? These are cases where, e.g., a parent is the child of their own child. Maybe igraph::is_acyclic() would be better? Either of these returns a logical TRUE/FALSE. A subsequent step is to find the paths that are creating the cycles. See igraph::feedback_arc_set() for finding the edges that, if pruned, would kill the cycles.
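A rough sketch of how the cycle check and the follow-up step might fit together, assuming parent -> child edges and an igraph version that provides feedback_arc_set(). The toy graph is hypothetical and deliberately contains the cycle A -> B -> C -> A:

```r
library(igraph)

# Hypothetical edge list with an impossible loop: A -> B -> C -> A.
edges <- data.frame(
  from = c("A", "B", "C", "A"),
  to   = c("B", "C", "A", "D")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# TRUE if the directed graph contains at least one cycle.
has_cycle <- !is_dag(g)

# Edges whose removal leaves the graph acyclic
# (the default algorithm is a heuristic, not guaranteed minimal).
bad_edges <- feedback_arc_set(g)
as_ids(bad_edges)

# Removing those edges should leave a valid DAG.
g_pruned <- delete_edges(g, bad_edges)
is_dag(g_pruned)
```

The flagged edges are candidates for manual review rather than automatic deletion, since the heuristic only picks one way of breaking each cycle.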

Duplicate edges could be found with igraph::which_multiple().
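For example, a small sketch on a made-up edge list where the mom -> kid link is entered twice (names here are illustrative only):

```r
library(igraph)

# Hypothetical edge list with one duplicated parent -> child row.
edges <- data.frame(
  from = c("mom", "dad", "mom"),
  to   = c("kid", "kid", "kid")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Logical vector, one entry per edge; TRUE marks a repeat of an earlier edge.
dup <- which_multiple(g)
E(g)[dup]

# One possible cleanup: collapse duplicate edges, keep everything else.
g_clean <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE)
ecount(g_clean)
```

Note that which_multiple() leaves the first occurrence unflagged, so only the extra copies show up in `dup`.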

The igraph::topo_sort() function might be a quick alternative for generation number. See #28.
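One way this could work, as a sketch only: topo_sort() returns an ordering rather than generation numbers, so this assumes generation is defined as the longest chain of ancestors above each person. The toy pedigree and the `gen` vector are hypothetical and not how BGmisc currently computes generation:

```r
library(igraph)

# Hypothetical three-generation pedigree, parent -> child edges.
edges <- data.frame(
  from = c("gma", "gpa", "mom", "dad"),
  to   = c("mom", "mom", "kid", "kid")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Topological order guarantees every parent is visited before its children.
ord <- topo_sort(g, mode = "out")

# Founders start at 0; everyone else is one more than their deepest parent.
gen <- setNames(rep(0, vcount(g)), V(g)$name)
for (v in as_ids(ord)) {
  parents <- as_ids(neighbors(g, v, mode = "in"))
  if (length(parents) > 0) {
    gen[v] <- max(gen[parents]) + 1
  }
}
gen
```

A caveat with this definition: a married-in founder such as `dad` gets generation 0 while their partner `mom` gets 1, so partners would need a separate alignment step. It also requires the graph to be acyclic, so the cycle check has to pass first.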

smasongarrison commented 4 months ago

Well, I found us a cycle in the mitochondrial line .... in the Families of England data set.

[image attachment]