SystemsGenetics / KINC

Knowledge Independent Network Construction
MIT License

filterBiasedEdges in KINC.R breaks when there are not enough observations #154

Open JohnHadish opened 4 years ago

JohnHadish commented 4 years ago

This appears to be caused by too few samples outside the cluster. I am going to redo the run with output that identifies the edge on which it fails.
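A minimal sketch of one way to get that kind of output, written in R to match KINC.R: wrap the per-edge test in `tryCatch()` so a failure comes back as a record with the edge index instead of stopping the run. `test_edge()` and `safe_test_edge()` are hypothetical names, not KINC.R functions, and the failure is simulated.

```r
# Hypothetical stand-in for the real per-edge bias test (not KINC.R code).
# It fails on every 4th edge to simulate the "not enough observations" error.
test_edge <- function(i) {
  if (i %% 4 == 0) stop("not enough observations")
  i / 10
}

# Return a record for every edge; an error becomes data instead of
# aborting the whole apply call.
safe_test_edge <- function(i) {
  tryCatch(
    list(edge = i, value = test_edge(i), error = NA_character_),
    error = function(e) {
      list(edge = i, value = NA_real_, error = conditionMessage(e))
    }
  )
}

results <- lapply(1:10, safe_test_edge)

# Indices of the edges that failed.
failed       <- Filter(function(r) !is.na(r$error), results)
failed_edges <- vapply(failed, function(r) r$edge, integer(1))
print(failed_edges)  # [1] 4 8
```

In a parallel run the same wrapper could be passed to `parallel::parLapply()` (after exporting `test_edge()` with `parallel::clusterExport()`), so a single bad edge is logged rather than propagated back to the master process.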

 __  __   ______   __  __  ____       ____        
/\ \/\ \ /\__  _\ /\ \/\ \/\  _`\    /\  _`\      
\ \ \/'/'\/_/\ \/ \ \ `\\ \ \ \/\_\  \ \ \L\ \    
 \ \ , <    \ \ \  \ \ , ` \ \ \/_/_  \ \ ,  /    
  \ \ \\`\   \_\ \__\ \ \`\ \ \ \L\ \__\ \ \\ \   
   \ \_\ \_\ /\_____\\ \_\ \_\ \____/\_\\ \_\ \_\ 
    \/_/\/_/ \/_____/ \/_/\/_/\/___/\/_/ \/_/\/ / 

This script uses KINC.R, a companion R library for KINC
https://github.com/SystemsGenetics/KINC.R
-------------------------------------------------------
Loading the expression matrix file...
Filtering the network for biased edges...
  Num threads: max allowed - 2
  GCE Welch's Anova test threshold: 0.001
  Missigness T-test threshold: 0.1
  Output file prefix: GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.th0.0-p1e-06-rsqr0.30-gcn.tidy_all
  Network Size: 51306542
  Chunk Size: 1e+06
  Number of chunks: 52

Working on chunk: 1. Edges 1 to 999999
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Working on chunk: 2. Edges 1e+06 to 1999999
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Working on chunk: 3. Edges 2e+06 to 2999999
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Working on chunk: 4. Edges 3e+06 to 3999999
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Working on chunk: 5. Edges 4e+06 to 4999999
  |++++++++++++++++++++++++++++                      |  56%
Error in checkForRemoteErrors(val) : 
  one node produced an error: not enough observations
Calls: filterBiasedEdges ... clusterApply -> staticClusterApply -> checkForRemoteErrors

Execution halted
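The log narrows down where the run stopped. Assuming the progress bar is linear in the number of edges processed within a chunk (an assumption: the bar moves in 2% steps and parallel workers may finish out of order), 56% of chunk 5 points to a window around edge 4,560,000:

```r
network_size <- 51306542
chunk_size   <- 1e6

# Number of chunks, matching "Number of chunks: 52" in the log.
n_chunks <- ceiling(network_size / chunk_size)

# Assumption: progress is linear in edges within a chunk.
failing_chunk <- 5
progress      <- 0.56
chunk_start   <- (failing_chunk - 1) * chunk_size  # printed as 4e+06 in the log
approx_edge   <- chunk_start + round(progress * chunk_size)

print(n_chunks)                                 # [1] 52
print(format(approx_edge, scientific = FALSE))  # [1] "4560000"
```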
JohnHadish commented 4 years ago

Not sure why this is happening, because this code in performBiasTests should skip any edge where this is the case:

  # Only perform the test if we have at least 10 samples in and
  # out of the cluster.
  if (length(non_cluster_samples) - length(missing_samples) >= 10 &
      length(cluster_samples) >= 10) {
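One possible explanation, not confirmed in this thread: "not enough observations" is the message R's `oneway.test()` (Welch's ANOVA with `var.equal = FALSE`) raises when any group in the test has fewer than 2 values. A guard on total in-cluster and out-of-cluster counts can pass while one group still has a single value. The sketch below reproduces that with toy data and shows a hypothetical per-group guard; `enough_per_group()` is not part of KINC.R, and whether performBiasTests calls `oneway.test()` is an assumption.

```r
# Toy data: group "C" has a single value.
expr  <- c(1.2, 1.5, 1.1, 1.3, 3.4, 3.9, 3.6, 3.8, 2.0)
group <- c("A", "A", "A", "A", "B", "B", "B", "B", "C")

# Welch's ANOVA fails even though 9 values are available in total.
res <- tryCatch(oneway.test(expr ~ group, var.equal = FALSE),
                error = function(e) e)
print(conditionMessage(res))  # [1] "not enough observations"

# Hypothetical guard: at least two groups, each with >= min_n finite values.
enough_per_group <- function(values, groups, min_n = 2) {
  keep   <- is.finite(values) & !is.na(groups)
  counts <- table(groups[keep])
  length(counts) >= 2 && all(counts >= min_n)
}

print(enough_per_group(expr, group))            # [1] FALSE
print(enough_per_group(expr[1:8], group[1:8]))  # [1] TRUE
```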
JohnHadish commented 4 years ago

I have located the edge which is causing the issue:

Edge:

pycom06g10210   pycom05g21060   0.85556501      co      1       26      0011100000119100000010101000001011000009101100080191190000010111000001011100000 Tissue__Cortex  1.5652772e-11   nan
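To check the guard against this edge, the sample string can be decoded. This sketch assumes KINC's encoding where "1" marks a sample in the cluster, "0" a sample outside it, and any other digit an excluded sample (for example "9" missing, "7"/"8" outliers); the 26 ones matching Cluster_Size are consistent with that reading. `decode_samples()` is a hypothetical helper.

```r
# Hypothetical helper: split a KINC sample string into sample indices.
# Assumed codes: "1" in cluster, "0" outside the cluster, any other digit
# (e.g. "9" missing, "7"/"8" outliers) excluded from both groups.
decode_samples <- function(s) {
  codes <- strsplit(s, "")[[1]]
  list(
    n_total     = length(codes),
    in_cluster  = which(codes == "1"),
    out_cluster = which(codes == "0"),
    excluded    = which(!codes %in% c("0", "1"))
  )
}

s <- "0011100000119100000010101000001011000009101100080191190000010111000001011100000"
d <- decode_samples(s)

print(d$n_total)              # [1] 79
print(length(d$in_cluster))   # [1] 26
print(length(d$out_cluster))  # [1] 48
print(d$excluded)             # [1] 13 40 48 51 54
```

Under that reading the edge has 26 in-cluster and 48 out-of-cluster samples, so both `>= 10` checks in performBiasTests would pass, which suggests the failure comes from how samples are split inside the test rather than from the totals.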

GEM for these genes:

pycom06g10210   11.4470832262097    11.6013065261218    12.062720767165 11.9038818457362    11.610563503925 11.1019756709492    10.9787104591064    11.4486325677306    11.1730524577741    11.3826240265749    11.891403996665 11.2149261879424    10.7431513941125    11.2597432636908    11.9307373375629    11.1535520317081    11.3179778224071    11.502334580198 11.2155329997457    11.2437690319619    11.7706638929168    12.0330788061306    11.5415806599912    11.9248125036058    12.0017600283272    10.9314762338868    10.9106427304696    11.2644426002266    11.5961897561444    11.0728025345442    12.2926092227643    11.7842258604263    11.2986354067342    11.9248125036058    11.2348174311173    11.8872206154684    11.4252159032994    11.856814552503 11.2969162068793    12.0320457269308    11.7494504655699    11.6429542234803    11.8776675740679    12.0484868739923    11.9549232920303    11.714674827308 11.6054795180617    10.642051692928 12.2626821423622    11.5294305541462    11.0084286220706    11.4696418172395    11.3944626946103    11.4252159032994    11.880348808156 11.523561956057 11.6564248632778    11.6794800995054    11.8447057644119    11.9516489859046    11.8649599151426    12.1497471195047    12.2224930522465    12.362765744154 11.9336906549522    11.8141818841903    11.8610869059954    11.8757493514201    11.6501542136751    12.0105281058865    12.0467829703564    11.5769566647061    12.0059753565004    11.8411711892485    11.4731983836775    11.7515440590891    11.6252522217469    11.9151324489507    11.6142497276894
pycom05g21060   2.32192809488736    1.58496250072116    2   2.32192809488736    1   6.97727992349992    6.91886323727459    7.53138146051631    7.03342300153745    6.8073549220576 1   0   -Inf    0   0   6.56985560833095    6.55458885167764    6.89481776330794    6.68650052718322    6.39231742277876    1.58496250072116    0   0   0   1   4.16992500144231    3.8073549220576 3.90689059560852    4.24792751344359    3.90689059560852    1.58496250072116    1   0   2   3.70043971814109    4.16992500144231    4.52356195605701    4.32192809488736    4.16992500144231    -Inf    1   2   1   1.58496250072116    4.32192809488736    3.8073549220576 4   4.08746284125034    4.85798099512757    0   -Inf    0   1   -Inf    4.32192809488736    4.24792751344359    4.75488750216347    3.8073549220576 3.90689059560852    1   0   2.32192809488736    2.32192809488736    2.8073549220576 3.70043971814109    4.24792751344359    4.8073549220576 4.24792751344359    4.16992500144231    2.32192809488736    0   0   2   1   4.24792751344359    3.70043971814109    4.39231742277876    4.16992500144231    3.70043971814109
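The GEM contains `-Inf` values, i.e. log2 of a zero count; in this edge's sample string they appear to line up with the "9" (missing) positions. In R, `-Inf` is not `NA`, so a missing-value count based on `is.na()` or `na.omit()` would treat these entries as observations, while `is.finite()` excludes them. Whether performBiasTests counts missing samples this way is an assumption; the quoted snippet does not show how `missing_samples` is built.

```r
x <- log2(c(4, 1, 0, 2, NA))      # 2, 0, -Inf, 1, NA

n_not_na <- sum(!is.na(x))        # -Inf is counted as an observation
n_finite <- sum(is.finite(x))     # -Inf and NA are both excluded

print(n_not_na)            # [1] 4
print(n_finite)            # [1] 3

# na.omit() keeps -Inf as well.
print(length(na.omit(x)))  # [1] 4
```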

Whatever edge immediately follows this edge throws the error. I am not sure why, but it prevents the program from running to completion.

Example of a two-edge "network" that fails:

Source  Target  Similarity_Score    Interaction Cluster_Index   Cluster_Size    Samples Test_Name   p_value r_squared
pycom06g10210   pycom05g21060   0.85556501      co      1       26      0011100000119100000010101000001011000009101100080191190000010111000001011100000 Tissue__Cortex  1.5652772e-11   nan
pycom06g10210   pycom05g21270   0.4647364   co  1   79  1111111111111111111111111111111111111111111111111111111111111111111111111111111 Ordinal 2.2092328e-12   0.5320459
JohnHadish commented 4 years ago

Here are two more edges that trigger the error:

Edge:

pycom17g10880   pycom01g00060   -0.88779348 co  1   17  0000000007000000070001108080000111008000911100080101910000011101000000001100000 Tissue__Cortex  6.3837683e-07   nan

GEM:

pycom17g10880   4.75488750216347    4   4.75488750216347    4.90689059560852    4   7.29462074889163    6.58496250072116    7.15987133677839    7.15987133677839    7.10852445677817    4.4594316186373 2.8073549220576 3.16992500144231    3.58496250072116    3.8073549220576 6.32192809488736    6   6.79441586635011    6.12928301694497    6.18982455888002    2.32192809488736    1   1.58496250072116    1   0   3.90689059560852    3.8073549220576 3.90689059560852    4.70043971814109    4.4594316186373 2   2.32192809488736    1.58496250072116    1.58496250072116    4.4594316186373 4.70043971814109    4.8073549220576 4.85798099512757    3.8073549220576 2.58496250072116    -Inf    1.58496250072116    1.58496250072116    1   4.85798099512757    4.58496250072116    4.52356195605701    3.70043971814109    5.24792751344359    1   0   1   -Inf    1.58496250072116    5.04439411935845    4.4594316186373 4.75488750216347    4.85798099512757    5.04439411935845    1.58496250072116    1.58496250072116    1   2.32192809488736    1   4.64385618977472    4.70043971814109    4.64385618977472    4.64385618977472    4.95419631038687    2   1.58496250072116    2.8073549220576 1.58496250072116    1   5   4.52356195605701    4.16992500144231    4.32192809488736    5.28540221886225
pycom01g00060   7.32192809488736    7.21916852046216    7.94251450533924    7.83289001416474    7.69348695749933    8.4178525148859 7.93073733756289    8.49984588708321    8.36632221424582    8.8073549220576 8.40514146313634    7.27612440527424    6.82017896241519    7.2667865406949 7.66533591718518    8.16992500144231    8.18487534290828    8.57742882803575    8.11374216604919    8.07681559705083    6.4594316186373 6.7279204545632 6.4757334309664 6.16992500144231    7.21916852046216    6.4594316186373 6   6.52356195605701    7.43462822763672    7.34872815423108    6.6724253419715 5.85798099512757    6.18982455888002    6.2667865406949 6.12928301694497    6.98868468677217    7.25738784269265    7.4757334309664 6.94251450533924    6.4093909361377 6.7279204545632 6.2667865406949 6.2667865406949 6.93073733756289    6.76818432477693    6.58496250072116    6.71424551766612    5.39231742277876    7.21916852046216    6.75488750216347    6.02236781302845    6.84549005094437    6.39231742277876    5.95419631038687    6.91886323727459    6.4594316186373 6.8073549220576 6.44294349584873    6.85798099512757    6.02236781302845    6.08746284125034    6.55458885167764    6.55458885167764    6.85798099512757    6.64385618977472    6.64385618977472    6.44294349584873    6.59991284218713    6.2667865406949 6.62935662007961    6.85798099512757    6.53915881110803    6.4594316186373 6.98868468677217    6.71424551766612    6.33985000288462    6.53915881110803    6.5077946401987 6.32192809488736

Edge:

pycom15g16080   pycom14g02680   0.71663702  co  2   28  7707900000007000000011110000001111000001111100000111110000011111000001111100000 Tissue__Cortex  8.86006e-13 nan

GEM:

pycom15g16080   0   1   1.58496250072116    1   -Inf    3.4594316186373 2.58496250072116    3.32192809488736    2.8073549220576 1.58496250072116    3.70043971814109    2   1   2.32192809488736    2   3.58496250072116    3.4594316186373 3.4594316186373 3   1.58496250072116    5.93073733756289    6.16992500144231    4.75488750216347    5.32192809488736    4.64385618977472    5.35755200461808    4.52356195605701    4.4594316186373 4.70043971814109    4.52356195605701    7.68650052718322    7   6.74146698640115    6.98868468677217    5.8073549220576 6.14974711950468    5   6.4093909361377 5.93073733756289    6.59991284218713    6.12928301694497    6.90689059560852    6.58496250072116    6.7279204545632 5.83289001416474    5.4594316186373 5.58496250072116    4.85798099512757    6.44294349584873    7.32192809488736    7.19967234483636    6.98868468677217    6.70043971814109    7.14974711950468    6.78135971352466    6.4594316186373 7.28540221886225    6.08746284125034    6.4757334309664 5.88264304936184    5.88264304936184    5.90689059560852    6.58496250072116    7   5.4262647547021 5.8073549220576 5.70043971814109    5.93073733756289    5.8073549220576 6.68650052718322    6.90689059560852    7.21916852046216    7.60733031374961    7.61470984411521    7.12928301694497    7.06608919045777    6.95419631038687    7.36632221424582    6.55458885167764
pycom14g02680   1   0   1.58496250072116    3.16992500144231    2   6.56985560833095    6.39231742277876    7.08746284125034    6.62935662007961    6.55458885167764    2   1.58496250072116    1   1.58496250072116    2.8073549220576 6.06608919045777    5.61470984411521    6.16992500144231    5.7279204545632 5.75488750216347    1.58496250072116    2   0   1   2.32192809488736    5.28540221886225    5.61470984411521    5.28540221886225    5.24792751344359    5.16992500144231    2.8073549220576 2   1.58496250072116    1.58496250072116    5.58496250072116    5.64385618977472    5.28540221886225    6.18982455888002    5.7279204545632 1.58496250072116    1.58496250072116    1.58496250072116    1   2   6.39231742277876    6.02236781302845    5.8073549220576 5.12928301694497    7.06608919045777    2.58496250072116    2   1   0   2   6.4262647547021 6.2667865406949 6.61470984411521    6.24792751344359    6.4594316186373 1   0   0   0   2   6.55458885167764    6.58496250072116    6.18982455888002    6.02236781302845    6.35755200461808    0   2.8073549220576 2   1.58496250072116    2.8073549220576 6.44294349584873    6.8703647195834 6.75488750216347    6.74146698640115    6.62935662007961