amnh / PCG

𝙋𝙝𝙮𝙡𝙤𝙜𝙚𝙣𝙚𝙩𝙞𝙘 𝘾𝙤𝙢𝙥𝙤𝙣𝙚𝙣𝙩 𝙂𝙧𝙖𝙥𝙝 ⸺ Haskell program and libraries for general phylogenetic graph search
28 stars 1 forks source link

Index error defect related to clustering #153

Closed Boarders closed 4 years ago

Boarders commented 5 years ago

When running the following script on the chel data set:

read("chel.fasta")
read("chel_nj_tbr_rr.tre")
build(1, tree, cluster:(upgma, group:5))
build(1, network)
report(data, ("output3-new.data", overwrite))

I get the error:

pcg: Could not index: (6,25)
CallStack (from HasCallStack):
  error, called at pcg-data-structures/src/Bio/Graph/PhylogeneticDAG/DynamicCharacterRerooting.hs:697:25 in pcg-core-0.2.0-TLFuH2H0RzLIPIEdkBwcY-pcg-data-structures:Bio.Graph.PhylogeneticDAG.DynamicCharacterRerooting

Running either build command without the other build command seems to work as expected.

recursion-ninja commented 4 years ago

This issue appears to be that the output graph from the clustering function(s) is malformed.

This is the adjacency matrix for the graph that was produced from the clustering method, after the build(1, tree, cluster:(upgma, group:5)) command but before the build(1, network) command:

node parent(s) children
  0  {6}  {1,2}  
  1  {0}  {}     
  2  {0}  {}     
  3  {4}  {}     
  4  {25} {3,5}  
  5  {4}  {}     
  6  {8}  {0,7}  
  7  {6}  {}     
  8  {10} {6,9}  
  9  {8}  {}     
  10 {12} {8,11} 
  11 {10} {}     
  12 {14} {10,13}
  13 {12} {}     
  14 {16} {12,15}
  15 {14} {}     
  16 {20} {14,18}
  17 {18} {}     
  18 {16} {17,19}
  19 {18} {}     
  20 {25} {16,21}
  21 {20} {}     
  22 {29} {}     
  23 {29} {}     
  24 {}   {25,28}
  25 {24} {26,29}
  26 {25} {4,20} 
  27 {29} {}     
  28 {24} {}     
  29 {25} {27,31}
  30 {31} {22,23}
  31 {29} {30,32}
  32 {31} {}

Additionally, here is a nice (but deceptive) rendering of the tree:

                             24                             
                             |                              
                            ------------------------------  
                           /                              \ 
                           25                             28
                           |                                
                    ----------------------------            
                   /                            \           
                   26                           29          
                   |                            |           
  --------------------                     -------          
 /                    \                   /       \         
 4                    20                  27      31        
 |                    |                           |         
 --                  -----------------           -----      
/  \                /                 \         /     \     
3  5                16                21        30    32    
                    |                           |           
                 ---------------               ---          
                /               \             /   \         
                14              18            22  23        
                |               |                           
               -----------     ---                          
              /           \   /   \                         
              12          15  17  19                        
              |                                             
             ---------                                      
            /         \                                     
            10        13                                    
            |                                               
           -------                                          
          /       \                                         
          8       11                                        
          |                                                 
          -----                                             
         /     \                                            
         6     9                                            
         |                                                  
        ----                                                
       /    \                                               
       0    7                                               
       |                                                    
       --                                                   
      /  \                                                  
      1  2

As you can see from the adjacency matrix, nodes 22 and 23 are leaf nodes which both list that node 29 is their parent. However node 29 lists nodes 27 and 31 as it's children. The referential integrity of the graph object is broken at this point, and the later attempt to perform re-rooting logic in the subsequent build command fails because it assumes referential integrity as an invariant that is maintained.

This issue seems to be that nodes 22 and 23 did not have their parents set correctly somewhere in the clustering code.

I created this branch for diagnosing and correcting the defect: clustering-defect

Boarders commented 4 years ago

Ah interesting thanks for figuring it out. I think I have probably made an index error somewhere along the way. I can have a look at fixing this when I get a moment. I think this must then be different from the pcg-test-flu-net case then as that was using an ordinary wagner build first.

recursion-ninja commented 4 years ago

Since nodes 22 and 23 list node 29 as their parent, but it appears that node 30 is their correct parent, it might be an off by one error.

It probably is a different issue than what is happening in #159.

Boarders commented 4 years ago

That sounds correct to me. I would guess the error is located in the substitution function somewhere as that does a bunch of reasonably complicated fiddling with indices. I'll add this defect to the test suite for that code when I look into it.

Boarders commented 4 years ago

This issue is (hopefully) fixed in this commit: https://github.com/amnh/PCG/commit/e3bea6797e95af26c54ba2c1ba316de5505c8cf2