JuliaPhylo / QuartetNetworkGoodnessFit.jl

julia package for phylogenetic networks analyses using 4-taxon subsets.
Other
9 stars 1 forks source link

Different p-value on same tree with slightly different branch lengths #18

Open crsl4 opened 2 years ago

crsl4 commented 2 years ago

I am adding commands to run TICR in julia in the PhyloNetworks wiki. I get two very different p-values for the ASTRAL tree and the SNaQ h=0 tree, but the tree topologies are the same and the difference in branch lengths (at least visually on the plot) look very similar.

I followed this code inside the data_results/n15.gamma0.30.20.2_n300 folder from the PhyloNetworks wiki:

using QuartetNetworkGoodnessFit, CSV, PhyloPlots, DataFrames, PhyloNetworks
df = DataFrame(CSV.File("bucky-output/1_seqgen.CFs.csv"))
dat = df[:,[1,2,3,4,5,8,11]]

astraltree = readMultiTopology("astral/astral.tre")[102] # main tree with BS as node labels
rootatnode!(astraltree,"15")
plot(astraltree,:R, showEdgeLength=true)
out = ticr!(astraltree,dat,false)

P-value for ASTRAL tree:

julia> out[1]
2.5095827563325576e-6

Then, same code for SNaQ h=0 tree:

snaqtree = readTopology("snaq/net0_bucky.out")
rootatnode!(snaqtree,"15")
plot(snaqtree,:R, showEdgeLength=true)
out2 = ticr!(snaqtree,dat,false)

P-value for SNaQ h=0 tree:

julia> out2[1]
0.8984817899839543

Even if I optimize branch lengths on the SNaQ h=0 tree:

julia> out2 = ticr!(snaqtree,dat,true)
(0.8984817899839543, -1.2729482105384087, Dict("[0.05, 0.1)" => 50, "[0.0, 0.01)" => 30, "[0.01, 0.05)" => 28, "[0.1, 1.0]" => 1257), [74.34019547885967, 7601.3467641088855], [0.30195601360220753, 0.4748614258436003, 0.7787108654400592, 0.5303203132024022, 0.7787108654400592, 0.7321699351036286, 0.7787108654400592, 0.9253543201911679, 0.9983388360397745, 0.4748614258436003  …  0.946903443206478, 0.9757944592517083, 0.3743756117951704, 0.12274795560100286, 0.9888728409315868, 0.9692954871424657, 0.9250612479969377, 0.4410937049467882, 0.9255010963721408, 0.10201118577400667], HybridNetwork, Un-rooted Network
27 edges
28 nodes: 15 tips, 0 hybrid nodes, 13 internal tree nodes.
tip labels: 15, 1, 3, 2, ...
(3,2,((6,(4,5):1.102):0.495,(((((12,13):2.766,11):0.716,(7,(10,(8,9):2.047):0.318):0.645):4.504,14):0.897,(1,15):1.256):2.034):2.134);
)

julia> out2[1]
0.8984817899839543
cecileane commented 2 years ago

what happens if you re-build the data frame dat from scratch, before running the second test (with the SNaQ tree)? ticr! modified the data frame, so it would be best to use the same starting point for dat in both cases. I doubt that it will change anything, but who knows.

Did you make sure that both trees don't have any polytomy (with branch lengths of 0 if need be)?

To help the diagnostic, it would help to compare the column created by the test: it contains the outlier p-values corresponding to each four-taxon set. The largest differences between the astral tree and snaq tree could help find the tree differences that might contributed to the outlier p-value differences.

crsl4 commented 2 years ago

I get the same results if I re-read dat before running the test for the SNaQ tree. I also checked that there are no polytomies on the trees. I will check next the columns created by the test.

cecileane commented 2 years ago

Also: did you check the results using the R implementation in phylolm? Do the result vary a lot from one tree to the other using the R implementation? The polytomies are treated a little differently between the 2 implementations, and you would need to choose the test statistic in julia that is implemented in R.