arborworkflows / aRbor

aRbor, an R package with useful functions for Arbor workflows

physigArbor error #21

Closed bobthacker closed 9 years ago

bobthacker commented 10 years ago

Zach and I tried with his data and got this:

physigArbor(tree, data$SpongeHost)
Error in physigArbor(tree, data$SpongeHost) : could not find function "detectCharacterType"

Where can we find "detectCharacterType"?

lukejharmon commented 9 years ago

Bob can you try this again with the current version of aRbor? I bet it works now - and if so we can close this.

bobthacker commented 9 years ago

Umm, which one? Is this in the arbor directory? Is this one discrete or continuous? I need more information.


bobthacker commented 9 years ago

I tried this again, and got it to work only if I use a .tsv file and not a .csv file for the discrete trait.

Also, on the EasyMode Phylogenetic Signal, the P value is reported as 1 when it should be 0 (well, P<0.0001). Can we show something like <0.0001 if numbers round down to zero?
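One way to get the display asked for here is a small formatting helper. This is only a sketch in base R: `format_p` and its `floor` argument are hypothetical names, not part of aRbor or Easy Mode.

```r
# Hypothetical helper (not part of aRbor): report tiny P values as a bound
# instead of letting them round down to zero.
format_p <- function(p, floor = 1e-4, digits = 4) {
  ifelse(p < floor,
         paste0("<", format(floor, scientific = FALSE)),
         formatC(p, format = "f", digits = digits))
}

# Illustrative inputs only
formatted <- format_p(c(0, 0.00003, 0.0213, 1))
print(formatted)
```

Values below the floor are shown as "<0.0001"; everything else is printed to four decimal places.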

I think when EasyMode calculates the chi square test statistic, it is putting the transformed model and the original model in the wrong order to make the subtraction?
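If that guess about the subtraction order is right, it would also account for the P value of 1: a negative test statistic has an upper-tail chi-square probability of exactly 1. A minimal sketch in base R, using made-up log-likelihoods (not the values from Zach's data) and a hypothetical helper `lrt_p`:

```r
# Illustrative log-likelihoods only; these are not results from this thread.
lnL_null <- -120.5  # e.g. tree transformed to lambda = 0 (no tree structure)
lnL_alt  <- -95.2   # e.g. original, untransformed tree

# Hypothetical helper: likelihood ratio test P value for 2 * (lnL_a - lnL_b)
lrt_p <- function(lnL_a, lnL_b, df = 1) {
  stat <- 2 * (lnL_a - lnL_b)
  pchisq(stat, df = df, lower.tail = FALSE)
}

p_correct  <- lrt_p(lnL_alt, lnL_null)  # statistic is positive (50.6)
p_reversed <- lrt_p(lnL_null, lnL_alt)  # statistic is negative (-50.6)

cat("Correct order:  P =", p_correct, "\n")
cat("Reversed order: P =", p_reversed, "\n")
```

With the models in the intended order the P value is tiny; with them swapped it comes out as 1.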

bobthacker commented 9 years ago

This works:

# fitDiscrete for Zach
# setwd before starting

library(geiger)
hosts<-read.table("cyanohosts.txt", row.names=1, header=TRUE)
phy<-read.tree("cyanotreeBT.phy")

# check input
head(hosts)
combined<-treedata(phy,hosts)

# check
combined
hosts<-combined$data
phy<-combined$phy

# check
head(hosts)

# make sure the tree is binary
is.binary.tree(phy)
# [1] FALSE
phy<-multi2di(phy)
is.binary.tree(phy)
# [1] TRUE

# make the discrete variable a vector of factors
spongeHost<-as.factor(hosts[,1])

# add row names
names(spongeHost)<-rownames(hosts)

# run fitDiscrete using current tree to test hypothesis of tree structure; note this takes a while to run
result1<-fitDiscrete(phy, spongeHost)

# view results
result1

# transform the tree to lambda=0
phy0<-rescale(phy, "lambda", 0)

# run fitDiscrete with transformed tree to test null hypothesis of no tree structure
result0<-fitDiscrete(phy0, spongeHost)
result0
lnL0<-result0$opt$lnL
cat("Model 0 log-likelihood is", lnL0,"\n")
lnL1<-result1$opt$lnL
cat("Model 1 log-likelihood is", lnL1,"\n")

# compare difference to chi square distribution
prob<-1-pchisq(2*(lnL1-lnL0),1)
cat("Significance is", prob, "\nIf zero then really is P < 0.001\n")

difference<-(lnL1-lnL0)
cat("Difference is", difference,"\n")

# test statistic (twice the difference in log-likelihoods)
statistic<-2*(lnL1-lnL0)
cat("Test statistic is", statistic,"\n")

# note: with 1 df, the critical value for P < 0.001 is about 10.83
qchisq(0.999,1)
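The script's remark that a printed zero really means P < 0.001 comes from floating-point rounding in the `1 - pchisq(...)` form. A short sketch in base R, using an illustrative test statistic rather than anything computed from the data in this thread, shows the difference between that form and asking `pchisq` for the upper tail directly, and how `qchisq` gives the critical value:

```r
stat <- 100  # illustrative test statistic only

# Subtracting from 1 loses the tiny tail probability to rounding
p_subtract <- 1 - pchisq(stat, df = 1)

# Requesting the upper tail directly keeps it
p_upper <- pchisq(stat, df = 1, lower.tail = FALSE)

# Critical value of the chi-square distribution for P = 0.001 with 1 df
crit_001 <- qchisq(0.001, df = 1, lower.tail = FALSE)

cat("1 - pchisq:          ", p_subtract, "\n")
cat("lower.tail = FALSE:  ", p_upper, "\n")
cat("Critical value (1 df, P = 0.001):", crit_001, "\n")
```

Note that `dchisq` returns the density, not a quantile, so it cannot be used to look up a critical value.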

bobthacker commented 9 years ago

I notice that both EasyMode and ArborMode show only AIC. I got the same behavior in each: only .tsv works and .csv does not.

I got different numbers in R because I used the "ER" model, and it looks like EasyMode uses BM. I can't find the garbageTest code, so what is it using?

Can we display lnL or give a choice of which metric / significant test to use?

bobthacker commented 9 years ago

Finally, which is most appropriate: lnL vs. AIC, and ER vs. BM?

bobthacker commented 9 years ago

How can I add the data files to this discussion?

lukejharmon commented 9 years ago

The "BM" is because easymode thinks that your character is continuous, because it has so many states! I'll fix that. Garbage test compares the likelihood to a multinomial (no tree) likelihood - it's a "no signal" calculation, like lambda = 0.
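The exact garbage test code is not shown in this thread, so the following is only a sketch of what a multinomial (no tree) log-likelihood can look like. It assumes each tip is an independent draw from the observed state frequencies; the function name `multinomial_lnL` and the tip states are hypothetical.

```r
# Hypothetical tip states; not the host data discussed in this thread.
states <- factor(c("A", "A", "B", "C", "A", "B"))

# Log-likelihood of the tip states if each tip is an independent draw
# from the observed state frequencies (the tree is ignored entirely).
multinomial_lnL <- function(x) {
  counts <- as.numeric(table(x))
  counts <- counts[counts > 0]      # drop unused factor levels
  freqs <- counts / sum(counts)
  sum(counts * log(freqs))
}

lnL_garbage <- multinomial_lnL(states)
cat("No-tree log-likelihood:", lnL_garbage, "\n")
```

A value like this can then be compared with the log-likelihood of a tree-based model such as ER, in the same way as the lambda = 0 comparison.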

lukejharmon commented 9 years ago

BM = never appropriate; ER = fine, sometimes SYM or ARD might be good to try.

lukejharmon commented 9 years ago

AIC and likelihood ratio tests are almost the same, in that an AIC difference > 4 usually means p < 0.05.
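A worked example of that rule of thumb, assuming two nested models that differ by one parameter. The log-likelihoods are illustrative only, and `aic` is a local helper using the standard definition AIC = 2k - 2 lnL:

```r
aic <- function(lnL, k) 2 * k - 2 * lnL

# Illustrative nested models differing by one parameter
lnL_simple <- -100; k_simple <- 1
lnL_full   <- -97;  k_full   <- 2

# AIC difference in favor of the fuller model
delta_aic <- aic(lnL_simple, k_simple) - aic(lnL_full, k_full)

# Likelihood ratio test on the same pair of models
lr_stat <- 2 * (lnL_full - lnL_simple)
p <- pchisq(lr_stat, df = k_full - k_simple, lower.tail = FALSE)

cat("AIC difference:", delta_aic, "\n")
cat("LR statistic:  ", lr_stat, "\n")
cat("P value:       ", p, "\n")
```

Here an AIC difference of 4 corresponds to a likelihood ratio statistic of 6, which gives a P value below 0.05 at 1 df.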

lukejharmon commented 9 years ago

And the way to add files is to put them into the archive and link them here, I think.

uyedaj commented 9 years ago

This is mostly fixed now. There is some room for improvement in how we do this: it seems like on either easy mode or the Arbor web interface we should allow the user to override the character type if necessary.
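A rough sketch of what such an override could look like. This is not aRbor's detectCharacterType; the function name `guess_character_type`, the `max_states` threshold, and the `override` argument are all hypothetical.

```r
# Hypothetical helper; aRbor's own detection logic may work differently.
guess_character_type <- function(x, max_states = 10, override = NULL) {
  # An explicit user choice always wins over the heuristic
  if (!is.null(override)) {
    return(match.arg(override, c("discrete", "continuous")))
  }
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    return("discrete")
  }
  # Numeric data: treat a small number of distinct values as discrete
  if (length(unique(x)) <= max_states) "discrete" else "continuous"
}

# Numerically coded character with many states (illustrative data)
coded <- rep(1:25, times = 2)
guessed    <- guess_character_type(coded)
overridden <- guess_character_type(coded, override = "discrete")
cat("Guessed:", guessed, " With override:", overridden, "\n")
```

With a heuristic like this, a character with many states is guessed to be continuous unless the user says otherwise.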

bobthacker commented 9 years ago

OK, this worked, but the table at the bottom is not very user-friendly in terms of understanding what it means. So, I think we just need to tweak the output a little. Maybe put a label in Column A and the value in Column B, with one row for each of:

Log-likelihood of null model (Garbage)
Log-likelihood of input data (Mk model, but write that out better than "Mk")
Difference in log-likelihoods
P based on chi-square
AIC of null
AIC of input
Difference in AIC
P

I think having a more user-friendly output table is critical to getting students to use Easy Mode to learn.
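As a sketch of the layout proposed here, the rows could be assembled into a two-column data frame. The function `signal_table`, its arguments, and the numbers passed to it are hypothetical placeholders, not Easy Mode output; the final "P" row for AIC is left out because the thread does not say how it would be computed.

```r
# Hypothetical helper: build a labeled two-column summary table.
signal_table <- function(lnL_null, lnL_model, k_null, k_model, df = 1) {
  aic_null  <- 2 * k_null  - 2 * lnL_null
  aic_model <- 2 * k_model - 2 * lnL_model
  stat <- 2 * (lnL_model - lnL_null)
  data.frame(
    Quantity = c("Log-likelihood of null model (no tree)",
                 "Log-likelihood of input data (Mk model)",
                 "Difference in log-likelihoods",
                 "P based on chi-square",
                 "AIC of null model",
                 "AIC of input data",
                 "Difference in AIC"),
    Value = c(lnL_null, lnL_model, lnL_model - lnL_null,
              pchisq(stat, df = df, lower.tail = FALSE),
              aic_null, aic_model, aic_null - aic_model),
    stringsAsFactors = FALSE
  )
}

# Placeholder inputs for illustration only
tab <- signal_table(lnL_null = -100, lnL_model = -97, k_null = 1, k_model = 2)
print(tab)
```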
