BDI-pathogens / phyloscanner

Phylogenetics between and within hosts at once, all along the genome.
GNU General Public License v3.0
47 stars 14 forks source link

Migrate counting relationship types #37

Closed olli0601 closed 5 years ago

olli0601 commented 5 years ago

Hi Matthew

I have been preparing an update of the functions in "multinomial_counts.R". What did I do:

Let s discuss on Wednesday, though it would be good if you could follow the example in 'count.pairwise.relationships' beforehand, and review the classifications in file 'classify.pairwise.relationships'. I don t really understand your ancestry=c("none", "complex"), and how this fits with "sibling"/"intermingled".

Cheers

olli

mdhall272 commented 5 years ago

Fairly simple - "sibling" = "none" (i.e. no ancestry), "intermingled" = "complex"

olli0601 commented 5 years ago

Hi Matthew

The reason I was asking is that you assign "complex" in some pairs when paths12>0 and paths21==0 and adjacent==TRUE. For these pairs, I think you still have evidence of directionality, and thus I assign those pairs to "12".

If you buy into this argument, "intermingled" reduces to those pairs with paths12>0 and paths21>0 and adjacent==TRUE. That s what I did for the Rakai analysis. All pairs who are "intermingled" have ambiguous evidence of direction.

Following the examples code in ?classify.pairwise.relationships, you can then check this with the line dwin %>% filter(paths12>0 & paths21==0 & adjacent)

Happy to discuss, olli

mdhall272 commented 5 years ago

Is allow.mt == TRUE when you get "complex" with paths12>0 and paths21==0? That's what that option should control (if FALSE then "multiAnc" and "multiDesc" are changed to "complex", and this is the default though whether it should be is open to debate). If that occurs when it is TRUE then something is wrong and I'll take a look.

olli0601 commented 5 years ago

I just checked. As far as I can see allow.mt <- TRUE in function "classify.pairwise.relationships" of this branch works as before in your code. Though would be great if you could double check!

olli0601 commented 5 years ago

Hi Matthew

I am currently checking the code, and running into some issues. When I run the following, why is the female declared ancestral to the male?

` require(phyloscannerR) require(tidyverse)

load phyloscanner data

file <-system.file(file.path('extdata','ptyr192_phsc_analyse_trees_output.R'),package='phyloscannerR') load(file) #loads 'phsc', output from 'phyloscanner.analyse.trees'

use distance thresholds found in analysis of Rakai couples

close.threshold <- 0.025 distant.threshold <- 0.05 tip.regex <- "^(.*)_fq[0-9]+read([0-9]+)count([0-9]+)$" min.reads <- 30 min.tips <- 1 ans <- multinomial.calculations(phsc, close.threshold=close.threshold, tip.regex=tip.regex, allow.mt=TRUE, min.reads=min.reads, min.tips=min.tips, distant.threshold = distant.threshold, relationship.types = c('proximity.3.way', 'any.ancestry', 'close.x.contiguous', 'close.and.contiguous', 'close.and.contiguous.and.directed', 'close.and.adjacent.and.directed',
'close.and.contiguous.and.ancestry.cat', 'close.and.adjacent.and.ancestry.cat', 'adjacent.and.proximity.cat'), verbose=TRUE) dwino<- ans$dwin dco<- ans$rplkl str(subset(dwino, host.1=='RkA05193F' & host.2=='RkA06730M' & tree.id=='1025_to_1274')) `

mdhall272 commented 5 years ago

I can't run that code in either branch without errors, but that doesn't surprise me as a classification in that tree. There will be one RkA06730M subgraph descended from what looks like a RkA05193F subgraph. (RkA05210M makes the situation questionable, but the classification seems right.)

olli0601 commented 5 years ago

Hi Matthew

What is the error in running the above? I am in the migrate_counts branch, then source the multinomial code in the "deprecated" folder of that branch manually.

What I am confused about in the above window is that we have paths12==0 & paths21>0 and ancestry=='anc'. Why would that be?

Thanks.

screen shot 2018-12-03 at 15 39 42
mdhall272 commented 5 years ago

The error:

Calculating k.eff and n.eff for windows (n=0)...
Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'k' not found.
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The problem with the table was that I neglected to change the order of all the columns when ensuring that host.1 and host.2 are in alphabetical order, which I've now fixed on the main branch. Sorry. The paths columns weren't subsequently used before.

However, I think in the long run (although not now if it means you would have to rerun anything) I'd prefer to change it so the classify function knows about the variations and calls "anc"/"desc" based on the different particular variation asked for, which will allow me to pull the allow.mt option out of merge.classifications as well. Feels cleaner. Is this OK? Then in your case you just need to pay attention to "anc"/"desc" not the path counts.

E.g

classify <- function(ptree, allow.mt = F, strict.ancestry = T, verbose = F, no.progress.bars = F) {...

where strict.ancestry is your option. If that is FALSE then an "anc" will obey your rules.

olli0601 commented 5 years ago

Hi Matthew

Thanks for the fix. I tested the new branch on Rakai data, and it now returns identical results.

Some changes:

Would you be able to edit your analyse_trees.Rscript so that it saves dwin and dc of the following lines to one or two (as you prefer) rda files as below. When done, I can then re-run on the entire RCCS data.

close.threshold <- 0.025
distant.threshold   <- 0.05 
relationship.types  <- c('proximity.3.way',
            'any.ancestry',
            'close.x.contiguous',           
            'close.and.adjacent',                   
            'close.and.adjacent.and.directed',                  
            'close.and.adjacent.and.ancestry',          
            'close.and.contiguous',                 
            'close.and.contiguous.and.directed',                    
            'close.and.contiguous.and.ancestry')    
use.paths.to.define.relationships <- FALSE
tip.regex <- "^(.*)_fq[0-9]+_read_([0-9]+)_count_([0-9]+)$"
min.reads <- 30
min.tips <- 1
verbose <- TRUE

dwin <- classify.pairwise.relationships(phsc, 
  allow.mt=allow.mt, 
  close.threshold=close.threshold, 
  distant.threshold=distant.threshold,
  use.paths.to.define.relationships=use.paths.to.define.relationships,
  relationship.types=relationship.types, 
  verbose=TRUE)         
tmp <- select.windows.by.read.and.tip.count(phsc, dwin, tip.regex, min.reads, min.tips)
dc <- count.pairwise.relationships(tmp)

Thank you! olli

mdhall272 commented 5 years ago

There is already an option to save the entire workspace as an RDA, so I'd prefer to just add an option to perform these calculations as part of phyloscanner_analyse_trees.R. Then dwin and dc would be available along with everything else if --rda is specified.