acorg / Racmacs

Racmacs R package for performing antigenic cartography
https://acorg.github.io/Racmacs/
GNU Affero General Public License v3.0
20 stars 9 forks source link

Multiple unconnected maps can be made using make.acmap #93

Closed drserajames closed 2 years ago

drserajames commented 2 years ago

You have a table with two disconnected sets of antigens/sera. Then when you make.acmap, you get a warning about variation, but not one that specifically flags the real issue.

e.g.

> library(Racmacs)
> set.seed(850909)
> dat <- matrix(10*2^round(10*runif(100)), ncol=10)
> dat[6:10,1:5] <- "*"
> dat[1:5,6:10] <- "*" 
> dat
      [,1]   [,2]   [,3]   [,4]  [,5]   [,6]  [,7]    [,8]   [,9]   [,10] 
 [1,] "40"   "320"  "640"  "320" "80"   "*"   "*"     "*"    "*"    "*"   
 [2,] "10"   "320"  "80"   "160" "640"  "*"   "*"     "*"    "*"    "*"   
 [3,] "1280" "2560" "5120" "320" "5120" "*"   "*"     "*"    "*"    "*"   
 [4,] "20"   "5120" "160"  "40"  "160"  "*"   "*"     "*"    "*"    "*"   
 [5,] "1280" "5120" "5120" "80"  "2560" "*"   "*"     "*"    "*"    "*"   
 [6,] "*"    "*"    "*"    "*"   "*"    "40"  "20"    "2560" "20"   "160" 
 [7,] "*"    "*"    "*"    "*"   "*"    "10"  "640"   "1280" "80"   "1280"
 [8,] "*"    "*"    "*"    "*"   "*"    "80"  "10240" "160"  "20"   "10"  
 [9,] "*"    "*"    "*"    "*"   "*"    "320" "160"   "160"  "1280" "160" 
[10,] "*"    "*"    "*"    "*"   "*"    "80"  "5120"  "80"   "1280" "160" 
> map <- make.acmap(dat)
Performing 100 optimizations
================================================================================
Optimization runs complete
Took 0.15 secs

Warning message:
In optimizeMap(map = map, number_of_dimensions = number_of_dimensions,  :
  There is some variation (4.56 AU for one point) in the top runs, this may be an indication that more optimization runs could help achieve a better optimum. Also consider running it with options = list(dim_annealing = TRUE).

Obviously a user shouldn't be trying to make a map from disconnected data. My ideal behaviour would be one of the following

drserajames commented 2 years ago

A related issue also occurs when the two maps are weakly connected, e.g by a single serum, so are not coordinated relative to one another.

> set.seed(850909)
> dat <- matrix(10*2^round(10*runif(100)), ncol=10)
> dat[6:10,1:5] <- "*"
> dat[1:5,7:10] <- "*"
> dat
      [,1]   [,2]   [,3]   [,4]  [,5]   [,6]   [,7]    [,8]   [,9]   [,10] 
 [1,] "40"   "320"  "640"  "320" "80"   "160"  "*"     "*"    "*"    "*"   
 [2,] "10"   "320"  "80"   "160" "640"  "5120" "*"     "*"    "*"    "*"   
 [3,] "1280" "2560" "5120" "320" "5120" "20"   "*"     "*"    "*"    "*"   
 [4,] "20"   "5120" "160"  "40"  "160"  "10"   "*"     "*"    "*"    "*"   
 [5,] "1280" "5120" "5120" "80"  "2560" "80"   "*"     "*"    "*"    "*"   
 [6,] "*"    "*"    "*"    "*"   "*"    "40"   "20"    "2560" "20"   "160" 
 [7,] "*"    "*"    "*"    "*"   "*"    "10"   "640"   "1280" "80"   "1280"
 [8,] "*"    "*"    "*"    "*"   "*"    "80"   "10240" "160"  "20"   "10"  
 [9,] "*"    "*"    "*"    "*"   "*"    "320"  "160"   "160"  "1280" "160" 
[10,] "*"    "*"    "*"    "*"   "*"    "80"   "5120"  "80"   "1280" "160" 
> 
> map <- make.acmap(dat)
Performing 100 optimizations
================================================================================
Optimization runs complete
Took 0.11 secs

Warning message:
In optimizeMap(map = map, number_of_dimensions = number_of_dimensions,  :
  There is some variation (2.19 AU for one point) in the top runs, this may be an indication that more optimization runs could help achieve a better optimum. Also consider running it with options = list(dim_annealing = TRUE).
> view(map)
shwilks commented 2 years ago

Thanks, yes I agree this should give a warning or error. Any ideas how to test for this programatically? I feel like it would maybe have to do with checking e.g. how many overlapping sera titrations there are between pairs of antigens, but it’s not immediately obvious where to go from there?

drserajames commented 2 years ago

I don't have the answer, but I think network theory does.

Consider the detectable titres as edges of an undirected graph. For example, this function calculates the path between vertices, returning Inf if they are unconnected. Analysis of this matrix should reveal if there are unconnected maps.

The issue of poorly connected maps is more complicated. Consider two robustly connected and coordinated maps that are more loosely connected to each other. If all paths from one map to the other go through a single vertex, then unconstrained rotation can occur around this point; the calculated positions of one map relative to the other will be uninformative. The same for two vertices. I think three connecting vertices are the minimum for a single solution in 3d (like with a smaller map). So programmatically, can you disconnect two vertices by removing 3 or fewer vertices - calculate using e.g. this function. I'm pretty sure this effect causes some of the issues with surveillance maps so I will probably have to return to it.

I've ignored the undetectable titres, which have the potential to stabilise a particular map configuration, because it's too complicated.

A temporary fix would be to change the warning to say that it could be more/better optimisations are needed or that the map cannot be well optimised (which could be due to poor coordination of subgroups of points, or could be something else).

shwilks commented 2 years ago

Neat! Yes, that second function seems like exactly what we would need, e.g. check if we have any vertex connectivity < ndimensions + 1 (otherwise with ndimensions + 0 you would still have unconstrained reflections I think..?).

shwilks commented 2 years ago

This is now somewhat addressed in the latest version (1.1.30), it'll give a warning but not an error if maps are disconnected (perhaps you're right this should be a straight error - you can use the currently internal function Racmacs:::mapDisconnected() to check this yourself though before optimizing). If maps fail to optimize consistently it also suggests that cohesion of points is one of the possible problems and points you to the functions mapCohesion() agCohesion() and srCohesion() to help diagnose this. It doesn't check cohesion as a routine part of optimization since it's quite a computationally expensive check with larger maps.

drserajames commented 2 years ago

That's great!

I think more users will ignore warnings and be confused by the disconnected maps. I don't see an obvious use for a map output that is a superposition of two maps. But I assume you've kept it functional for some reason. And I have the ability to make my own checks so it's not a problem for me.

shwilks commented 2 years ago

I agree, in v1.1.31 this is now upgraded from a warning to an error, but with an option to ignore if you really want to do it anyway, e.g. an Omicron strain with all <10 values