Dimension testing absolute error variability greater than expected from sd

drserajames commented 5 years ago

I had originally submitted this to Racmacs, but I realised that the dimensionTestMap function uses acmacs.map_resolution_test.

I'm not sure if this is an error or if I'm interpreting the data incorrectly. There is more variability in the average absolute error than I'd expect from its sd and n.

I used the sd and n to calculate a 95% CI = mean +/- 1.96 * sd / sqrt(n). I then repeated the dimension test and compared the results. With 5 dimensions and 10 repeats (50 data points), I'd expect about 2.5 points (5%) to fall outside the CIs. Most are outside this range.
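A caveat on the expected count, offered as a sketch rather than a diagnosis: the interval mean +/- 1.96 * sd / sqrt(n) covers the true mean about 95% of the time, but a repeat estimate carries its own sampling error, so two independent estimates disagree more often than 5%. The base R simulation below uses made-up normal data (ci95, n and n_sim are illustrative names, not Racmacs objects) and estimates how often a repeat mean lands outside the CI of a first estimate.

```r
# 95% CI half-width from a mean, an sd and a sample count.
ci95 <- function(m, s, n) {
  half <- 1.96 * s / sqrt(n)
  c(lower = m - half, upper = m + half)
}

set.seed(42)   # arbitrary seed
n <- 100       # hypothetical number of samples behind each estimate
n_sim <- 5000  # number of simulated (first, repeat) pairs

# For each pair: build the CI from the first sample, then check whether
# the mean of an independent repeat sample falls outside it.
outside <- replicate(n_sim, {
  first <- rnorm(n)
  repeat_mean <- mean(rnorm(n))
  ci <- ci95(mean(first), sd(first), n)
  repeat_mean < ci[["lower"]] || repeat_mean > ci[["upper"]]
})
frac_outside <- mean(outside)
frac_outside
```

Under these idealised assumptions the fraction should come out near 2 * pnorm(-1.96 / sqrt(2)), roughly 17%, i.e. about 8 of 50 points rather than 2.5. That alone would not account for most points falling outside the interval.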

Example code below

library(Racmacs)

# generate random test data
set.seed(1)
coord <- matrix(rep(runif(10,0,10), times=2), ncol=2, byrow=TRUE)
dist <- as.matrix(dist(coord))+rnorm(100)
max_mat <- matrix(apply(round(dist),2,max), ncol=10, nrow=10, byrow=TRUE)
tab1 <- 10*2^round(max_mat-dist)

# make map
map1 <- make.acmap(table=tab1,
                   number_of_dimensions=2,
                   number_of_optimizations=10,
                   minimum_column_basis="2560",
                   remove_trapped_points="none",
                   check_for_hemisphering = FALSE)

# dimension test
dim_test <- dimensionTestMap(map1, 1:5, 0.1)

# dimension test x10 (results collected in a pre-allocated list)
dim_tests <- vector("list", 10)
for (i in 1:10){
  dim_tests[[i]] <- dimensionTestMap(map1, 1:5, 0.1)
}

# plot results
plot(NA, xlim=c(1,5), ylim=c(1,2.5), ylab="Average error", xlab="Dimensions")
for (i in 1:10){
  points(dim_tests[[i]]$av_abs_error, pch=16, cex=0.8, col=rainbow(10)[i])
  lines(dim_tests[[i]]$av_abs_error, col=rainbow(10)[i])
}

points(dim_test$av_abs_error, pch=16)
lines(dim_test$av_abs_error)
arrows(1:5, dim_test$av_abs_error-1.96*dim_test$av_abs_error_sd/sqrt(dim_test$number_of_samples), 
       1:5, dim_test$av_abs_error+1.96*dim_test$av_abs_error_sd/sqrt(dim_test$number_of_samples), 
       code=3, length=0.1, angle=90)

The attached plot shows the first dimension test (black) with its 95% CI and the 10 subsequent repeats (coloured). [Attached image: dimension_results]
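To put a number on "most are outside", the comparison in the plot can be counted directly. The sketch below is base R only; count_outside_ci is a hypothetical helper, and it assumes each result exposes the av_abs_error, av_abs_error_sd and number_of_samples fields used in the plotting code above. The toy inputs are invented for illustration and are not real dimension test output.

```r
# Count how many repeat points fall outside the 95% CI of a first result.
# `first` needs av_abs_error, av_abs_error_sd and number_of_samples;
# each element of `repeats` needs av_abs_error (one value per dimension).
count_outside_ci <- function(first, repeats) {
  half <- 1.96 * first$av_abs_error_sd / sqrt(first$number_of_samples)
  lower <- first$av_abs_error - half
  upper <- first$av_abs_error + half
  per_repeat <- vapply(repeats, function(r) {
    sum(r$av_abs_error < lower | r$av_abs_error > upper)
  }, integer(1))
  sum(per_repeat)
}

# Invented toy data: two dimensions, two repeats.
toy_first <- list(av_abs_error = c(2, 1.5),
                  av_abs_error_sd = c(1, 1),
                  number_of_samples = c(100, 100))
toy_repeats <- list(list(av_abs_error = c(2.1, 1.9)),
                    list(av_abs_error = c(2.3, 1.5)))
n_outside <- count_outside_ci(toy_first, toy_repeats)
n_outside
```

With the objects from the example code, the equivalent call would be count_outside_ci(dim_test, dim_tests), assuming dimensionTestMap returns those fields as columns or list elements.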

skepner commented 5 years ago

This is clearly for Racmacs. If you think it's a bug (or misfeature) in acmacs.r, please provide code that uses acmacs.r calls only.

But there is a problem with the map resolution test implementation. I don't understand the algorithm, mostly because I am not good at statistics. What I had to do was reverse engineer (quite cryptic) lispmds code and try to do the same in acmacs. It would perhaps be good to re-implement the map resolution test using a clearly described algorithm, preferably split into a few parts, each of which can be tested separately.

There is a strange thing in your code above: the dist matrix has a few negative values. What is the purpose of generating the table in such a complicated way? Why not just make a matrix of random titers directly, e.g. matrix(10*2^(sample.int(10,size=100,replace=TRUE)-1), nrow=10, ncol=10)?
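Spelled out, the direct approach suggested above looks like this in base R. The seed and the names log_steps and random_titers are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Draw 100 two-fold dilution steps (0 to 9), then convert them to titers
# on the usual 10, 20, 40, ... scale.
log_steps <- sample.int(10, size = 100, replace = TRUE) - 1
random_titers <- matrix(10 * 2^log_steps, nrow = 10, ncol = 10)

range(random_titers)
```

Every entry is one of 10, 20, ..., 5120, so no negative or fractional values can appear. The trade-off is that such a table has no underlying low-dimensional structure.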

skepner commented 5 years ago

Also, I think the map resolution test should (first) be implemented in R, e.g. by Sam, using acmacs just to relax maps. Then at least we would be confident that the statistics are done correctly, and you would be able to test your CI assumptions knowing that mistakes in the statistics code inside the test are unlikely.
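As a rough illustration of what "split into parts" could look like in R, here is a sketch of two pieces that need no map optimisation at all and so can be tested in isolation. Both helpers are hypothetical, and the outline (mask a fraction of titers, predict them from a relaxed map, summarise the errors) is an assumption about how a resolution test works, not a description of the lispmds or acmacs code. Missing titers are represented as NA in a numeric matrix purely for this sketch.

```r
# Part 1 (hypothetical): mask a random proportion of the measured titers.
hold_out_titers <- function(titers, proportion) {
  measured <- which(!is.na(titers))
  n_out <- round(length(measured) * proportion)
  held <- measured[sample.int(length(measured), n_out)]
  train <- titers
  train[held] <- NA
  list(train = train, held_out = held)
}

# Part 2 (hypothetical): summarise prediction errors on the log scale.
# Field names mirror those used in the plotting code earlier in the thread.
summarise_errors <- function(measured_log, predicted_log) {
  abs_err <- abs(measured_log - predicted_log)
  list(av_abs_error = mean(abs_err),
       av_abs_error_sd = sd(abs_err),
       number_of_samples = length(abs_err))
}

set.seed(1)  # arbitrary seed
toy_table <- matrix(10 * 2^(0:9), nrow = 10, ncol = 10)
split <- hold_out_titers(toy_table, 0.1)
toy_summary <- summarise_errors(c(1, 2, 3), c(1, 3, 5))
```

The step in between, relaxing a map on split$train and predicting the held-out titers, would be the only part that calls acmacs.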

drserajames commented 5 years ago

Implementing it in R first is a good idea. I will talk to Sam about it.

My odd process for generating random data was to create a distance matrix that represents a 2d map with added noise, rather than completely random data. The noise made some distances negative, so I should have added a line to force those to zero. It's definitely not the most concise way of doing it.
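For reference, a sketch of the data generation with the missing line added. It repeats the steps from the example code, renames the matrix to d (an illustrative choice that avoids masking the dist function), and clamps negative distances to zero before titers are derived.

```r
set.seed(1)

# Coordinates and noisy distances, as in the example code.
coord <- matrix(rep(runif(10, 0, 10), times = 2), ncol = 2, byrow = TRUE)
d <- as.matrix(dist(coord)) + rnorm(100)

# The added line: noise can push small distances below zero, so clamp them.
d[d < 0] <- 0

# Convert distances to titers relative to each column's maximum.
max_mat <- matrix(apply(round(d), 2, max), ncol = 10, nrow = 10, byrow = TRUE)
tab1 <- 10 * 2^round(max_mat - d)
```

The clamp changes only the entries that were negative; all other distances and the resulting titers are computed as before.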

acorg / acmacs.r, issue #6