Mouse-Imaging-Centre / RMINC

Statistics for MINC volumes: A library to integrate voxel-based statistics for MINC volumes into the R environment. Supports getting and writing of MINC volumes, running voxel-wise linear models, correlations, etc.; correcting for multiple comparisons using the False Discovery Rate, and more. With contributions from Jason Lerch, Chris Hammill, Jim Nikelski and Matthijs van Eede. Some additional information can be found here:
https://mouse-imaging-centre.github.io/RMINC
Other
22 stars 17 forks source link

Using the hierarchy in anatGetAll? #135

Closed gdevenyi closed 7 years ago

gdevenyi commented 7 years ago

With the more recent DSU and DSUR atlases there are many regions segmented which aren't really distinguishable (or interesting) on a given in-vivio scan.

The atlas CSV files define a hierarchy which combine other labels, but I can't find a place for RMINC to use it.

Is this feature available, or is there a script available to take advantage of this? Thanks!

cfhammill commented 7 years ago

To my knowledge this isn't implemented yet but perhaps @jasonlerch knows.

If it isn't implemented yet it is certainly do-able. I'm the label, jacobian, and sum cases you just need to add the rows together according to the hierarchy. In the mean case it's a bit trickier because you need to keep track of the number of voxels in each label to compute the correct average. This would require changes to the underlying c/c++ code.

gdevenyi commented 7 years ago

For now, the question is restricted to the MAGeT label outputs.

cfhammill commented 7 years ago

I think this could be a simple function, I'll try to find time to write it next week.

cfhammill commented 7 years ago

I sketched a function to do this, definitely needs a little tlc, but you could try this if you don't want to wait for the next release. The function can either take a column name and a definitions file or a two column data.frame with columns label and group.

library(dplyr); library(tidyr)

anatSummarize <-
  function(anat
           , summarize_by = "hierarchy"
           , defs = getOption("RMINC_LABEL_DEFINITIONS")
           , discard_missing = FALSE){

    if(is.character(summarize_by) && length(summarize_by == 1)){
      labels <- read.csv(defs, stringsAsFactors = FALSE)
      summarize_by <- 
        labels %>% 
        select_("Structure", summarize_by) %>%
        rename_("label" = "Structure", "group" = summarize_by)
    }

    if(!discard_missing)
      summarize_by <-
        summarize_by %>%
        mutate_(group = ~ifelse(is.na(group) | group == "", label, group))

    as.data.frame.matrix(anat) %>%
      (tibble::rownames_to_column) %>%
      gather_("label", "value", setdiff(colnames(anat), "rowname")) %>%
      inner_join(summarize_by, by = "label") %>%
      group_by_("group", "rowname") %>%
      summarize_(value = ~ sum(value)) %>%
      spread_("group", "value") %>%
      as.matrix
  }
gdevenyi commented 7 years ago

Thanks! We'll give it a try.

gdevenyi commented 7 years ago

So, something seems to go off the rails here, it works, then it doesn't:

anat <- anatGetAll(data$label, "/opt/quarantine/resources/DorrSteadmanUllmann/MAGeT/Dorr_2008_Steadman_2013_Ullmann_2013_on_NRXN1a_v1_labels.mnc", method="labels", defs="/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")
data$volCombined <- anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")

We get the first "hierarchy" from the CSV, but after that things go weird in the outputs.

The outputs we got back as a summary:

volCombined.cerebellar vermis   volCombined.cerebellar vermis white matter  volCombined.cerebellar white matter volCombined.cerebral aqueduct   volCombined.fourth ventricle    volCombined.interpedunclar nucleus  volCombined.medulla volCombined.midbrain    volCombined.periaqueductal grey volCombined.pons    volCombined.posterior commissure    volCombined.third ventricle volCombined.ventral tegmental decussation
cfhammill commented 7 years ago

So the issue here is that not all structures are getting returned?

gdevenyi commented 7 years ago

The outputs after the first cerebellar vermis label are from the "structure" column, not the "hierarchy" column.

Missed on copy and paste, first "combined" return is: volCombined.rowname

cfhammill commented 7 years ago

I'd never intended the results to be put as a column in a data frame, I intended something along the lines of

volCombined <- anatSummarize(...)

Which can be coerced to a data frame and and merged/joined to your original data frame if you wish. Rowname is the subject. The discard_missing argument configures whether to keep or discard structures not found in the hierarchy, perhaps setting that to TRUE will fix the problem?

gdevenyi commented 7 years ago

Results: no change vs. saving to a column in a dataframe.

discard_missing = TRUE throws error:

> volCombined <- anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv", discard_missing = TRUE)
Error: All columns must be named
Called from: as_data_frame(data)
cfhammill commented 7 years ago

Hmm, just tried it on my machine and it work fine. Could you call traceback() after the error? I'm not sure where as_data_frame(data) is getting called. Also if you could give me the output of sessionInfo() it could be a dplyr/tidyr versioning thing.

gdevenyi commented 7 years ago
volCombined <- anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv", discard_missing = TRUE)
Error: All columns must be named
Called from: as_data_frame(data)
Browse[1]> traceback()
No traceback available 
Browse[1]> sessionInfo()
R version 3.2.4 (2016-03-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyr_0.4.1         dplyr_0.4.3         ggplot2_2.1.0       RMINC_1.4.3.4      
 [5] BatchJobs_1.6       BBmisc_1.9          lmerTest_2.0-30     lme4_1.1-11        
 [9] Matrix_1.2-4        RevoUtilsMath_3.2.4

loaded via a namespace (and not attached):
 [1] splines_3.2.4       lattice_0.20-33     colorspace_1.2-6    htmltools_0.3.5    
 [5] base64enc_0.1-3     survival_2.38-3     nloptr_1.0.4        foreign_0.8-66     
 [9] DBI_0.3.1           RColorBrewer_1.1-2  plyr_1.8.3          stringr_1.0.0      
[13] munsell_0.4.3       gtable_0.2.0        latticeExtra_0.6-28 httpuv_1.3.3       
[17] parallel_3.2.4      Rcpp_0.12.4         acepack_1.3-3.3     xtable_1.8-2       
[21] backports_1.0.2     scales_0.4.0        checkmate_1.7.3     Hmisc_3.17-2       
[25] mime_0.4            sendmailR_1.2-1     gridExtra_2.2.1     brew_1.0-6         
[29] fail_1.3            digest_0.6.9        stringi_1.0-1       shiny_0.13.2       
[33] grid_3.2.4          tools_3.2.4         magrittr_1.5        lazyeval_0.1.10    
[37] RSQLite_1.0.0       tibble_1.0          Formula_1.2-1       cluster_2.0.3      
[41] MASS_7.3-45         gridBase_0.4-7      assertthat_0.1      minqa_1.2.4        
[45] R6_2.1.2            rpart_4.1-10        nnet_7.3-12         nlme_3.1-126       
[49] compiler_3.2.4
cfhammill commented 7 years ago

Hmm weird there's no backtrace, given that you have the browse prompt you probably hopped into the debugger, maybe this ate the trace.

I'm running tidyr 0.6 and dplyr 0.5, so it is possible it is package version problem

gdevenyi commented 7 years ago

Will check into that, thanks

gdevenyi commented 7 years ago

Updated versions to latest for tidyr and dplyr, no errors anymore, however outputs aren't yet as expected.

Without discard:

"","rowname","cerebellar vermis","cerebellar vermis white matter","cerebellar white matter","cerebral aqueduct","fourth ventricle","interpedunclar nucleus","medulla","midbrain","periaqueductal grey","pons","posterior commissure","third ventricle","ventral tegmental decussation"
"1","1","13.265","1.862","3.755","0.455","0.711","0.196","26.331","11.979","4.788","16.888","0.238","1.563","0.139"
"2","10","13.810","1.943","3.706","0.471","0.649","0.197","26.336","11.494","4.881","17.203","0.232","1.501","0.122"
"3","11","14.193","2.065","3.975","0.391","0.625","0.193","27.574","12.052","4.882","17.589","0.203","1.176","0.165"
"4","12","16.265","2.357","3.855","0.419","0.732","0.246","28.528","12.595","4.661","17.922","0.259","1.683","0.107"
"5","13","15.425","2.457","4.316","0.376","0.672","0.236","30.114","14.284","4.857","18.662","0.208","1.769","0.131"
"6","14","15.303","2.350","4.472","0.392","0.704","0.248","29.364","13.661","4.635","19.202","0.241","1.548","0.131"
"7","15","15.633","2.220","3.749","0.431","0.685","0.203","29.056","12.019","4.614","17.710","0.195","1.505","0.146"
"8","16","15.072","2.206","3.913","0.411","0.693","0.162","29.419","12.779","4.728","18.020","0.178","1.299","0.185"
"9","17","16.080","2.285","3.983","0.476","0.728","0.227","30.049","12.341","4.649","17.990","0.256","1.560","0.127"
"10","18","16.060","2.188","4.164","0.494","0.731","0.235","31.930","13.071","4.990","19.077","0.213","1.616","0.142"
"11","19","16.020","2.277","4.360","0.472","0.726","0.252","32.301","12.825","4.659","19.540","0.245","1.559","0.130"
"12","2","14.547","2.097","3.916","0.450","0.694","0.202","29.066","12.604","4.862","18.010","0.203","1.537","0.135"
"13","20","14.782","2.105","3.945","0.402","0.721","0.207","27.978","12.151","4.724","18.244","0.230","1.603","0.120"
"14","21","14.848","2.208","4.425","0.452","0.737","0.230","30.133","13.264","4.803","19.747","0.253","1.697","0.137"
"15","22","14.240","2.350","4.822","0.389","0.713","0.248","30.322","13.272","4.668","19.484","0.255","1.843","0.121"
"16","23","14.492","2.065","4.025","0.504","0.745","0.226","28.033","12.412","4.760","17.718","0.225","1.577","0.111"
"17","24","14.966","2.223","4.520","0.469","0.733","0.254","30.095","13.523","5.030","19.951","0.221","1.725","0.142"
"18","25","14.454","2.282","4.670","0.472","0.776","0.231","30.048","13.626","4.631","19.350","0.241","1.766","0.140"
"19","26","14.854","2.129","3.914","0.440","0.742","0.228","27.686","12.199","4.623","18.487","0.209","2.199","0.138"
"20","27","15.295","2.250","4.390","0.396","0.791","0.243","30.314","13.346","4.732","20.253","0.275","2.451","0.152"
"21","28","14.766","2.229","4.472","0.392","0.763","0.249","30.135","13.506","4.657","20.467","0.255","2.453","0.134"
"22","3","14.297","2.119","3.985","0.433","0.677","0.216","29.653","13.280","4.527","18.207","0.216","1.557","0.124"
"23","4","13.766","1.980","3.608","0.503","0.690","0.192","26.566","11.736","4.719","16.583","0.232","1.445","0.144"
"24","5","14.197","2.057","3.750","0.481","0.666","0.201","27.415","11.677","4.571","16.804","0.215","1.411","0.143"
"25","6","13.960","1.966","4.119","0.487","0.715","0.224","29.656","12.907","4.730","18.231","0.232","1.470","0.140"
"26","7","13.221","1.930","3.621","0.488","0.698","0.172","25.442","11.265","4.338","16.167","0.198","1.233","0.115"
"27","8","13.383","1.887","3.710","0.456","0.696","0.211","26.598","11.567","4.399","17.177","0.157","1.229","0.128"
"28","9","13.648","2.039","3.998","0.419","0.713","0.215","27.518","12.023","4.233","17.691","0.183","1.232","0.122"

And with:

"","rowname","","cerebellar vermis","cerebellar vermis white matter","cerebellar white matter"
"1","1","63.288","13.265","1.862","3.755"
"2","10","63.086","13.810","1.943","3.706"
"3","11","64.850","14.193","2.065","3.975"
"4","12","67.152","16.265","2.357","3.855"
"5","13","71.309","15.425","2.457","4.316"
"6","14","70.126","15.303","2.350","4.472"
"7","15","66.564","15.633","2.220","3.749"
"8","16","67.874","15.072","2.206","3.913"
"9","17","68.403","16.080","2.285","3.983"
"10","18","72.499","16.060","2.188","4.164"
"11","19","72.709","16.020","2.277","4.360"
"12","2","67.763","14.547","2.097","3.916"
"13","20","66.380","14.782","2.105","3.945"
"14","21","71.453","14.848","2.208","4.425"
"15","22","71.315","14.240","2.350","4.822"
"16","23","66.311","14.492","2.065","4.025"
"17","24","72.143","14.966","2.223","4.520"
"18","25","71.281","14.454","2.282","4.670"
"19","26","66.951","14.854","2.129","3.914"
"20","27","72.953","15.295","2.250","4.390"
"21","28","73.011","14.766","2.229","4.472"
"22","3","68.890","14.297","2.119","3.985"
"23","4","62.810","13.766","1.980","3.608"
"24","5","63.584","14.197","2.057","3.750"
"25","6","68.792","13.960","1.966","4.119"
"26","7","60.116","13.221","1.930","3.621"
"27","8","62.618","13.383","1.887","3.710"
"28","9","64.349","13.648","2.039","3.998"
cfhammill commented 7 years ago

Can you send me an rds file containing your anat? I can debug faster if I have the data.

saveRDS(anat, "anatSummarize_test.rds")

cfhammill commented 7 years ago

Ok I think it's fixed this time, it wasn't handling the laterality of the structures. I'll push a commit to develop exposing anatSummarize this afternoon if you want to give it a try.

gdevenyi commented 7 years ago

can do!

cfhammill commented 7 years ago

Ready to go.

gdevenyi commented 7 years ago

Pushed off to testers!

cfhammill commented 7 years ago

Thanks Gabe, much appreciated.

gdevenyi commented 7 years ago

Still having trouble, different error now:

 volCombined <- anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")
Error: invalid 'type' (character) of argument
In addition: Warning message:
attributes are not identical across measure variables; they will be dropped 
> traceback()
12: stop(list(message = "invalid 'type' (character) of argument", 
        call = NULL, cppstack = NULL))
11: summarise_impl(.data, dots)
10: summarise_.tbl_df(., value = ~sum(value))
9: summarize_(., value = ~sum(value))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(expr, envir, enclos)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: anat %>% as.data.frame.matrix %>% (tibble::rownames_to_column) %>% 
       gather_("label", "value", setdiff(colnames(anat), "rowname")) %>% 
       inner_join(summarize_by, by = "label") %>% group_by_("group", 
       "rowname") %>% summarize_(value = ~sum(value)) %>% spread_("group", 
       "value") %>% arrange_(~as.numeric(rowname)) %>% select_(~-rowname) %>% 
       as.matrix
1: anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")
cfhammill commented 7 years ago

Could you send me your copy of the DSU labels, you can attach it to the issue, it's possible our csv's are a little different and that's causing troubles.

gdevenyi commented 7 years ago

Attached:

Num,Structure,right label,left label,hierarchy,tissue type
1,amygdala,51,151,,GM
2,anterior commissure: pars anterior,115,215,,WM
3,anterior commissure: pars posterior,23,103,,WM
4,basal forebrain,52,152,,GM
5,bed nucleus of stria terminalis,176,76,,GM
6,cerebellar peduncle: inferior,123,223,,WM
7,cerebellar peduncle: middle,45,245,,WM
8,cerebellar peduncle: superior,242,222,,WM
9,cerebral aqueduct,119,119,,CSF
10,cerebral peduncle,114,14,,WM
11,colliculus: inferior,143,43,,GM
12,colliculus: superior,9,109,,GM
13,corpus callosum,8,68,,WM
14,corticospinal tract/pyramids,218,18,,WM
15,cuneate nucleus,166,168,,GM
16,dentate gyrus of hippocampus,16,66,,GM
17,facial nerve (cranial nerve 7),19,219,,GM
18,fasciculus retroflexus,25,125,,WM
19,fimbria,211,11,,
20,fornix,122,22,,WM
21,fourth ventricle,118,118,,CSF
22,fundus of striatum,54,154,,GM
23,globus pallidus,44,144,,GM
24,habenular commissure,99,199,,WM
25,hippocampus,6,106,,GM
26,hypothalamus,250,150,,GM
27,inferior olivary complex,113,213,,GM
28,internal capsule,112,12,,WM
29,interpedunclar nucleus,157,157,,GM
30,lateral olfactory tract,101,102,,WM
31,lateral septum,207,107,,
32,lateral ventricle,57,77,,CSF
33,mammillary bodies,161,61,,GM
34,mammilothalamic tract,210,212,,WM
35,medial lemniscus/medial longitudinal fasciculus,20,120,,WM
36,medial septum,53,153,,
37,medulla,174,174,,
38,midbrain,194,194,,
39,nucleus accumbens,55,155,,GM
40,olfactory bulbs,5,105,,GM
41,olfactory tubercle,95,145,,GM
42,optic tract,216,116,,WM
43,periaqueductal grey,10,10,,GM
44,pons,187,187,,GM
45,pontine nucleus,85,185,,GM
46,posterior commissure,100,100,,WM
47,pre-para subiculum,133,131,,GM
48,stratum granulosum of hippocampus,13,63,,GM
49,stria medullaris,225,205,,GM
50,stria terminalis,59,159,,GM
51,striatum,7,17,,GM
52,subependymale zone / rhinocele,240,140,,GM
53,superior olivary complex,124,214,,GM
54,thalamus,204,4,,GM
55,third ventricle,146,146,,CSF
56,ventral tegmental decussation,156,156,,WM
57,lobules 1-2: lingula and central lobule (ventral),32,32,cerebellar vermis,GM
58,lobule 3: central lobule (dorsal),233,233,cerebellar vermis,GM
59,lobules 4-5: culmen (ventral and dorsal),34,34,cerebellar vermis,GM
60,lobule 6: declive,36,36,cerebellar vermis,GM
61,lobule 7: tuber (or folium),237,237,cerebellar vermis,GM
62,lobule 8: pyramis,38,38,cerebellar vermis,GM
63,lobule 9: uvula,239,239,cerebellar vermis,GM
64,lobule 10: nodulus,40,40,cerebellar vermis,GM
65,anterior lobule  (lobules 4-5),90,148,cerebellar paravermis,GM
66,simple lobule (lobule 6),191,91,cerebellar hemisphere,GM
67,crus 1: ansiform lobule (lobule 6),92,192,cerebellar hemisphere,GM
68,crus 2: ansiform lobule (lobule 7),193,93,cerebellar hemisphere,GM
69,paramedian lobule (lobule 7),94,200,cerebellar hemisphere,GM
70,copula: pyramis (lobule 8),196,96,cerebellar hemisphere,GM
71,flocculus (FL),97,197,cerebellar flocculonodular lobe,GM
72,paraflocculus (PFL),198,98,cerebellar flocculonodular lobe,GM
73,trunk of arbor vita,47,47,cerebellar white matter,WM
74,lobule 1-2 white matter,232,232,cerebellar vermis white matter,WM
75,lobule 3 white matter,33,33,cerebellar vermis white matter,WM
76,trunk of lobules 1-3 white matter,253,253,cerebellar vermis white matter,WM
77,lobules 4-5 white matter,234,234,cerebellar vermis white matter,WM
78,lobules 6-7 white matter,236,236,cerebellar vermis white matter,WM
79,lobule 8 white matter,238,238,cerebellar vermis white matter,WM
80,trunk of lobules 6-8 white matter,254,254,cerebellar vermis white matter,WM
81,lobule 9 white matter,139,139,cerebellar vermis white matter,WM
82,lobule 10 white matter,252,252,cerebellar vermis white matter,WM
83,anterior lobule white matter,21,31,cerebellar paravermis white matter,
84,simple lobule white matter,241,251,cerebellar white matter,WM
85,crus 1 white matter,220,170,cerebellar white matter,WM
86,trunk of simple and crus 1 white matter,226,246,cerebellar white matter,WM
87,crus 2 white matter,229,249,cerebellar white matter,WM
88,paramedian lobule,228,248,cerebellar white matter,WM
89,trunk of crus 2 and paramedian white matter,175,195,cerebellar white matter,WM
90,copula white matter,224,244,cerebellar white matter,WM
91,paraflocculus white matter,183,243,cerebellar flocculonodular lobe white matter,WM
92,flocculus white matter,167,177,cerebellar flocculonodular lobe white matter,WM
93,dentate nucleus,1,201,deep cerebellar nuclei,GM
94,nucleus interpositus,203,3,deep cerebellar nuclei,GM
95,fastigial nucleus,15,206,deep cerebellar nuclei,GM
96,Cingulate cortex: area 24a,24,169,Cingulate region,GM
97,Cingulate cortex: area 24a',26,171,Cingulate region,GM
98,Cingulate cortex: area 24b,27,172,Cingulate region,GM
99,Cingulate cortex: area 24b',28,173,Cingulate region,GM
100,Cingulate cortex: area 25,29,178,Cingulate region,GM
101,Cingulate cortex: area 29a,30,179,Cingulate region,GM
102,Cingulate cortex: area 29b,35,182,Cingulate region,GM
103,Cingulate cortex: area 29c,37,184,Cingulate region,GM
104,Cingulate cortex: area 30,39,186,Cingulate region,GM
105,Cingulate cortex: area 32,41,188,Cingulate region,GM
106,Amygdalopiriform transition area,42,189,Piriform cortex area,GM
107,Primary auditory cortex,46,208,Temporal region,GM
108,Secondary auditory cortex: dorsal area,48,217,Temporal region,GM
109,Secondary auditory cortex: ventral area,49,221,Temporal region,GM
110,Caudomedial entorhinal cortex,50,227,Entorhinal cortex,GM
111,Cingulum ,56,231,Fiber Groups,WM
112,Claustrum,58,235,Insular claustrum,GM
113,Cortex-amygdala transition zones,60,255,Piriform cortex area,GM
114,Claustrum: dorsal part,62,256,Insular claustrum,GM
115,Dorsal nucleus of the endopiriform,65,257,Endopiriform claustrum,GM
116,Dorsal intermediate entorhinal cortex,67,258,Entorhinal cortex,GM
117,Dorsolateral entorhinal cortex,69,259,Entorhinal cortex,GM
118,Dorsolateral orbital cortex,70,260,Frontal region,GM
119,Dorsal tenia tecta,71,261,Entorhinal cortex,GM
120,Ectorhinal cortex,72,262,Insular region,GM
121,Frontal cortex: area 3,73,263,Frontal region,GM
122,Frontal association cortex,74,264,Frontal region,GM
123,Intermediate nucleus of the endopiriform claustrum,75,265,Endopiriform claustrum,GM
124,Insular region: not subdivided,78,266,Insular region,GM
125,Lateral orbital cortex,79,267,Frontal region,GM
126,Lateral parietal association cortex,80,268,Parietal region,GM
127,Primary motor cortex,81,269,Frontal region,GM
128,Secondary motor cortex,82,270,Frontal region,GM
129,Medial entorhinal cortex,83,271,Entorhinal cortex,GM
130,Medial orbital cortex,84,272,Frontal region,GM
131,Medial parietal association cortex,86,273,Parietal region,GM
132,Piriform cortex,87,274,Piriform cortex area,GM
133,Posterolateral cortical amygdaloid area ,88,275,Piriform cortex area,GM
134,Posteromedial cortical amygdaloid area ,89,276,Piriform cortex area,GM
135,Perirhinal cortex,104,277,Insular region,GM
136,Parietal cortex: posterior area: rostral part,108,278,Parietal region,GM
137,Rostral amygdalopiriform area,110,279,Piriform cortex area,GM
138,Primary somatosensory cortex,111,280,Parietal region,GM
139,Primary somatosensory cortex: barrel field,117,281,Parietal region,GM
140,Primary somatosensory cortex: dysgranular zone,121,282,Parietal region,GM
141,Primary somatosensory cortex: forelimb region,126,283,Parietal region,GM
142,Primary somatosensory cortex: hindlimb region,127,284,Parietal region,GM
143,Primary somatosensory cortex: jaw region,128,285,Parietal region,GM
144,Primary somatosensory cortex: shoulder region,129,286,Parietal region,GM
145,Primary somatosensory cortex: trunk region,132,287,Parietal region,GM
146,Primary somatosensory cortex: upper lip region,134,288,Parietal region,GM
147,Secondary somatosensory cortex,135,289,Parietal region,GM
148,Temporal association area,136,290,Temporal region,GM
149,Primary visual cortex,137,291,Occipital region,GM
150,Primary visual cortex: binocular area,138,292,Occipital region,GM
151,Primary visual cortex: monocular area,141,293,Occipital region,GM
152,Secondary visual cortex: lateral area,142,294,Occipital region,GM
153,Secondary visual cortex: mediolateral area,147,295,Occipital region,GM
154,Secondary visual cortex: mediomedial area,149,296,Occipital region,GM
155,Claustrum: ventral part,158,297,Insular claustrum,GM
156,Ventral nucleus of the endopiriform claustrum,160,298,Endopiriform claustrum,GM
157,Ventral intermediate entorhinal cortex,162,299,Entorhinal cortex,GM
158,Ventral orbital cortex,163,300,Frontal region,GM
159,Ventral tenia tecta,165,301,Entorhinal cortex,GM
cfhammill commented 7 years ago

Hmm, with your label set it seems to be working both on our ubuntu 12.04, and centos 5 installs. Both are using R 3.3.2 where it appears you are using R 3.2.4. Did you end up upgrading dplyr and tidyr?

I wonder if something has changed with ~ quotation since 3.2.4, it's complaining that sum has received a character argument, but normal non-standard evaluation should interpolate value to a numeric vector generated by gather_.

To test that this is indeed the problem I made a branch gabe_nse_fix20170512 that uses a different summarization technique. If that solves your problem I'm not precisely sure how to proceed, this version is slower and less clear, but has a longer compatibility window maybe...

gdevenyi commented 7 years ago

I did upgrade dplyr and tidyr.

I'm happy to upgrade our R install to get this working, but I will test with the fix branch as well. Thanks!

gdevenyi commented 7 years ago

Still having issues... git-develop with updated R

volCombined <- anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")
Error in summarise_impl(.data, dots) : 
  invalid 'type' (character) of argument
In addition: Warning message:
attributes are not identical across measure variables; they will be dropped 
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_CA.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RMINC_1.4.3.4        BatchJobs_1.6        BBmisc_1.11          RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9      magrittr_1.5     splines_3.3.3    MASS_7.3-45      gridBase_0.4-7   xtable_1.8-2     lattice_0.20-34  R6_2.2.0         brew_1.0-6       minqa_1.2.4      sendmailR_1.2-1 
[12] stringr_1.2.0    dplyr_0.5.0      tools_3.3.3      fail_1.3         parallel_3.3.3   grid_3.3.3       checkmate_1.8.2  nlme_3.1-131     DBI_0.6          htmltools_0.3.5  lazyeval_0.2.0  
[23] yaml_2.1.14      lme4_1.1-12      digest_0.6.12    assertthat_0.1   tibble_1.2       Matrix_1.2-8     shiny_1.0.0      tidyr_0.6.1      nloptr_1.0.4     base64enc_0.1-3  mime_0.5        
[34] memoise_1.0.0    RSQLite_1.1-2    stringi_1.1.2    RevoUtils_10.0.3 backports_1.0.5  httpuv_1.3.3    
cfhammill commented 7 years ago

Did you try the branch I made for you?

Also can you try running the offending code:

summarize_by <- "hierarchy"
defs <- "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv"

summarize_by <- 
  RMINC:::create_labels_frame(defs) %>%
  select_(~ -label) %>%
  rename_(label = "Structure", group = summarize_by)

summarize_by <-
  summarize_by %>%
  mutate_(group = ~ifelse(is.na(group) | group == "", label, group))

partial <-
  anat %>%
  as.data.frame.matrix %>%
  (tibble::rownames_to_column) %>%
  gather_("label", "value", setdiff(colnames(anat), "rowname")) %>%
  inner_join(summarize_by, by = "label") %>%
  group_by_("group", "rowname") 

## This is the standard evaluation version, should fail
partial %>%
  summarize_(value = ~ sum(value))

## This is the non-standard evaluation version, might work
partial %>%
  summarize(value = sum(value))

## Check that value is numeric
glimpse(partial)
gdevenyi commented 7 years ago

No not yet, this is with updated R (see sessioninfo)

gdevenyi commented 7 years ago

From the fixes branch:

volCombined <- anatSummarize(anat, summarize_by = "hierarchy", defs = "/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")
Error in sum(.$value) : invalid 'type' (character) of argument
In addition: Warning message:
attributes are not identical across measure variables; they will be dropped 
gdevenyi commented 7 years ago

Weird thing I just noticed, anat looks like it has characters instead of numbers:

head(anat)
     right dentate nucleus left nucleus interpositus left thalamus right olfactory bulbs right hippocampus right striatum right corpus callosum right colliculus: superior periaqueductal grey
[1,] " 0.075"              " 0.113"                  " 9.013"      "13.505"              " 9.344"          "10.503"       " 8.029"              " 4.398"                   " 4.788"           
[2,] " 0.079"              " 0.122"                  " 9.661"      "14.289"              " 9.559"          "10.827"       " 8.493"              " 4.571"                   " 4.862"           
[3,] " 0.094"              " 0.119"                  " 9.404"      "14.651"              " 9.623"          "10.767"       " 7.971"              " 4.395"                   " 4.527"           

Notice the spaces inside the number boxes. Maybe this is the bug origin?

gdevenyi commented 7 years ago

rstudio says so too, see upper right

image

cfhammill commented 7 years ago

On my side - reading the rds you sent me anat is a numeric matrix, actually mine is an anatUnilateral which under the hood is a numeric matrix. Yours also has an extra row relative to the one you sent me. Did you happen to write it out and read it in from a csv? And yes, this is definitely why it is not working, it can't sum character vectors.

gdevenyi commented 7 years ago

Indeed that file is.

This is with updated R and rebuilt RMINC based on develop. Clearly something changed :-/

gdevenyi commented 7 years ago

anatGetAll_old works properly.

Looks like a bug in anatGetAll method=labels?

cfhammill commented 7 years ago

Does it still fail if you do not write to csv first? By which I mean write out - read back in, because that will likely create weirdness.

gdevenyi commented 7 years ago

This produces a character array:

data = read.csv("/data/chamal/projects/colleen/FAT/MAGeT/cohort1_R.csv")
data$mouseID = as.factor(data$mouseID)
data$sex = as.factor(data$sex)
data$sex = relevel(data$sex, ref = "M")
data$timePoint = as.factor(data$timePoint)
data$timePoint = relevel(data$timePoint, ref = "1")
data$group = as.factor(data$group)
data$group = relevel(data$group, ref = "3tgCD")

## pulls all the volumes for all the labels, which are defined (which number means what) by a csv that lives elsewhere
anat <- anatGetAll(data$label, "/opt/quarantine/resources/DorrSteadmanUllmann/MAGeT/Dorr_2008_Steadman_2013_Ullmann_2013_on_NRXN1a_v1_labels.mnc", method="labels", defs="/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")

This does not

data = read.csv("/data/chamal/projects/colleen/FAT/MAGeT/cohort1_R.csv")
data$mouseID = as.factor(data$mouseID)
data$sex = as.factor(data$sex)
data$sex = relevel(data$sex, ref = "M")
data$timePoint = as.factor(data$timePoint)
data$timePoint = relevel(data$timePoint, ref = "1")
data$group = as.factor(data$group)
data$group = relevel(data$group, ref = "3tgCD")

## pulls all the volumes for all the labels, which are defined (which number means what) by a csv that lives elsewhere
anat <- anatGetAll_old(data$label, "/opt/quarantine/resources/DorrSteadmanUllmann/MAGeT/Dorr_2008_Steadman_2013_Ullmann_2013_on_NRXN1a_v1_labels.mnc", method="labels", defs="/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv")

We're not writing to CSV anywhere.

cfhammill commented 7 years ago

On either anat: any(is.na(anat)) ?

gdevenyi commented 7 years ago

FALSE and FALSE

cfhammill commented 7 years ago

hmmm, I have no idea what's going on, on mock data and all of our tests anatGetAll method = "label" faithfully returns numeric. It's likely the conversion is happening in RMINC:::create_anat_results, but I can't trace it.

defs <-  "/opt/quarantine/resources/DorrSteadmanUllmann/MAGeT/Dorr_2008_Steadman_2013_Ullmann_2013_on_NRXN1a_v1_labels.mnc"

## This should be numeric
cpp_res <- RMINC:::label_counts(data$label, defs = defs, method="labels", defs=defs, strict = TRUE)

label_frame <- RMINC:::create_labels_frame(defs, side = "both")

## Problem should be here...
res <- RMINC:::create_anat_results(cpp_res, label_frame, filenames = data$label, atlas = NULL)

Can you confirm that that is actually where the issue is occuring, if it's happening in the c++ code something even weirder is happening

gdevenyi commented 7 years ago

res indeed contains chr data with spaces instead of numeric.

One fix, I had to do

label_frame <- RMINC:::create_labels_frame("/opt/quarantine/resources/DorrSteadmanUllmann/Dorr_2008_Steadman_2013_Ullmann_2013_mapping_of_labels.csv", side = "both")

Otherwise error:

 label_frame <- RMINC:::create_labels_frame(defs, side = "both")
Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<89>HDF'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 3 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 4 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 5 appears to contain embedded nulls

More info: This is with minc-toolkit/1.9.15

cfhammill commented 7 years ago

There were some typos in the above code - defs should have been the text not the minc file. Also for the new anatGetAll an atlas doesn't need to be passed. I'm still not sure where coercion to character is happening...

Can you send me an rds of cpp_res? Sorry this is taking so long to debug.

gdevenyi commented 7 years ago

anatGetAll_old throws the following warnings, while anatGetAll is silent:

[1] "WARNING: Labels found in the mapping, but not in the file:  199"
[1] "WARNING: Labels found in the mapping, but not in the file:  199"
[1] "WARNING: Labels found in the mapping, but not in the file:  199"
[1] "WARNING: Labels found in the mapping, but not in the file:  199"
[1] "WARNING: Labels found in the mapping, but not in the file:  199"
[1] "WARNING: Labels found in the mapping, but not in the file:  199"

Perhaps this may be an issue.

gdevenyi commented 7 years ago

Also, last subject of anatGetAll has text instead of numbers at all, see data viewer last line image

cfhammill commented 7 years ago

Thanks Gabe, that's definitely the issue, not sure how it got there, but that's at least a step in the right direction

Can you send that rds?

gdevenyi commented 7 years ago

Sent!

cfhammill commented 7 years ago

Sorry Gabe, can you send cpp_res from above?

cfhammill commented 7 years ago

Nevermind, found the problem. I changed create_label_frame to make it useful in anatSummarize but this broke anatGetAll when there's a hierarchy column and it wasn't picked up by our test cases. Fixing now.

cfhammill commented 7 years ago

Should be fixed in develop...?

gdevenyi commented 7 years ago

Sadly no, grabbed/rebuilt HEAD of develop, still broken with characters.

I don't actually see a commit that addresses this? Did you forget to push?