Shenhav-and-Korem-labs / SCRuB

Error when incorporating spatial information #15

Closed: mhswaney closed this issue 10 months ago

mhswaney commented 12 months ago

Hello, thank you for developing this awesome tool!

It works great when I run SCRuB without spatial information; however, I get the following error when I add the well IDs:

```
Error in well_dists[row.names(controls), row.names(samples)] : subscript out of bounds
```

I can't figure out what might be wrong with my input of 24 samples (metadata below).

From a brief look at the source code, the problem might be occurring here:

```r
well_dists <- metadata %>%
  mutate(
    well = metadata[, 3] %>% sapply(function(x) which(LETTERS == substr(x, 1, 1))),
    indices = metadata[, 3] %>% sapply(function(x) substr(x, 2, nchar(x)) %>% as.integer)
  ) %>%
  select(as.symbol('well'), as.symbol('indices')) %>%
  dist(method = dist_metric) %>%
  as.matrix()
```

The well letters are converted to numbers, but instead of a vector of 24 numbers it seems to return 8 (one for each of A-H), and `mutate()` then fails with `well must be size 24 or 1, not 8`. I'm not sure whether this is actually the source of the problem, but this is as far as my troubleshooting got :)

If there is something I'm missing, please let me know. Thank you in advance for your help!
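For reference, here is a minimal base-R sketch of what that snippet computes, assuming a data.frame whose row names are the sample IDs. It is not SCRuB's code: `well_to_coords()` is a hypothetical helper, and the three-row `meta` object is a made-up subset laid out like the metadata below.

```r
# Hypothetical helper (not part of SCRuB): row letter -> 1, 2, ...;
# the digits after it -> integer column number; sample IDs become row names
well_to_coords <- function(wells, ids) {
  data.frame(
    well = match(substr(wells, 1, 1), LETTERS),
    indices = as.integer(substr(wells, 2, nchar(wells))),
    row.names = ids
  )
}

# A small data.frame laid out like the metadata (row names = sample IDs)
meta <- data.frame(
  Negative = c(FALSE, FALSE, TRUE),
  Type = c("True Sample", "True Sample", "Extraction Negative"),
  Well = c("A2", "B2", "G1"),
  row.names = c("1020", "1050", "Neg_10733")
)

# On a data.frame, meta[, 3] drops to a character vector with one entry per sample
coords <- well_to_coords(meta[, 3], row.names(meta))

# dist() keeps the row names as labels, so the matrix can be indexed by sample ID
well_dists <- as.matrix(dist(coords, method = "euclidean"))
print(well_dists["Neg_10733", c("1020", "1050")])
```

One plausible reading of the error: on a tibble, `metadata[, 3]` stays a one-column tibble instead of dropping to a vector, so `sapply()` calls the function once on the whole column rather than once per well. Tibbles also do not keep row names, so a distance matrix built from one would not be labeled by sample ID, and indexing it with `row.names(controls)` would fail with `subscript out of bounds`.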

| | Negative | Type | Well |
|---|---|---|---|
| 1020 | FALSE | True Sample | A2 |
| 1050 | FALSE | True Sample | B2 |
| 1120 | FALSE | True Sample | C2 |
| 1160 | FALSE | True Sample | D2 |
| 1180 | FALSE | True Sample | E2 |
| 1200 | FALSE | True Sample | F2 |
| 1210 | FALSE | True Sample | G2 |
| 1240 | FALSE | True Sample | H2 |
| 1250 | FALSE | True Sample | A3 |
| 1320 | FALSE | True Sample | B3 |
| 1330 | FALSE | True Sample | C3 |
| 1370 | FALSE | True Sample | D3 |
| 1380 | FALSE | True Sample | E3 |
| 1390 | FALSE | True Sample | F3 |
| 1420 | FALSE | True Sample | G3 |
| 1430 | FALSE | True Sample | H3 |
| 7141 | FALSE | True Sample | A1 |
| 7151 | FALSE | True Sample | B1 |
| 7161 | FALSE | True Sample | C1 |
| 7171 | FALSE | True Sample | D1 |
| 7181 | FALSE | True Sample | E1 |
| Deidentified | FALSE | True Sample | F1 |
| Pos_Zymo | FALSE | Mock | H1 |
| Neg_10733 | TRUE | Extraction Negative | G1 |

gaustin15 commented 11 months ago

Hi, thank you for the post.

It looks like your metadata file is formatted correctly, and I'm able to run SCRuB with it on my end using some randomly generated samples. Can you run the following script on your machine? (It ran on mine.)

```r
library(tidyverse)
library(SCRuB)
# metadata string directly copied/pasted from your post
# (the fields must be separated by tabs: the Type values contain spaces,
# so splitting on '\t' below fails if the tabs were turned into spaces)
md <- '1020 FALSE   True Sample A2
1050    FALSE   True Sample B2
1120    FALSE   True Sample C2
1160    FALSE   True Sample D2
1180    FALSE   True Sample E2
1200    FALSE   True Sample F2
1210    FALSE   True Sample G2
1240    FALSE   True Sample H2
1250    FALSE   True Sample A3
1320    FALSE   True Sample B3
1330    FALSE   True Sample C3
1370    FALSE   True Sample D3
1380    FALSE   True Sample E3
1390    FALSE   True Sample F3
1420    FALSE   True Sample G3
1430    FALSE   True Sample H3
7141    FALSE   True Sample A1
7151    FALSE   True Sample B1
7161    FALSE   True Sample C1
7171    FALSE   True Sample D1
7181    FALSE   True Sample E1
Deidentified    FALSE   True Sample F1
Pos_Zymo    FALSE   Mock    H1
Neg_10733   TRUE    Extraction Negative G1'

colnames <- c('Negative', 'Type', 'Well')

# split into lines, split each line into tab-separated fields, then transpose so rows are samples
metadata <- (md %>% str_split('\n') )[[1]] %>% sapply(function(x) str_split(x, '\t') ) %>% unname() %>% data.frame() %>% t()
# the first field is the sample ID: use it as the row names and keep the other three columns
row.names(metadata) <- metadata[, 1]
metadata <- metadata[,2:4]
colnames(metadata) <- colnames
# back to a data.frame, with Negative as a logical column
metadata <- metadata %>% data.frame() %>% mutate(Negative=as.logical(Negative))

# random counts over 100 taxa for each sample, with row names matching the metadata
samples <- rmultinom(n = nrow(metadata), size=10000, prob= rnorm(n=100, mean=.5, sd=.05)) %>% t()
row.names(samples) <- row.names(metadata)

scr_out <- SCRuB(samples, metadata)
```
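If the tab-separated string is awkward to work with, base R's `read.delim()` can parse it into the same layout (sample IDs as row names, then Negative, Type, Well) in one call. This is a sketch, not part of SCRuB; `md_tab` and `metadata_alt` are illustrative names, and the two-row string writes the tabs explicitly as `\t` so they cannot be lost in copy/paste.

```r
# Two rows of the metadata, with the tab separators written explicitly
md_tab <- "1020\tFALSE\tTrue Sample\tA2\nNeg_10733\tTRUE\tExtraction Negative\tG1"

metadata_alt <- read.delim(
  text = md_tab,
  header = FALSE,
  sep = "\t",
  row.names = 1,                                   # first field = sample ID
  col.names = c("id", "Negative", "Type", "Well"),
  stringsAsFactors = FALSE                         # keep Type and Well as character
)

# Negative is read as logical because its values are TRUE/FALSE
str(metadata_alt)
```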

If this runs, my guess is that there is a discrepancy either in how your metadata file is formatted or in its contents, or that there are trailing whitespaces in the well column (if it is a whitespace issue, we can add a check for that in SCRuB's backend). Either way, I expect that matching your metadata object to the one built in the code above will resolve your issue.

If you still see the same error with the code above, could you share some details about your R environment (R version, package versions, machine specs) so that I can set up a matching environment?
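A check along those lines might look like the sketch below. `check_wells()` is a hypothetical helper, not an existing SCRuB function, and it assumes well IDs are a single row letter followed by a column number (e.g. `A1`, `H12`). Pulling the column with `[[` returns a plain vector for both data.frames and tibbles.

```r
# Hypothetical pre-flight check for the well column (not part of SCRuB)
check_wells <- function(metadata, well_col = 3) {
  # [[ ]] gives a plain vector whether metadata is a data.frame or a tibble
  wells <- trimws(as.character(metadata[[well_col]]))  # strip leading/trailing whitespace
  bad <- !grepl("^[A-Z][0-9]+$", wells)                 # expect e.g. "A1", "H12"
  if (any(bad)) {
    stop("Unrecognized well IDs: ", paste(wells[bad], collapse = ", "))
  }
  metadata[[well_col]] <- wells  # write the cleaned IDs back
  metadata
}

# Example: stray spaces around "A2" and "G1" are removed
example_md <- data.frame(
  Negative = c(FALSE, TRUE),
  Type = c("True Sample", "Extraction Negative"),
  Well = c("A2 ", " G1"),
  row.names = c("1020", "Neg_10733")
)
cleaned <- check_wells(example_md)
print(cleaned$Well)
```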

mhswaney commented 11 months ago

Thank you so much! Your code works perfectly. You were right: there was an issue with the metadata formatting. My metadata was a tibble, which seemed to work with SCRuB() when the spatial column was excluded, but adding that column produced the error. Converting it to a data.frame solved the issue.
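Tibbles do not keep row names, so if the sample IDs ended up in a regular column (for example, after reading the file with readr), the conversion also needs to move them back into the row names. A minimal base-R sketch, assuming the IDs sit in a column named by `id_col`; `to_scrub_metadata()` is a hypothetical helper, not part of SCRuB.

```r
# Hypothetical helper: tibble (or data.frame) with an ID column ->
# plain data.frame with sample IDs as row names, like the script above builds
to_scrub_metadata <- function(tbl, id_col) {
  df <- as.data.frame(tbl)        # tibble -> plain data.frame (no-op for a data.frame)
  row.names(df) <- df[[id_col]]   # sample IDs become row names
  df[[id_col]] <- NULL            # drop the now-redundant ID column
  df
}

# Example input with the IDs stored in a "sample" column
raw <- data.frame(
  sample = c("1020", "Neg_10733"),
  Negative = c(FALSE, TRUE),
  Type = c("True Sample", "Extraction Negative"),
  Well = c("A2", "G1")
)
metadata <- to_scrub_metadata(raw, "sample")
print(metadata)
```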

Thank you for your help! I appreciate it :)