calebclass / NanoTube

Easy NanoString data analysis
GNU General Public License v3.0

reading in rcc files from multiple batches #8

Closed TdzBAS closed 9 months ago

TdzBAS commented 9 months ago

Hi @calebclass,

I have three NanoString datasets that were generated with the same protocol, so I have different batches. Should I read in each dataset separately, or can I read all the RCC files together with the function "processNanostringData"? What effect does this have on the subsequent QC? Reading them in separately would require merging them afterwards. IMHO, reading everything at once with only one metafile seems most convenient, but I don't know if this is the right way.

Best, T

calebclass commented 9 months ago

Hi @TdzBAS ,

I agree that reading all at once is the way to go, and you can do that in processNanostringData by including each folder name in a character vector:

dat <- processNanostringData(nsFiles = c("path/to/folder", "path/to/another/folder"), ...)

By default, it will just read in all of the RCC files without considering the batches, so if you want to handle batch effects in normalization/QC, you should include a batch column in your metafile (e.g. Batch = 1, 1, 1, 1, 2, 2, 2, ...).
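A minimal sketch of building such a metafile in base R. The file names, the group labels, and the column names `RCC` and `Group` are hypothetical placeholders, not something NanoTube requires; only the idea of a `Batch` column comes from the suggestion above.

```r
# Hypothetical RCC file names, listed by the batch (folder) they came from
rcc_files <- list(
  batch1 = c("sample_01.RCC", "sample_02.RCC", "sample_03.RCC"),
  batch2 = c("sample_04.RCC", "sample_05.RCC")
)

# One row per sample; Batch records which entry of rcc_files the file belongs to
meta <- data.frame(
  RCC   = unlist(rcc_files, use.names = FALSE),
  Group = c("control", "treated", "treated", "control", "treated"),
  Batch = factor(rep(seq_along(rcc_files), times = lengths(rcc_files)))
)

# Write the table to a temporary location so it can be supplied as the metafile
write.csv(meta, file.path(tempdir(), "metafile.csv"), row.names = FALSE)
```

Storing `Batch` as a factor keeps it categorical when it is later used in a model formula.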

Good luck! Let me know if you have more questions. -Caleb

TdzBAS commented 9 months ago

Hi @calebclass,

Thanks! But I am still curious how, or whether, batch correction takes place if I only include a batch column in the metafile. I just included it, but I could still see immense batch effects in my PCA plot. Only after using ComBat was the batch correction quite successful. So how can NanoTube deal with it?

Best, T

calebclass commented 9 months ago

Hi @TdzBAS ,

The batch column itself doesn't do anything automatically, but it gives you a few options for how you can handle batch corrections.

  1. If you're using limma, you can do the standard normalization (no batch correction) and then include batch in your design (see the vignette for a full example: https://bioconductor.org/packages/release/bioc/vignettes/NanoTube/inst/doc/NanoTube.html).
    
    # Design matrix including sample group and batch
    design <- model.matrix(~group + batch)

    # Analyze data
    limmaResults2 <- runLimmaAnalysis(dat, design = design)

  2. If your data includes technical replicates across batches, normalization = "RUVIII" might work the best. See the example in the Normalization section of the vignette.

  3. normalization = "RUVg" might work the best without explicitly considering your Batch column: if your first principal component identifies the batch effect, you can use the options n.unwanted = 1, RUVg.drop = 0 to remove that PC from the data.
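To illustrate option 1, here is a small base R sketch of what the design matrix looks like when batch is included. The `group` and `batch` vectors are toy values made up for this example; in practice they would come from the columns of your metafile.

```r
# Toy annotation (hypothetical values): two groups measured across two batches
group <- factor(c("control", "treated", "control", "treated", "control", "treated"))
batch <- factor(c(1, 1, 1, 2, 2, 2))

# Additive model: an intercept, a group effect, and a per-batch shift
design <- model.matrix(~group + batch)

# Inspect which coefficients the model will estimate
colnames(design)
```

Note that `batch` is a factor here. If it were left numeric, `model.matrix` would treat it as a single continuous covariate, which is not what you want with three or more batches.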

Good luck! Let me know if you have more questions.
-Caleb

TdzBAS commented 9 months ago

Hi @calebclass,

thanks for this nice response! This issue can be closed. Just one suggestion for enhancement: Would it be possible to extend the input data format to include xls files, in addition to the current support for .txt and .csv? This could potentially streamline certain processes. Thank you!

Best, T

calebclass commented 9 months ago

Hi @TdzBAS ,

Pleasure! I agree with your suggestion.

Cheers, Caleb