alexyermanos / Platypus

R package for the analysis of single-cell immune repertoires
GNU General Public License v3.0

Data upload trouble #48

Open · martuskaR opened this issue 1 year ago

martuskaR commented 1 year ago

Hello,

I have scRNA-seq + AbSeq + TCR sequencing data from BD Rhapsody. The output from Seven Bridges is different from the one from Cell Ranger. Platypus looks very promising, and I would like to follow your vignettes to analyse my TCR data. Unfortunately, I am having trouble loading the data itself. I tried to manipulate my data so that it has the Cell Ranger format; however, I repeatedly fail to create a VGM, and to be honest I have no idea what to do to get started.

This is the error:

[Screenshot of the error message]

I have a Seurat object; however, I don't have any information on "cell_id" or "group_id".

I'm sorry if this is very basic; I am new to R and to single-cell analysis. Any hint would be much appreciated.

Thanks a lot,

alexyermanos commented 1 year ago

Hello, I would build the elements of the VGM separately. Try renaming the columns of the repertoire (TCR) data to fill a dummy VGM[[1]]. Then I would suggest using your Seurat pipeline to create VGM[[2]]. Once the barcode columns match, you can use the integrate function (VDJ_GEX_integrate.R) to link the two together.
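A minimal base-R sketch of the renaming step, assuming your TCR table already has one row per cell. The input column names (`cell_index`, `sample`, `trb_cdr3_aa`, `tra_cdr3_aa`), the example sequences and the helper `rename_columns()` are hypothetical placeholders, not real Seven Bridges output or Platypus functions; only the target names come from the VGM[[1]] column list given later in this thread.

```r
# Hypothetical TCR table (one row per cell). These column names are
# placeholders; replace them with the ones in your own export.
tcr <- data.frame(
  cell_index  = c("101", "102", "103"),
  sample      = c("case1", "case1", "control1"),
  trb_cdr3_aa = c("CASSLGQETQYF", "CASSPDRGYEQYF", "CASRGGNTEAFF"),
  tra_cdr3_aa = c("CAVRDSNYQLIW", "CAMREGGSNYKLTF", "CALSEAGNQFYF"),
  stringsAsFactors = FALSE
)

# Map your column names (names) to VGM[[1]] column names (values).
# TRB carries a D gene, so it goes to the VDJ columns; TRA goes to VJ.
rename_map <- c(
  cell_index  = "barcode",
  sample      = "sample_id",
  trb_cdr3_aa = "VDJ_cdr3s_aa",
  tra_cdr3_aa = "VJ_cdr3s_aa"
)

# Hypothetical helper: rename only the columns that appear in the map
rename_columns <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

vdj_dummy <- rename_columns(tcr, rename_map)
print(names(vdj_dummy))
```

Columns that are not in the map are left untouched, so you can extend `rename_map` step by step as you work out which of your columns correspond to which VGM[[1]] fields.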

The cell_id should just be the cell barcode, and the group_id is a custom vector indicating which group each sample belongs to. For example, if you have case vs. control with 2 samples each, group_id <- c(1, 1, 2, 2); each cell from a case sample (group 1) would then have group_id = 1.

One thing I would check is that the barcodes in the TCR and the scRNA-seq data are in the same format. It's probably a good idea to run head(scRNA$barcode) and head(TCR$barcode) once you have built the objects, to check that the integrate function will actually detect the same barcodes.
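A sketch of both checks in base R. The sample names, the grouping, the barcodes and the `match_key` column are all made up for illustration. It assumes, as one common case, that the Seurat cell names carry a `sample_` prefix (this happens, for example, when objects are merged with `add.cell.ids`) while the TCR table does not; look at your own `head()` output before copying the pattern.

```r
# Hypothetical per-cell data: TCR barcodes plus sample, and the cell
# names from the Seurat object (e.g. colnames(seurat_obj)).
tcr <- data.frame(
  barcode   = c("101", "102", "205", "310"),
  sample_id = c("case1", "case2", "control1", "control2"),
  stringsAsFactors = FALSE
)
gex_barcodes <- c("case1_101", "case2_102", "control1_205", "control2_999")

# group_id: one value per sample (case vs control, 2 samples each),
# then looked up for every cell via its sample_id
samples  <- c("case1", "case2", "control1", "control2")
group_id <- c(1, 1, 2, 2)
tcr$group_id <- group_id[match(tcr$sample_id, samples)]

# Look at both barcode formats side by side
print(head(tcr$barcode))
print(head(gex_barcodes))

# In this example the GEX names have a "sample_" prefix, so build the
# same key on the TCR side (match_key is a hypothetical column name)
tcr$match_key <- paste(tcr$sample_id, tcr$barcode, sep = "_")
shared <- intersect(tcr$match_key, gex_barcodes)
cat(length(shared), "of", nrow(tcr), "TCR cells found in GEX\n")
```

If `shared` comes back empty, the integrate step will most likely not link any cells either, so it is worth fixing the barcode format before building the VGM.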

martuskaR commented 1 year ago

Hello, thank you very much for your reply. Where can I find all the column names required by Platypus? I have neither group_id nor cell_id in my data, so it would be useful to have a template of what the data should look like; I could then try to change my data as required. Would this be a good start?

Thank you very much,

alexyermanos commented 1 year ago

If you run the following code from the quickstart (https://alexyermanos.github.io/Platypus/articles/quickstart.html), you should be able to download a VGM object directly and use it as a reference for the structure.

Downloading PlatypusDB raw data in a list format

For the structure of PlatypusDB links, please refer to the PlatypusDB vignette.

library(Platypus)
yermanos2021_raw <- PlatypusDB_fetch(PlatypusDB.links = c("yermanos2021b/ALL/ALL"), load.to.enviroment = FALSE, load.to.list = TRUE)

Otherwise, here are the column names for VGM[[1]]; VGM[[2]] is just a standard Seurat object:

"barcode" "sample_id" "group_id" "clonotype_id_10x" "clonotype_id" "clonotype_frequency" "celltype" "Nr_of_VDJ_chains" "Nr_of_VJ_chains" "VDJ_cdr3s_aa" "VJ_cdr3s_aa" "VDJ_cdr3s_nt" "VJ_cdr3s_nt" "VDJ_umis" "VJ_umis" "VDJ_chain_contig" "VJ_chain_contig" "VDJ_chain" "VJ_chain" "VDJ_vgene" "VJ_vgene" "VDJ_dgene" "VDJ_jgene" "VJ_jgene" "VDJ_cgene" "VJ_cgene" "VDJ_sequence_nt_raw" "VDJ_sequence_aa" "VJ_sequence_aa" "VDJ_raw_ref" "VJ_raw_ref" "VDJ_trimmed_ref" "VJ_trimmed_ref" "VDJ_raw_consensus_id" "VJ_raw_consensus_id" "orig_barcode" "specifity" "affinity" "GEX_available" "orig.ident" "seurat_clusters" "PC_1" "PC_2" "UMAP_1" "UMAP_2" "tSNE_1" "tSNE_2" "batches"
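To see how far your own table is from this template, a small helper can list the missing columns and pad them with `NA`. This is only a sketch: `fill_missing_columns()` is a hypothetical helper, not a Platypus function, and I have not checked whether every downstream Platypus function accepts `NA` placeholders in every column. The column vector is copied from the list above.

```r
# VGM[[1]] column names, as listed above
vgm1_cols <- c(
  "barcode", "sample_id", "group_id", "clonotype_id_10x", "clonotype_id",
  "clonotype_frequency", "celltype", "Nr_of_VDJ_chains", "Nr_of_VJ_chains",
  "VDJ_cdr3s_aa", "VJ_cdr3s_aa", "VDJ_cdr3s_nt", "VJ_cdr3s_nt",
  "VDJ_umis", "VJ_umis", "VDJ_chain_contig", "VJ_chain_contig",
  "VDJ_chain", "VJ_chain", "VDJ_vgene", "VJ_vgene", "VDJ_dgene",
  "VDJ_jgene", "VJ_jgene", "VDJ_cgene", "VJ_cgene", "VDJ_sequence_nt_raw",
  "VDJ_sequence_aa", "VJ_sequence_aa", "VDJ_raw_ref", "VJ_raw_ref",
  "VDJ_trimmed_ref", "VJ_trimmed_ref", "VDJ_raw_consensus_id",
  "VJ_raw_consensus_id", "orig_barcode", "specifity", "affinity",
  "GEX_available", "orig.ident", "seurat_clusters", "PC_1", "PC_2",
  "UMAP_1", "UMAP_2", "tSNE_1", "tSNE_2", "batches"
)

# Hypothetical helper: report missing columns, add them as NA placeholders,
# and reorder to the reference order (extra columns are kept at the end)
fill_missing_columns <- function(df, required, fill = NA) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    message("Adding ", length(missing), " missing column(s): ",
            paste(missing, collapse = ", "))
    for (col in missing) df[[col]] <- rep(fill, nrow(df))
  }
  df[, c(required, setdiff(names(df), required)), drop = FALSE]
}

# Example with a made-up partial table
vdj_dummy <- data.frame(
  barcode      = c("case1_101", "case2_102"),
  sample_id    = c("s1", "s2"),
  VDJ_cdr3s_aa = c("CASSLGQETQYF", "CASSPDRGYEQYF"),
  stringsAsFactors = FALSE
)
vgm1 <- fill_missing_columns(vdj_dummy, vgm1_cols)
str(vgm1[, 1:5])
```

The message printed by the helper doubles as a to-do list: it tells you which fields you still need to derive from your BD Rhapsody output (or leave empty) before trying the integrate step.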