alexyermanos / Platypus

R package for the analysis of single-cell immune repertoires
GNU General Public License v3.0

VGM run with GEX input from Cellranger aggr #18

Closed vickreiner closed 1 year ago

vickreiner commented 2 years ago

Hi there!!

I am trying to create a Platypus object with GEX and VDJ (B cells) together, but only the VDJ part ran successfully. I have the following files from the count folders of cellranger (as attached in the image). Below are the code and the error messages generated. I would really appreciate your help in the matter; any insights would be very welcome!

image
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      GEX.out.directory.list = GEX.out.directory.list,
                      GEX.integrate = TRUE,
                      VDJ.combine = TRUE,
                      integrate.GEX.to.VDJ = TRUE,
                      integrate.VDJ.to.GEX = TRUE, # This will attach the VDJ information as metadata to the GEX object
                      exclude.GEX.not.in.VDJ = FALSE,
                      filter.overlapping.barcodes.GEX = TRUE,
                      filter.overlapping.barcodes.VDJ = TRUE,
                      exclude.on.cell.state.markers = c("CD3E"), # Exclude T cells from this analysis
                      get.VDJ.stats = TRUE,
                      parallel.processing = "none", # see note at the end of this chunk
                      trim.and.align = FALSE, # Do not align BCR sequences to reference
                      group.id = c(1,2))
Loading in data 

2022-07-04 16:31:59
Loaded VDJ data 

2022-07-04 16:32:02
Setting GEX directory to provided path/sample_feature_bc_matrix 
10X data contains more than one type and is being returned as a list containing matrices of each type.
10X data contains more than one type and is being returned as a list containing matrices of each type.
GEX input 1 contains multiple count matrices. 
GEX input 1 element 1 contains > 100 features and will be loaded as GEX 
GEX input 1 element 2 contains > 100 features and will be loaded as GEX 
GEX input 2 contains multiple count matrices. 
GEX input 2 element 1 contains > 100 features and will be loaded as GEX 
GEX input 2 element 2 contains > 100 features and will be loaded as GEX 
Loading GEX failed 

attempt to set 'rownames' on an object with no dimensions
Getting VDJ GEX stats 

Starting with 1 of 2...

Getting lookup tables... 

Starting with 2 of 2...

Getting lookup tables... 

Getting 10x stats 

Adding 10x metrix failed 

VDJ stats failed: 

invalid argument type
For sample 1: 2954 cells assigned with high confidence barcodes in VDJ 

For sample 2: 4300 cells assigned with high confidence barcodes in VDJ 

Removed a total of 13 cells with non unique barcodes in VDJ 

Starting VDJ barcode iteration 1 of 2...

[1] "2022-07-04 16:32:58 EDT"
Done with 1 of 2 

2022-07-04 16:33:45
Starting VDJ barcode iteration 2 of 2...

[1] "2022-07-04 16:33:45 EDT"
Done with 2 of 2 

2022-07-04 16:35:12
Done with GEX pipeline 

[1] "2022-07-04 16:35:12 EDT"
Adding VDJ stats...

Adding runtime params...

Done!

Originally posted by @Andy-ChanKP in https://github.com/alexyermanos/Platypus/issues/13#issuecomment-1174317917

vickreiner commented 2 years ago

Hi!

It looks like you are using the VGM function from our last CRAN release. We did notice issues with processing Cellranger aggr output, where both GEX and Feature Barcode data are stored in a single outs directory. We fixed these, but the fix has not made it into a CRAN build yet. Could you try the newest version of the function from Git: VDJ_GEX_matrix.R
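
One way to pull in the development version of the function, sketched under assumptions: the raw-file URL below is inferred from the repository path and the master branch and is not quoted from this thread, and load_latest_vgm is a hypothetical helper name. devtools::source_url() fetches an R script from a URL and sources it.

```r
# Assumed raw-file location of the development version (inferred, not quoted).
vgm_url <- paste(
  "https://raw.githubusercontent.com",
  "alexyermanos/Platypus", "master", "R/VDJ_GEX_matrix.R",
  sep = "/"
)

# Hypothetical helper: source the script so that VDJ_GEX_matrix() is defined
# from the development file rather than from the CRAN build.
load_latest_vgm <- function(url = vgm_url) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("Package 'devtools' is required to source a script from a URL.")
  }
  devtools::source_url(url)
}

# load_latest_vgm()  # needs network access, so it is not run here
```

Sourcing a single file only replaces that one function; the rest of the installed package stays at its CRAN version.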

Thanks!

Andy-ChanKP commented 2 years ago

Hi Victor!

Thank you for your help!! I managed to load the new script via source_url from devtools, and ran the updated VDJ_GEX_matrix function with the following code. However, the GEX failed to load. It also reports that "adding Feature barcode information to VDJ failed".

VGM <- VDJ_GEX_matrix( VDJ.out.directory.list = VDJ.out.directory.list, GEX.out.directory.list = GEX.out.directory.list, FB.out.directory.list = FB.out.directory.list, FB.ratio.threshold = 2)

image image image

We generated these outputs using cellranger v6, and attached are the files within the folders (count, multi_count, vdj_b) used to create the lists for GEX, FB and VDJ respectively.

image image image

I also attached this image showing the objects in the list, and you can see GEX failed to load, while both VDJ and VDJ.GEX.stats seem to load successfully (but I have doubts whether the FB info is attached to VDJ).

image

Let me know how I can work around this and create a Platypus object! Thank you for your help in this matter!

Best, Andy

vickreiner commented 2 years ago

Hi!

The callback from this function gives me a hint: it seems like the GEX output directories you are pointing to each contain two gene expression matrices. Is that correct, or did you process one GEX and one Feature Barcode/CITE-seq library using Cellranger multi? In the latter case, how many feature barcodes did you use in this experiment?

Currently we are not compatible with more than one GEX matrix per Cellranger output directory. You can, however, process these GEX data via Seurat and then provide this object as Seurat.in to VDJ_GEX_matrix along with your VDJ Cellranger output paths.
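
A minimal sketch of that Seurat route, under assumptions. The log line "10X data contains more than one type and is being returned as a list" is what Seurat::Read10X() prints when a directory holds several feature types, so the gene expression matrix has to be picked out of that list before a Seurat object is built. pick_gex is a hypothetical helper, and "Gene Expression" is the label Cell Ranger normally gives that element; check names() on your own data.

```r
# Hypothetical helper: return the gene expression matrix whether the reader
# gave back a single matrix or a named list with one matrix per feature type.
pick_gex <- function(x, type = "Gene Expression") {
  if (!is.list(x)) {
    return(x)  # single feature type: already a matrix
  }
  if (!type %in% names(x)) {
    stop("No element named '", type, "'; found: ",
         paste(names(x), collapse = ", "))
  }
  x[[type]]
}

# Intended use (requires Seurat and real data; the path is a placeholder):
# counts <- pick_gex(Seurat::Read10X("path/to/sample_feature_bc_matrix"))
# gex <- Seurat::CreateSeuratObject(counts = counts)
# # ... normalise, cluster and add the metadata the function expects ...
# vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
#                       Seurat.in = gex)
```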

Thanks!

alexyermanos commented 2 years ago

Another thing to try would be to make separate input directories for the feature barcodes and the GEX data. It may help to put just the 3 files corresponding to the matrix/barcodes/features into this directory for each data type. These can then be specified using the GEX.out.directory.list and FB.out.directory.list arguments.
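
A rough sketch of staging such a directory, assuming the three files carry the usual Cell Ranger names (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz). stage_matrix_dir and all paths are hypothetical. Note that copying files does not split a matrix that already mixes GEX and feature barcode rows; it only isolates the three files.

```r
# Hypothetical helper: copy the three matrix files from `src` into `dest`.
stage_matrix_dir <- function(src, dest,
                             files = c("barcodes.tsv.gz",
                                       "features.tsv.gz",
                                       "matrix.mtx.gz")) {
  from <- file.path(src, files)
  missing <- files[!file.exists(from)]
  if (length(missing) > 0) {
    stop("Missing in ", src, ": ", paste(missing, collapse = ", "))
  }
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(from, dest, overwrite = TRUE)
  if (!all(ok)) stop("Copy into ", dest, " failed.")
  invisible(dest)  # return the staged path for use in a directory list
}

# Intended use (placeholder paths):
# GEX.out.directory.list <- list(stage_matrix_dir("run1/gex_src", "inputs/gex/s1"))
# FB.out.directory.list  <- list(stage_matrix_dir("run1/fb_src",  "inputs/fb/s1"))
```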

Andy-ChanKP commented 2 years ago

Hi Victor and Alex,

Thank you for your comments! Yes, we do have two GEX matrices per output at the moment; we will try to remove one of the GEX files and run this again. Alternatively, as you mentioned, we will try to pass our Seurat object to VDJ_GEX_matrix. I will update you if I manage to load the entire transcriptomic and repertoire data either way. Thank you so much for your help!

Best, Andy

@mbartl13 our bioinformatics post-doc in the lab would like to follow this thread!

vickreiner commented 2 years ago

Sounds good! We are here if you need any help!

Andy-ChanKP commented 1 year ago

Hi Victor and Alex,

Sorry it took me a while before looking into this again.

So I have tried Alex's suggestion of pointing to just the three files within the FB or GEX folders but without much success. As of now, I did try to use our own Seurat object and now both the GEX and VDJ could be loaded into vgm.

However, there are some errors - For input Seurat object GEX and VDJ barcode overlap is: 0 (so VDJ.GEX.stats failed)

Also warning message: In FetchData.Seurat(GEX.proc, vars = c("orig.ident", "orig_barcode", : The following requested variables were not found: tSNE_1, tSNE_2

The code that I used: vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list, Seurat.in = immune.combined, group.id = c(1,1,1,1,2,2,2,2,2))

image image

Many thanks, Andy

vickreiner commented 1 year ago

Hi Andy,

First, concerning the warning: the VGM function tries to append UMAP and TSNE info to the VDJ table. It appears that your input Seurat object does not contain a TSNE reduction, so the VGM returns a warning. This should not cause any trouble. As for the failed VDJ.GEX.stats: the metrics.csv table may not be in the VDJ directory. I will add an error catcher so that VDJ.GEX.stats still returns an output.

Concerning the 0 overlap between GEX and VDJ: The VGM function integrates on a per-sample level, and needs a correctly formatted sample_id column in the Seurat input. Could you send me: table(immune.combined$sample_id)

Further, you can check whether there is any overlap between barcodes, using the orig_barcode columns in VDJ and GEX, via: length(intersect(immune.combined[[1]]$orig_barcode, immune.combined[[2]]$orig_barcode))

Andy-ChanKP commented 1 year ago

Hi Victor,

Really appreciated your prompt reply!! Yes, we only have the UMAP but no TSNE in this Seurat input.

Regarding the metrics.csv table, I know it is not in the VDJ directory that I pointed to (it is further up the directory tree). Would you recommend moving that file to the VDJ directory along with all the other files?

As for sample_id and orig_barcode: I did follow the sample_id naming of s1, s2, and so on. I also modified your code a bit to point to orig_barcode within the meta.data of our Seurat object (immune.combined). I don't think orig_barcode exists in our object, but we do have other group columns that I am showing here. I can easily add a column called orig_barcode, or modify one within the metadata, if that is recommended. Hope all this helps to pinpoint the issue!

I am glad at the very least both the VDJ and GEX information are loaded! One issue at a time.

image image

Many thanks, Andy

vickreiner commented 1 year ago

Hi Andy,

Yes, moving the metrics file to the VDJ source directory should solve this issue.

The formatting of your sample_id column in the input immune.combined looks fine.

Sorry, the orig_barcode check should have referred to the vgm output. That was my mistake: length(intersect(vgm[[1]]$orig_barcode, vgm[[2]]$orig_barcode))
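
Since the integration works per sample, it can help to break that overlap down by sample. The sketch below does this on mock data; overlap_by_sample is a hypothetical helper, and it assumes both tables carry sample_id and orig_barcode columns (for a real run, something like vgm[[1]] and the meta.data of vgm[[2]], which is an assumption about the output layout).

```r
# Hypothetical helper: count shared barcodes per sample between two tables
# that each carry `sample_id` and `orig_barcode` columns.
overlap_by_sample <- function(vdj, gex) {
  vdj_id <- as.character(vdj$sample_id)
  gex_id <- as.character(gex$sample_id)
  samples <- sort(union(vdj_id, gex_id))
  vapply(samples, function(s) {
    length(intersect(as.character(vdj$orig_barcode[vdj_id == s]),
                     as.character(gex$orig_barcode[gex_id == s])))
  }, integer(1))
}

# Mock data standing in for the VDJ table and the GEX metadata.
vdj <- data.frame(sample_id = c("s1", "s1", "s2"),
                  orig_barcode = c("AAAC", "AAAG", "AAAC"),
                  stringsAsFactors = FALSE)
gex <- data.frame(sample_id = c("s1", "s2", "s2"),
                  orig_barcode = c("AAAC", "AAAT", "AAAG"),
                  stringsAsFactors = FALSE)

# s1 shares one barcode; s2 shares none even though "AAAC" and "AAAG"
# appear on both sides, because they belong to different samples there.
print(overlap_by_sample(vdj, gex))
```

A sample that reports 0 here while the pooled intersect is non-zero would point at mismatched sample_id values rather than at the barcodes themselves.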

Thanks for checking this too

Andy-ChanKP commented 1 year ago

Hi Victor,

Happy to say that when I reran it this time, there are overlaps!

However, the VDJ_GEX stats still failed... if this is primarily due to the metric table not found in the directory, I will do that tomorrow, because I would need our post-doc (@mbartl13) to chmod her cellranger output folders for me to move these files.

I think the numbers look good to me. What do you think?

Many thanks, Andy

image image image

mbartl13 commented 1 year ago

Done- lmk if other files are needed.

vickreiner commented 1 year ago

Hi, that looks as it should! What did you change from the last function call?

I just pushed a small change to the function which should fix the VDJ.GEX.stats error. Thanks for pulling it and trying once more: alexyermanos/Platypus/blob/master/R/VDJ_GEX_matrix.R

Andy-ChanKP commented 1 year ago

Hi,

I do not know what might have been different, but clearly now we see barcode overlaps!

However, the VDJ stats still failed, even after moving the metric_summary.csv into each of the vdj_b directories and using the updated VDJ_GEX_matrix.R. The error is different though.

Thank you for your help and patience in making this work for us. Really excited to run the rest of the package!!!

image

image

vickreiner commented 1 year ago

Hi Andy,

Thanks for again sharing the error. I will push another fix as soon as I get to it and let you know.

vickreiner commented 1 year ago

Hi again, I was only able to replicate this error when reformatting or deleting all content of one of the metrics.csv in the VDJ input directory. Possibly this error is caused by an issue with one of the csvs. I wonder: what version of cellranger are you using?

A new version of the function is pushed, which fixes an issue with the error catchers. This way you should now still get a VDJ.GEX.stats output despite missing one of the csvs.
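
To illustrate the general idea of such an error catcher (this is not the package's actual code), here is a minimal base R sketch. read_metrics_safe is a hypothetical helper: it returns NULL with a message when a metrics file is missing or empty, so the remaining samples can still be processed.

```r
# Hypothetical helper: read one metrics file; on any problem return NULL
# with a message instead of stopping the whole run.
read_metrics_safe <- function(path) {
  tryCatch({
    if (!file.exists(path)) stop("file not found")
    metrics <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
    if (nrow(metrics) == 0) stop("file has no rows")
    metrics
  }, error = function(e) {
    message("Skipping ", path, ": ", conditionMessage(e))
    NULL
  })
}

# Intended use (placeholder paths): keep whatever could be read.
# metrics_list <- lapply(c("s1/metrics.csv", "s2/metrics.csv"), read_metrics_safe)
# metrics_list <- Filter(Negate(is.null), metrics_list)
```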

Hope this works!

Andy-ChanKP commented 1 year ago

Hi Victor,

Thank you for pushing the new version of the function!!! I will test that out tomorrow - our cluster seems to be undergoing some maintenance (or it simply doesn't work) for the time being.

We are using cellranger-6.1.1, and the metric_summary.csv file is not inside the count folder (one level outside actually). Furthermore, this file contains not just gene expression, but also antibody capture and VDJ B in the Library Type column. Would you recommend deleting the rows on Antibody Capture in this file to work with the pipeline?
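
Before editing the file by hand, it may be easier to look at it split by library type. The sketch below only assumes the "Library Type" column mentioned above; split_by_library_type is a hypothetical helper and the file contents are mock values. Whether the function needs such filtering is not settled in this thread.

```r
# Hypothetical helper: split a metrics table by its "Library Type" column,
# leaving the original file untouched.
split_by_library_type <- function(path) {
  metrics <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  if (!"Library Type" %in% names(metrics)) {
    stop("No 'Library Type' column in ", path)
  }
  split(metrics, metrics[["Library Type"]])
}

# Mock file standing in for the real metrics summary.
mock_path <- tempfile(fileext = ".csv")
writeLines(c("Library Type,mock_metric,mock_value",
             "Gene Expression,a,1",
             "Antibody Capture,a,2",
             "VDJ B,a,3",
             "Gene Expression,b,4"),
           mock_path)

by_type <- split_by_library_type(mock_path)
print(names(by_type))                       # library types present in the file
print(nrow(by_type[["Gene Expression"]]))   # rows for gene expression only
```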

I will first test out the new function tomorrow! Really appreciated your help.

Many thanks, Andy

alexyermanos commented 1 year ago

Would you mind posting the command you use for the cellranger alignment? I am assuming it supplies both GEX and VDJ into a single call. So far we have not really tested this out much, as we prefer to run the two libraries separately in our pipelines given some experiments only have one or the other. But we can add this to a future VGM function call :)

Andy-ChanKP commented 1 year ago

Hi Victor and Alex,

Thank you for your help and advice along the way, and I am happy to say that the VDJ_GEX_stat works (I actually removed the entire metric_summary.csv file altogether), and I have a complete vgm object! Exciting times...

Regarding the cellranger, yes, we do have information of both GEX and VDJ at a single cell level. As for the command, I will follow up with our post-doc later (she is on vacation right now and was the one working with cellranger on these samples). I am sure she would be able to give us an answer soon-ish!

I really appreciate your prompt replies to all the issues I have had, and the fixes you pushed along the way. It really does make a huge difference when using these pipelines!!! :)

Many thanks, Andy

mbartl13 commented 1 year ago

Thanks for your patience, I was on vacation and starting a new job. Here's the code for cellranger:
cellranger multi --id=20220523_43F --csv=Multi_NHP_43F.csv
The multi file includes paths to the GEX, CSP, and VDJ-T/VDJ-B data. Does this pipeline require individually run GEX/VDJ? If so, I can point Andy to those paths to try.

Thanks! Maggie