alexyermanos / Platypus

R package for the analysis of single-cell immune repertoires
GNU General Public License v3.0
40 stars 16 forks source link

bulk analysis #12

Closed leeanapeters closed 2 years ago

leeanapeters commented 2 years ago

Hi there, thanks for creating functionality for bulk repertoires, this is great!

I was attempting to use the bulk vgm function and was encountering an error that I was hoping you could help with:

When using vgm_bulk <- VDJ_bulk_to_vgm(VDJ.bulk.out.directory.list = ilist, input.type = "MAF", # if working with MAF outputs, the input.type group.id = c("ND","T1D","T1D","ND","ND","T1D","ND","T1D","T1D","T1D","ND","T1D"), integrate.MIXCR.output=FALSE, vgm.expanded = TRUE)

with airr output from TRUST4 (let me know if this is not a supported output to use with the MAF argument) I get this error:

Error in data.frame(barcode, sample_id, group_id, clonotype_id_10x, celltype, : arguments imply differing number of rows: 0, 2196, 1 Called from: data.frame(barcode, sample_id, group_id, clonotype_id_10x, celltype, Nr_of_VDJ_chains, Nr_of_VJ_chains, VDJ_cdr3s_aa, VJ_cdr3s_aa, VDJ_cdr3s_nt, VJ_cdr3s_nt, VDJ_chain_contig, VJ_chain_contig, VDJ_chain, VJ_chain, VDJ_vgene, VJ_vgene, VDJ_dgene, VDJ_jgene, VJ_jgene, VDJ_cgene, VJ_cgene, VDJ_sequence_nt_raw, VJ_sequence_nt_raw, VDJ_sequence_nt_trimmed, VJ_sequence_nt_trimmed, VDJ_sequence_aa, VJ_sequence_aa, VDJ_trimmed_ref, VJ_trimmed_ref, VDJ_raw_consensus_id, VJ_raw_consensus_id, orig_barcode, clonotype_frequency, specifity, affinity, GEX_available, orig.ident, orig_barcode_GEX, seurat_clusters, PC_1, PC_2, UMAP_1, UMAP_2, tSNE_1, tSNE_2, batches, clonotype_id, VDJ_nSeqFR1, VDJ_nSeqCDR1, VDJ_nSeqFR2, VDJ_nSeqCDR2, VDJ_nSeqFR3, VJ_nSeqCDR3, VDJ_nSeqFR4, VDJ_aaSeqFR1, VDJ_aaSeqCDR1, VDJ_aaSeqFR2, VDJ_aaSeqCDR2, VJ_aaSeqFR3, VDJ_aaSeqCDR3, VDJ_aaSeqFR4, VDJ_SHM, VJ_nSeqFR1, VJ_nSeqCDR1, VJ_nSeqFR2, VJ_nSeqCDR2, VJ_nSeqFR3, VJ_nSeqCDR3, VJ_nSeqFR4, VJ_aaSeqFR1, VJ_aaSeqCDR1, VJ_aaSeqFR2, VJ_aaSeqCDR2, VJ_aaSeqFR3, VJ_aaSeqCDR3, VJ_aaSeqFR4,

Any help would be appreciated!

I am using version 3.3 and the files are located locally, but I copied the vgm function from source.

Thanks

Leeana

alexyermanos commented 2 years ago

Hi :) so the MAF argument is actually for an internally used bulk sequencing pipeline that was developed in the Reddy lab https://pubmed.ncbi.nlm.nih.gov/26998518/ and is not really for trust4 output.

But if you can send us one or two toy example output files/objects you are suppling as input, we can adapt the bulk function to also read this as input into the vgm format

leeanapeters commented 2 years ago

Thank you for the quick response!

I have a list of dataframes here that are TRUST4 output in airr format. Does this work?

Thanks again for always being so responsive to these comments!

Leeana

alexyermanos commented 2 years ago

Sorry about the delay - we will try to add this to the bulk function as soon as possible

leeanapeters commented 2 years ago

Thanks! Could you also clarify the mixcr input? I have tried and have gotten no gene information to populate in the vgm object. See below

I have quite a few files so I just made a list of the files in my directory and used that as input ie files <- list.files(path) VDJ.bulk.out.directory.list <- list() VDJ.bulk.out.directory.list[[1]] <- files[1] VDJ.bulk.out.directory.list[[2]] <- files[2] VDJ.bulk.out.directory.list[[3]] <- files[3] VDJ.bulk.out.directory.list[[4]] <- files[4] VDJ.bulk.out.directory.list[[5]] <- files[5] etc

vgm_bulk <- VDJ_bulk_to_vgm(VDJ.bulk.out.directory.list = VDJ.bulk.out.directory.list, input.type = 'MIXCR', # if working with MAF outputs, the input.type

parameter should be set to "MAF"

                                   integrate.MIXCR.output = TRUE,
                        group.id=c(2,1,2,2,2,2,2,2,2,2,2,2,1,2,2,2,2,2,1,1,1,2,1,2,1,1,1,1,2,2,1,1,1,2,2,2,2,2,2,1,1,1,2,2,1,1,1,1,1,2,1,1,1,1),
                                  cell.type = "Tcells",
                                   vgm.expanded = TRUE)

Any idea of what I could be doing wrong?

Thanks!

[cid:1154e830-5623-4483-a471-8a0967672943]


From: Alex Yermanos @.> Sent: Tuesday, May 24, 2022 7:32 AM To: alexyermanos/Platypus @.> Cc: Peters,Leeana D @.>; Author @.> Subject: Re: [alexyermanos/Platypus] bulk analysis (Issue #12)

[External Email]

Sorry about the delay - we will try to add this to the bulk function as soon as possible

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_alexyermanos_Platypus_issues_12-23issuecomment-2D1135796934&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=zrOHRUbMoMSIoz7l5zvpSifsOzkwEbhjSJrU7vQObK8&m=bPllPvI7tq1eEY4uRV-JSkRoFSG6NkGTdEvXTLJYlvycv1ZqWJ6YsOeu8r7CS9Uf&s=uQkIibJ_D9qG2x1WAKyFVTnoJ-5PalLPVDPfgeT-u8E&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_APDFEKBLEWQETLBMS3B4RJDVLS43PANCNFSM5U7Q22CQ&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=zrOHRUbMoMSIoz7l5zvpSifsOzkwEbhjSJrU7vQObK8&m=bPllPvI7tq1eEY4uRV-JSkRoFSG6NkGTdEvXTLJYlvycv1ZqWJ6YsOeu8r7CS9Uf&s=Cx5lNy5q9h9bQ_9f6LmjCOugamI3uJr97zVVO8535fo&e=. You are receiving this because you authored the thread.Message ID: @.***>

loreguerci commented 2 years ago

Hi!

VDJ_bulk_to_vgm() now supports TRUST4 files. You can use "TRUST4" or "TRUST4.FULL" as "input.strategy" "TRUST4.FULL" contains additional TRUST4 columns, which were not originally supported by VGM: "cdr1", "cdr2", "v_cigar", "d_cigar", "j_cigar", "v_identity", "j_identity", "complete_vdj". In TRUST4, "sequence_id" is not unique for each read. We use it as clonotype when no clonotyping strategy is specified as argument. TRUST4 (and TRUST4.FULL) require an RDS file as input.

Example: islet_platypus <- readRDS("C:/Users/loren/Desktop/data/islet_platypus/islet_platypus.RDS") vgm_bulk_t4 <- bulk_to_vgm_t4(VDJ.bulk.out.directory.list = islet_platypus, input.type = "TRUST4.FULL")

loreguerci commented 2 years ago

Thanks! Could you also clarify the mixcr input? I have tried and have gotten no gene information to populate in the vgm object. See below I have quite a few files so I just made a list of the files in my directory and used that as input ie files <- list.files(path) VDJ.bulk.out.directory.list <- list() VDJ.bulk.out.directory.list[[1]] <- files[1] VDJ.bulk.out.directory.list[[2]] <- files[2] VDJ.bulk.out.directory.list[[3]] <- files[3] VDJ.bulk.out.directory.list[[4]] <- files[4] VDJ.bulk.out.directory.list[[5]] <- files[5] etc vgm_bulk <- VDJ_bulk_to_vgm(VDJ.bulk.out.directory.list = VDJ.bulk.out.directory.list, input.type = 'MIXCR', # if working with MAF outputs, the input.type # parameter should be set to "MAF" integrate.MIXCR.output = TRUE, group.id=c(2,1,2,2,2,2,2,2,2,2,2,2,1,2,2,2,2,2,1,1,1,2,1,2,1,1,1,1,2,2,1,1,1,2,2,2,2,2,2,1,1,1,2,2,1,1,1,1,1,2,1,1,1,1), cell.type = "Tcells", vgm.expanded = TRUE) Any idea of what I could be doing wrong? Thanks! [cid:1154e830-5623-4483-a471-8a0967672943]

For MIXCR you will need to create a list containing the local paths to bulk sequencing output .txt files processed with MiXCR:

VDJ.bulk.out.directory.list <- list() VDJ.bulk.out.directory.list[[1]] <- c("/Downloads/PC_14VDJ.txt") VDJ.bulk.out.directory.list[[2]] <- c("/Downloads/PC_15VDJ.txt")

vgm_bulk <- VDJ_bulk_to_vgm(VDJ.bulk.out.directory.list = VDJ.bulk.out.directory.list, input.type = 'MIXCR', integrate.MIXCR.output = TRUE, group.id = c(1,2), cell.type = "Bcells", vgm.expanded = TRUE)

Please also take a look at our vignette: https://alexyermanos.github.io/Platypus/articles/Platypus_Bulk_Vignette.html

leeanapeters commented 2 years ago

Hi, thank you for the reply!

What format should the RDS be in? I saved a list of dataframes of trust4 output for each subject as an RDS and this doesnt seem to work for me in the way that you showed with saving and reloading the RDS.

vgm_bulk_t4 <- VDJ_bulk_to_vgm(VDJ.bulk.out.directory.list = islet_platypus, input.type = "TRUST4") No clone.strategy provided. Sequence_id values will be used as clonotypes ERROR, TRUST4 only supports .RDS as input.type! Error in *tmp*[[i]] : subscript out of bounds

This is with the function from source.

Thanks!

Leeana


From: loreguerci @.> Sent: Friday, July 8, 2022 8:12 AM To: alexyermanos/Platypus @.> Cc: Peters,Leeana D @.>; Author @.> Subject: Re: [alexyermanos/Platypus] bulk analysis (Issue #12)

[External Email]

Hi!

VDJ_bulk_to_vgm() now supports TRUST4 files. You can use "TRUST4" or "TRUST4.FULL" as "input.strategy" "TRUST4.FULL" contains additional TRUST4 columns, which were not originally supported by VGM: "cdr1", "cdr2", "v_cigar", "d_cigar", "j_cigar", "v_identity", "j_identity", "complete_vdj". In TRUST4, "sequence_id" is not unique for each read. We use it as clonotype when no clonotyping strategy is specified as argument. TRUST4 (and TRUST4.FULL) require an RDS file as input.

Example: islet_platypus <- readRDS("C:/Users/loren/Desktop/data/islet_platypus/islet_platypus.RDS") vgm_bulk_t4 <- bulk_to_vgm_t4(VDJ.bulk.out.directory.list = islet_platypus, input.type = "TRUST4.FULL")

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_alexyermanos_Platypus_issues_12-23issuecomment-2D1178917959&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=zrOHRUbMoMSIoz7l5zvpSifsOzkwEbhjSJrU7vQObK8&m=zifyXSkA2PgCuleAA_mxAXe7Zb6wtXTTvd55Mf7hkjDHD3HME30-K6X6Cc-Le-H0&s=o7ZY7S4jO4e8HisS6ZIAoEl2dxG-002QOceK3f0rHyg&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_APDFEKF3NS7MURBX3G7PRXLVTALLDANCNFSM5U7Q22CQ&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=zrOHRUbMoMSIoz7l5zvpSifsOzkwEbhjSJrU7vQObK8&m=zifyXSkA2PgCuleAA_mxAXe7Zb6wtXTTvd55Mf7hkjDHD3HME30-K6X6Cc-Le-H0&s=y5L7M748GlrCq8-rQt8qqSuRB55XKbZlYaULm-tnFf0&e=. You are receiving this because you authored the thread.Message ID: @.***>

loreguerci commented 2 years ago

Hello,

The new version is not on CRAN yet, please find the latest version here: https://github.com/alexyermanos/Platypus/blob/master/R/VDJ_bulk_to_vgm.R I tested it with your islet_platypus and it should work.

We always have some delay before the functions are on CRAN, so when there is a bug it is always good to first check if it was fixed with a new version on GitHub.

Let us know if it works, Lorenzo

leeanapeters commented 2 years ago

Sorry, to clarify I did use the function above and I am still getting the same error. I do have an older version of platypus, do I need to update?

leeanapeters commented 2 years ago

Nevermind I just needed to make sure dplyr was loaded as the grouping by sample option wasnt working!

leeanapeters commented 2 years ago

Sorry, I was able to load the object but am having some issues. It looks like the SHM calculation is not working as all values are NA, and even though there is expansion the donut plots also show NA. Do I need to split these by T and B cell, could that be part of the problem?

Thanks

loreguerci commented 2 years ago

Hello :)

I might need some more details. Could you share your code (from reading the data to the call of SHM and donut plot function including intermediate processing if there is any)? So I can try reproduce the problem

Thank you

leeanapeters commented 2 years ago

Sure thank you for the response!

The input data was a list of dataframes made by reading in the Trust4 report_vdjformat.tsv file, example: read.table("6414_Trust_report_vdjformat.tsv", sep = "\t", header = TRUE)

These were made into a list and saved as an RDS file.

Then I did vgm_bulk_t4 <- VDJ_bulk_to_vgm(VDJ.bulk.out.directory.list = "path.to.saved.rds." , input.type = "TRUST4.FULL") *I tried the regular TRUST4 option as well

The resulting object does have the cdr3 info and gene usage info but SHM says NA for all.

I just tried SHM_plt <- VDJ_plot_SHM(VDJ = vgm_bulk_t4[[1]], group.by = "sample_id", quantile.label = 0.95)

and I get summarise() has grouped output by 'name'. You can override using the .groups argument. Error in (1 - h) * qs[i] : non-numeric argument to binary operator In addition: There were 48 warnings (use warnings() to see them)

How can I specify groups for this function? I noticed that my object has NA for groups so should that be part of the initial list setup?

It also says it is using counts provided by 10X in the clonotype_frequency column but that is NA... do I just need to reclonotype?

Happy to provide more info or send the object to you.

Thank you for your help!

Leeana

alexyermanos commented 2 years ago

Hey Leeana sorry again for the delay - we are using this pipeline a lot less than others in the package.

Could you maybe send me the report_vdjformat.tsv file (or a subsampled version of this) at ayermanos@gmail.com? I can take a look about the issues you are seeing cheers alex