Coolgenome / TCM

Codes used in pan-T cell analysis

Availability of reference Seurat objects #1

Open SirKuikka opened 1 year ago

SirKuikka commented 1 year ago

Hi,

Is the reference Seurat object available somewhere? I would like to run TCellMap.R.

refSeuratObj <- readRDS(opt$referenceData)

dustincys commented 1 year ago

Hi @SirKuikka

I apologize for any inconvenience this may cause you.

I am writing to inform you of our current efforts to upload the processed data object to the Gene Expression Omnibus (GEO) database. Our objective is to provide easy access to the data and enable wider dissemination of our findings.

Further to this, we are developing a new R package "MultipleMap" which offers the ability to map to a reference while accounting for batch effects. This package will be made available to the scientific community and will be an important resource for researchers working with gene expression data.

Thank you for your understanding and support.

Merguerrero commented 1 year ago

Glad to hear (read) that!

Thank you for your efforts. Looking forward to testing the R package and using your reference.

ccruizm commented 1 year ago

Hello @dustincys

Do you have an estimated date when the Seurat object (or count matrices + cell annotations) will be available in GEO? I am patiently waiting to use your nice dataset for reference mapping.

Thanks in advance!

nuneznicolas commented 1 year ago

Hi @dustincys,

First, I would like to express my sincere congratulations on your outstanding paper. Bravo!!

I am particularly interested in the count matrix data mentioned in your paper, as they hold great potential for further analysis and exploration. I wanted to kindly inquire about the availability of the data, specifically the .RDS files (CD8, CD4, Innate_, etc., and the Seurat object) associated with your research. I understand that you are currently in the process of uploading the data to the Gene Expression Omnibus (GEO) database. Do you have an estimated timeline for when the data will be accessible to the scientific community?

Thank you for your understanding and for your efforts in sharing your research.

Warm regards,

Nicolás

dustincys commented 1 year ago

Hi @nuneznicolas @ccruizm ,

I hope this email finds you well. I am writing to express my appreciation for your support and to provide an update on my current progress.

I am currently attempting to upload the processed data to the GEO repository, specifically GSE222859. However, this task has proved challenging due to the inclusion of samples from other publicly available GEO repos. I am doing my best to resolve this issue, but if I cannot successfully upload it to the public data repository, I will explore alternative options such as uploading to GitHub or another suitable platform.

Best regards,

ccruizm commented 1 year ago

Thanks @dustincys for the update! Some good alternatives are Zenodo or CellXGene (the latter requires more formatting, so I would suggest the former). 😉

nuneznicolas commented 1 year ago

Thanks @dustincys! I agree with @ccruizm.

Best and thanks a lot!

Nicolás

ccruizm commented 1 year ago

Hey @dustincys,

I hope you're doing well. I am following up on the data availability. I understand that you're working hard to make it accessible to the scientific community, and I appreciate your efforts in that regard.

I was wondering if you have any updates on when the data will be available for us to access.

Thank you once again. I'm eagerly awaiting the availability of the data.

dustincys commented 1 year ago

Hello All,

I wanted to provide an update regarding the data sharing process. After careful consideration, we have decided to provide a download link through our website. This decision was made to ensure that we comply with the MD Anderson Cancer Center's data security policy. We encountered difficulties uploading to GEO, specifically in collecting all sample details, because the data also includes samples from other GEO repositories.

At this time, the data is still undergoing a data security check. As soon as this process is complete, we will provide the download link through our website. Please be aware that due to regulations set out in the MDACC's data security code, I am unable to send the data personally in private.

Thank you for your understanding and patience as we work to ensure the protection of the data.

Best,

dustincys commented 1 year ago

Hello everyone,

Currently, the data is undergoing a thorough security check to ensure its safety. Once the data has been fully vetted and deemed secure, the download link will be made available at the bottom of the overview page (as shown in the image). We understand that this delay may be frustrating, but please know that we are taking every precaution necessary. Thank you for your understanding and patience. If you have any further questions or concerns, please do not hesitate to reach out.

Best regards,

image

nuneznicolas commented 1 year ago

Thanks a lot @dustincys !!

Best

Nicolás

ccruizm commented 1 year ago

Hello @dustincys,

Do you have any update on when the dataset will be released? I check daily and still do not see it on the website you mentioned. 😅

Thanks in advance!

nuneznicolas commented 1 year ago

Dear @dustincys, do you have any news about the data? Thanks in advance.

Nicolás

dustincys commented 1 year ago

Hello @ccruizm @nuneznicolas

The data will be online very soon.

Over the past few weeks, we have held a number of meetings to address any concerns and ensure that the data we are working with does not contain any patient information. I am pleased to inform you that these meetings have been successful in clarifying this aspect. Currently, we are in the final stages of the data security check, and it is nearing completion. Once all the necessary measures are in place, we will be ready to make the data available online.

Kind regards,

ZhihaoAlex commented 1 year ago

Dear @dustincys

Do you have any news about the data? We can't wait to experience your newly developed tools.

Thanks in advance

dustincys commented 1 year ago

@ZhihaoAlex

Hi Alex,

I apologize for any confusion, but I have received an update from Rsch Info Sys of MDACC regarding the availability of the data. It seems that there is a freeze on any changes for the rest of the month due to end of year IS schedules. While the change request will be reviewed in August, the formal implementation of the requested changes will not be possible until 9/7/23 at the earliest.

Sincerely,

image

dustincys commented 1 year ago

Hi All,

I wanted to inform you that the SCRP update has been approved after we submitted an emergency ticket. I am pleased to let you know that the data is now available online.

Best

Chris-Cherry commented 11 months ago

Hi @dustincys,

Thank you so much for uploading the processed CD4 and CD8 data. Is there any chance it's possible to also get the GD data? It would be wonderful!

Cheers and thanks again,

Chris

dustincys commented 11 months ago

Hi Chris,

Thank you for reaching out to us. We appreciate your interest in our study and are happy to provide the data you have requested. However, I wanted to inform you that releasing certain data from our study is subject to certain restrictions.

Specifically, we have several unpublished datasets from our collaborators, and we have been asked to release this information (including barcode and patient ID) only after their manuscripts have been accepted. This is to ensure proper attribution and adherence to academic norms.

Regarding the public data set of GD cells, we are happy to share the expression matrix. In the meantime, I have to inform you that I am unable to send the data to you privately. Our institution, MD Anderson Cancer Center, strictly regulates data security and privacy. Any violation of these regulations could result in serious consequences, including job suspension for both myself and my supervisor.

In addition, please note that the process of conducting a data security check with regard to the patient ID or barcode may take considerable time. For more detailed information on this, you can visit the following link: https://github.com/Coolgenome/TCM/issues/1#issuecomment-1649960522

Thank you for your understanding and patience.

Best

Chris-Cherry commented 8 months ago

Hey @dustincys - thank you so much for the response and my apologies for the delayed response. We totally understand the requirements and would appreciate any GD data that could be shared through the appropriate channels. If there's anything I can do to assist or expedite, please let me know!

LQLe2 commented 8 months ago

Hi, how can I get the file snn-single-markers.tsv? Is it possible to provide the file, or can I replace it with the DEGtop50 from the article? Thanks!

SirKuikka commented 7 months ago

> Hi, how can I get the file snn-single-markers.tsv? Is it possible to provide the file, or can I replace it with the DEGtop50 from the article? Thanks!

This is actually something I was wondering as well.

dustincys commented 6 months ago

It is the DEG list. https://static-content.springer.com/esm/art%3A10.1038%2Fs41591-023-02371-y/MediaObjects/41591_2023_2371_MOESM3_ESM.xlsx

dustincys commented 6 months ago

> Hi, how can I get the file snn-single-markers.tsv? Is it possible to provide the file, or can I replace it with the DEGtop50 from the article? Thanks!

Yes

SirKuikka commented 6 months ago

Hi @dustincys

And what about these two scripts?

TCellMap.R TCellMap2.R

Which one should we use?

In TCellMap.R there are these two files that I can't find:

cellCycleGeneT1 <- read_tsv("/rsrch3/scratch/genomic_med/ychu2/projects/p1review/R3Q7/knowledge/public/database/general/cell-cycle-gene-list.txt")
cellCycleGeneT2 <- read_tsv("/rsrch3/scratch/genomic_med/ychu2/projects/p1review/R3Q7/knowledge/public/database/general/regev_lab_cell_cycle_genes.txt")

dustincys commented 6 months ago

Hi SirKuikka,

I would like to recommend considering the use of the MultiMap package, which can be found at https://github.com/WangLab-ComputationalBiology/MultiMap. In particular, you may find this package useful for addressing some of the challenges you are facing.

For a practical example of how to use MultiMap, you can refer to the following link: https://github.com/WangLab-ComputationalBiology/MultiMap/blob/master/testR/test.R.

Regarding your question about cell cycle genes, you can access a list of these genes at https://satijalab.org/seurat/reference/cc.genes.
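As a rough sketch of that suggestion, the two hard-coded gene-list files read by TCellMap.R could be stood in for by the `cc.genes` list bundled with Seurat. This assumes Seurat is installed and that its S-phase and G2/M-phase gene sets are an acceptable substitute; they are not confirmed to be identical to the original files, and the variable names below are illustrative only.

```r
# Sketch: take cell cycle genes from Seurat's bundled cc.genes list
# instead of the two local files referenced in TCellMap.R.
# Assumption: these sets are an acceptable stand-in for the original files.
cell_cycle_genes <- character(0)

if (requireNamespace("Seurat", quietly = TRUE)) {
  s_genes   <- Seurat::cc.genes$s.genes    # S-phase marker genes
  g2m_genes <- Seurat::cc.genes$g2m.genes  # G2/M-phase marker genes
  cell_cycle_genes <- union(s_genes, g2m_genes)
  message(length(cell_cycle_genes), " cell cycle genes loaded from Seurat")
} else {
  message("Seurat is not installed; install it to access cc.genes")
}
```

Note that the original script reads the files with `read_tsv`, which returns data frames, whereas this sketch yields plain character vectors, so any downstream code that expects a table column would need adjusting.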

In comparing TCellMap2.R and TCellMap.R, they are quite similar. TCellMap2.R, however, utilizes each batch to map the query, but some of the mapping results may not be optimal. In such cases, the MultiMap package could potentially provide better batch mapping results.

Best regards,

SirKuikka commented 6 months ago

Yes, sorry. I don't know why I forgot MultiMap. Thanks!

SirKuikka commented 6 months ago

Does it matter how the query data are normalized? Is LogNormalize OK?

In my case MultiMap predicted almost all of the query CD4+ T cells as "CD4_c5_Tctl" cells. I don't think this worked.

dustincys commented 6 months ago

Hi Siuiri,

I hope this message finds you well. As you may know, Seurat suggests using SCTransform because it helps address the characteristics of the data, particularly in cases where 10x counts are more in line with a zero-inflated negative binomial distribution.

> almost all of the query CD4+ T cells as "CD4_c5_Tctl" cells.

I have to admit that MultiMap is not perfect. I also suggest using the DEGs to double-check the mapping results.
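One way such a double-check might look is sketched below in base R. It is not part of MultiMap: the function name `deg_overlap`, the placeholder gene names, and the toy labels are all hypothetical. The idea is to look first at how concentrated the predicted labels are, then at how many of each reference cluster's DEGs (for example, from the supplementary DEG table linked earlier in this thread) also appear among markers computed from the query cells.

```r
# Toy inputs (hypothetical): predicted labels for ten query cells.
predicted <- c(rep("CD4_c5_Tctl", 9), "other_cluster")

# 1. Fraction of query cells assigned to each label; one dominant label
#    is a hint that the mapping deserves a closer look.
label_fraction <- sort(table(predicted) / length(predicted), decreasing = TRUE)

# 2. Reference DEGs per cluster and markers found in the query cells.
#    Gene names are placeholders, not real markers.
reference_degs <- list(
  CD4_c5_Tctl   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  other_cluster = c("GENE_E", "GENE_F", "GENE_G", "GENE_H")
)
query_markers <- c("GENE_E", "GENE_F", "GENE_G", "GENE_X")

# Fraction of each cluster's reference DEGs that also appear in the query markers.
deg_overlap <- function(reference_degs, query_markers) {
  vapply(
    reference_degs,
    function(genes) length(intersect(genes, query_markers)) / length(genes),
    numeric(1)
  )
}

overlap <- deg_overlap(reference_degs, query_markers)
print(label_fraction)
print(overlap)
```

In this toy case most cells are labelled "CD4_c5_Tctl" while the query markers overlap only with the other cluster's DEGs, which is the kind of disagreement that would argue against trusting the predicted labels.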

Best regards,

Conghui2023 commented 1 month ago

Hi @dustincys,

Is the reference Seurat object available now? Where can I find it? And how can I install the MultipleMap package?

All the best

dustincys commented 1 month ago

Hi @Conghui2023 ,

At the bottom of this page, https://singlecell.mdanderson.org/TCM/, you can find the Seurat object along with its MD5 checksum.
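A minimal sketch of checking a downloaded file against a published checksum before loading it, using `tools::md5sum` from base R. The helper name `verify_md5` is hypothetical, and a temporary file stands in for the real download so that the example runs on its own.

```r
# Sketch: verify a file against an expected MD5 checksum before reading it.
verify_md5 <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))  # NA if the file does not exist
  identical(tolower(actual), tolower(trimws(expected_md5)))
}

# Stand-in for the downloaded object: a small temporary .rds file.
demo_file <- tempfile(fileext = ".rds")
saveRDS(list(example = 1:3), demo_file)
demo_md5 <- unname(tools::md5sum(demo_file))  # plays the role of the published checksum

if (verify_md5(demo_file, demo_md5)) {
  obj <- readRDS(demo_file)  # only load once the checksum matches
} else {
  stop("MD5 mismatch: the download may be incomplete or corrupted")
}
```

For the real data, `demo_file` would be the local path of the downloaded file and `demo_md5` the checksum string listed on the download page.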

MultiMap is an R package; you can install it like this:

library(devtools)
install_github("WangLab-ComputationalBiology/MultiMap")

Best

Conghui2023 commented 1 month ago

Hi Dustin,

Thank you very much for your reply. I still have a question about 'TCellMap.R': I am confused about 'hvgQ' in the script. Could you tell me what it is?

[Screenshot 2024-07-18 10.26.12.png]

All the best, Conghui

Conghui2023 commented 1 month ago

Dear Dustins,

Sorry to bother you again. When I run 'TCellMap.R' using the 'cd8.rds' file that you provided, I encounter many bugs. Although I have fixed some of them, it still doesn't work. Could you please update this script?

All the best,

Conghui

dustincys commented 1 month ago

Hi Conghui2023,

You can access the code template at the following link: https://github.com/WangLab-ComputationalBiology/MultiMap/blob/master/testR/test.R

Best regards,
