colomemaria / epiAneufinder

R package to detect breakpoints and assign somies to scATAC-seq data
GNU General Public License v3.0

Question related to publication #21

Closed: Krithika-Bhuvan closed this issue 5 months ago

Krithika-Bhuvan commented 7 months ago

Hello, thanks for creating this package.

In your paper, Fig. 4 ("Copy number alterations in a primary patient sample") includes panel 4c, "Correspondence between karyotype clones from (a) and cell types from (b)." Could you please explain the steps you took to connect your karyotype clones with cell types?

I've used your package to estimate copy number in my scATAC-seq data, and I'm looking to connect it back to my immune cell estimates from scRNA-seq data. Any advice would be helpful. Thanks, K

thek71 commented 7 months ago

Hi Krithika,

Briefly, we used gene activity to compute Leiden clusters that separate the different cell types, which we identified using marker genes. In standard 10x output, the fragment file identifies each cell by its barcode, and this barcode is carried through to the epiAneufinder results. We then simply used the barcodes to link each cell's CNV state to its cell type from the Leiden clustering. The scripts used for these calculations can be found at https://github.com/colomemaria/epiAneufinder_analyses, under the embedding_scripts folder. If you have multiome data, you can connect the two information types directly through the cell barcodes. If you have separate scATAC and scRNA measurements, there are two ways to go: either use the scATAC data and, as in the publication, use gene activity to identify the different cell types, or integrate the two modalities and get the cell type information from there.
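The barcode matching described above can be sketched in Python as follows. This is a minimal illustration, not the code from the epiAneufinder_analyses repository: it assumes you have already exported two barcode-keyed tables, one mapping each barcode to a subclone (for example, from epiAneufinder's split_subclones output) and one mapping each barcode to a cell type (for example, from Leiden clustering), and it counts cells per (subclone, cell type) pair, in the spirit of a correspondence table like Fig. 4c. The function name and all barcodes and labels are hypothetical.

```python
from collections import Counter


def clone_by_celltype(subclones, celltypes):
    """Count cells per (subclone, cell type) pair.

    subclones: dict mapping barcode -> subclone label (hypothetical export)
    celltypes: dict mapping barcode -> cell type label (hypothetical export)
    Only barcodes present in both dicts are counted; the rest are ignored.
    """
    shared = subclones.keys() & celltypes.keys()
    return Counter((subclones[b], celltypes[b]) for b in shared)


# Made-up example data; real barcodes come from the fragment file / epiAneufinder results.
subclones = {"AAAC-1": "1", "AAAG-1": "1", "AACT-1": "2", "AAGT-1": "2"}
celltypes = {"AAAC-1": "T cell", "AAAG-1": "tumor", "AACT-1": "tumor", "TTTT-1": "B cell"}

table = clone_by_celltype(subclones, celltypes)
for (clone, celltype), n in sorted(table.items()):
    print(f"subclone {clone}\t{celltype}\t{n}")
```

The sketch assumes both tables spell barcodes identically; if one tool adds a prefix or suffix to the barcode, normalize the strings before matching, otherwise the shared set will be smaller than expected.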

Best, Katia

Krithika-Bhuvan commented 5 months ago

Hello, I have a follow-up question. I want to take the copy number data generated by epiAneufinder, save it into a Seurat object, and integrate it with my other data types so that I can create a UMAP. Based on what I see so far, to plot the copy number data on the UMAP I need to "collapse" the data to get one number for each barcode. I'm wondering whether your team has done this type of analysis before, and if so, how. Please advise.

thek71 commented 5 months ago

Hi Krithika, what we regularly do is add the subclone information as a metadata attribute in the Seurat or scanpy/epiScanpy object. You can use the epiAneufinder function "split_subclones" to find out which cell/barcode belongs to each of the clones. The output of the command is a dataframe that you can then use for your downstream analysis. You can find a detailed tutorial in the vignette.

I hope that helps. Best, Katia

Krithika-Bhuvan commented 5 months ago

Hi Katia,

Thank you very much for the feedback. That is super helpful. I found the part of the vignette that identifies the subclonal information. If you would be willing to share code or an example of how to add the subclonal information to the metadata of the Seurat object, that would be extremely helpful, as I'm quite new to all of this. Thank you. Best, K

thek71 commented 5 months ago

Hi Krithika,

you can use the AddMetaData function: https://www.rdocumentation.org/packages/Seurat/versions/3.1.4/topics/AddMetaData. The split_subclones function returns a dataframe with two columns, "cell" and "subclone". The dataframe also has an index that is the cell barcode. Since the row names of the dataframe are the barcodes, I think you can use the AddMetaData function directly. You might get an error if the two objects contain different numbers of barcodes. In that case, filter the extra cells out of the clone file, or, if your Seurat object has more cells than the clone file, add "NA" for those cells.
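The alignment logic described above (keep the Seurat cell order, drop clone-table barcodes the Seurat object lacks, and assign NA to Seurat cells without a clone) is sketched below in Python. This only illustrates the matching step; it is not Seurat's AddMetaData implementation. The function name align_subclones and the example barcodes are hypothetical, and None stands in for R's NA.

```python
def align_subclones(seurat_cells, clone_table):
    """Return (labels, dropped).

    labels:  one subclone label per Seurat cell, in Seurat's cell order;
             None (standing in for R's NA) when the clone table has no entry.
    dropped: barcodes in the clone table that are absent from the Seurat object.
    """
    labels = [clone_table.get(barcode) for barcode in seurat_cells]
    dropped = sorted(set(clone_table) - set(seurat_cells))
    return labels, dropped


# Hypothetical inputs: Seurat cell names (barcodes) and a barcode -> subclone map.
seurat_cells = ["AAAC-1", "AAAG-1", "AACT-1"]
clone_table = {"AAAC-1": "1", "AACT-1": "2", "GGGG-1": "1"}

labels, dropped = align_subclones(seurat_cells, clone_table)
print(labels)
print(dropped)
```

Building the label list from the Seurat side guarantees exactly one entry per cell in the object's order, which is the shape a per-cell metadata column needs; the dropped list is useful as a sanity check that the barcodes were matched as intended.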

I hope this is helpful enough.

Best, Katia

Krithika-Bhuvan commented 5 months ago

Hi Katia @thek71, thank you very much for the feedback. I am attempting this right now and it's going all right so far. I have multiple samples, so I have a few more questions related to the interpretation and to merging data across samples.

thek71 commented 5 months ago

Hi Krithika,

the clones/clusters are per sample, you are right. The identification of the clones/clusters depends heavily on the cutoffs used. It is based on a distance metric: cells that "look like" each other, i.e. have similar CNVs, are assigned to the same clone/cluster. But since the user specifies how the cluster tree is partitioned, it is not an automated process. As for the interpretation, that is up to you and depends on the dataset; it is not a topic for this platform.
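To illustrate why the cutoff matters, here is a minimal Python sketch of distance-based grouping. It is not epiAneufinder's clustering, which partitions a cluster tree as described above; instead it swaps in a simple single-linkage rule: cells whose CNV profiles differ by at most a chosen Manhattan distance are joined, and joins chain through intermediate cells. The per-bin encoding (0 = loss, 1 = normal, 2 = gain), the cell names and the profiles are hypothetical.

```python
from itertools import combinations


def cluster_by_cutoff(profiles, cutoff):
    """Single-linkage grouping: link two cells if their CNV profiles are within
    `cutoff` (Manhattan distance over bins); return sorted groups of cell names."""
    cells = list(profiles)
    parent = {c: c for c in cells}

    def find(c):
        # Follow parent pointers to the group representative (with path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(cells, 2):
        dist = sum(abs(x - y) for x, y in zip(profiles[a], profiles[b]))
        if dist <= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for c in cells:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical per-bin CNV states for four cells.
profiles = {
    "c1": [1, 1, 2, 2],
    "c2": [1, 1, 2, 1],
    "c3": [0, 1, 1, 1],
    "c4": [0, 0, 1, 1],
}

for cutoff in (0, 1, 2):
    print(cutoff, cluster_by_cutoff(profiles, cutoff))
```

With these four made-up profiles, cutoffs of 0, 1 and 2 give four, two and one group(s), respectively: the same cells can form very different numbers of clones depending on the threshold the user picks.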

I hope I answered your questions.

Best, Katia

Krithika-Bhuvan commented 5 months ago

Yes, thank you very much!