JiekaiLab / scTE

Questions regarding preprocessing and integration of scTE with other tools #92

Open zerostwo opened 1 month ago

zerostwo commented 1 month ago

Hello!

When I was using scTE, I encountered some problems and I wonder if I can get some advice.

Question 1: Preprocessing BAM Files with Filtered Barcodes from CellRanger Output

I'm exploring the use of scTE on single-cell data and have a question about preprocessing. Is it possible to preprocess the BAM files generated by CellRanger using the filtered barcodes before running the scTE pipeline? I'd like to understand whether this is feasible and whether there are specific steps or considerations for this approach.

Question 2: Setting Parameters for Large Cell Counts

In scenarios where the cell count exceeds 10,000, what would be the recommended parameter setting for "--expect-cells"? Should this value be derived from CellRanger's quantification results, or are there alternative methods for determining this parameter?

Question 3: Integration of CellBender with scTE for Samples with Ambient RNA

For samples such as heart or adipose tissue that are sequenced using single-nucleus approaches, ambient RNA may be present, and tools like CellBender become necessary to remove it. How can the CellBender workflow be integrated with scTE for a seamless analysis? Are there any suggested practices or considerations for this integration?

Your insights and guidance on these matters would be greatly appreciated. Thank you for your assistance!

jphe commented 1 month ago

Q1: Yes, you can use the BAM file generated by CellRanger as input for scTE. Just make sure every read has both a cell barcode and a UMI; otherwise, filter out the reads that lack them before running scTE.
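The filtering step described in Q1 could be sketched roughly as follows. This is only an illustration, not part of scTE: it assumes the reads arrive as SAM text on stdin (for example from `samtools view -h`), that the cell barcode and UMI live in the `CB` and `UB` tags that CellRanger writes, and that the barcode list is a plain-text file with one barcode per line. The function names and the script name are hypothetical.

```python
import sys

def load_barcodes(path):
    """Read one barcode per line from a plain-text file (e.g. a decompressed barcodes.tsv)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}

def get_tag(fields, tag):
    """Return the value of a string-typed SAM optional tag (e.g. CB or UB), or None if absent."""
    prefix = tag + ":Z:"
    for field in fields[11:]:  # optional tags start at the 12th SAM column
        if field.startswith(prefix):
            return field[len(prefix):]
    return None

def keep_read(sam_line, barcodes, cb_tag="CB", umi_tag="UB"):
    """Keep header lines, plus reads that carry a listed cell barcode and any UMI."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    cell = get_tag(fields, cb_tag)
    umi = get_tag(fields, umi_tag)
    return cell in barcodes and umi is not None

if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed invocation: python filter_sam.py barcodes.tsv < input.sam > output.sam
    allowed = load_barcodes(sys.argv[1])
    for line in sys.stdin:
        if keep_read(line, allowed):
            sys.stdout.write(line)
```

One way to use it, assuming the script is saved as `filter_sam.py` and the filtered barcode file has been decompressed: `samtools view -h possorted_genome_bam.bam | python filter_sam.py barcodes.tsv | samtools view -b -o filtered.bam -`. Note that the `CB` values in a CellRanger BAM carry the same `-1` style suffix as the entries in `barcodes.tsv`, so the two can be compared directly.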

Q2: You can use the cell number reported by CellRanger. Alternatively, you can set --expect-cells to a large number and then filter out the low-quality cells in downstream analysis with tools such as scanpy or Seurat. One thing to note: the larger the number you set, the more memory and time the run takes.
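As a rough sketch of the first option in Q2, the number of cells called by CellRanger can be read off the filtered barcode file by counting its lines. The helper names below are hypothetical, and the 1.5x headroom factor is purely an illustrative assumption, not a recommendation from scTE; a larger value costs more memory and run time, as noted above.

```python
import gzip

def count_barcodes(path):
    """Count non-empty lines in a barcode file; handles plain text and .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip())

def suggest_expect_cells(n_called, margin=1.5):
    """Hypothetical helper: pad the called-cell count with some headroom.

    The margin is an arbitrary assumption; low-quality cells are expected
    to be filtered later in scanpy or Seurat.
    """
    return int(n_called * margin)
```

For example, `suggest_expect_cells(count_barcodes("filtered_feature_bc_matrix/barcodes.tsv.gz"))` would give a value to pass to `--expect-cells`, with the path adjusted to wherever the CellRanger output lives.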

Q3: I'm not familiar with CellBender, so to be honest I don't know. One suggestion: run CellBender on the expression matrices generated by scTE, CellRanger, and STARsolo, then compare them to check whether scTE shows any bias for genes relative to the other two tools. If the gene patterns are similar, the TE results can be considered trustworthy as a preliminary assessment.
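One simple way to do the comparison suggested in Q3 is to correlate per-gene total counts between two tools over the genes they share. The sketch below assumes the totals have already been summed across cells into plain dictionaries (gene name to count); the function name is hypothetical, and Pearson correlation on log-transformed totals is just one possible similarity measure, not something prescribed by scTE or CellBender.

```python
import math

def log_pearson(totals_a, totals_b):
    """Pearson correlation of log1p per-gene totals over genes present in both inputs."""
    shared = sorted(set(totals_a) & set(totals_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [math.log1p(totals_a[g]) for g in shared]
    ys = [math.log1p(totals_b[g]) for g in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("totals are constant; correlation is undefined")
    return cov / math.sqrt(var_x * var_y)
```

TE features reported only by scTE are dropped automatically because only shared names are compared. A value close to 1 between scTE and the other tools, both before and after CellBender, would support the preliminary assessment described above; a clearly lower value would point to a gene-level bias worth investigating.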