YangLab / SCAPTURE


How to optimize the BAM file for SCAPTURE analysis #6

Closed ysbioinfo closed 2 years ago

ysbioinfo commented 3 years ago

Hi Guowei, Thanks for developing such an awesome tool! I want to use it in my analysis and have two naive questions:

  1. Since Cell Ranger selects the "true" cell barcodes using a "knee plot", the original BAM file usually contains many more cell barcodes than the ~5000-10000 we finally use. Do I need to manually filter the BAM file so that it includes only the cells that passed filtering?
  2. Some cell types, like malignant epithelial cells, account for only a small portion of the cells. For example, one of my samples has 8000 cells in total but only ~300 epithelial cells. I'm concerned that the peaks from so few cells would be diluted if I analyze all 8000 cells together. Should I split the BAM file by cell type and analyze each one individually?

Many thanks in advance!

Best Yang

liguowei-CAS commented 2 years ago

Hi Yang,

  1. In the PASquant step, SCAPTURE requires a cell barcode file (one barcode per line) as input via "--celllist", and then automatically filters the input BAM file. The barcode file can easily be generated from the Cell Ranger output: `zcat outs/filtered_feature_bc_matrix/barcodes.tsv.gz > celllist.txt`

  2. Currently, public single-cell PAS tools take the BAM file of all cells to identify PASs. You can split the BAM file by cell type, run the PAScall module on each part individually, and then merge the PAS files with the PASmerge module. In that case, a manual evaluation of the resulting PASs is recommended.
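
The two points above can be sketched on toy data as follows. This is only an illustration under stated assumptions, not part of SCAPTURE: `celltypes.tsv` (barcode, tab, cell type, no header) is a hypothetical annotation table that you would export from your own clustering, the barcodes are made up, and the BAM file name is a placeholder. The optional last step assumes samtools 1.12 or later, where `samtools view -D TAG:FILE` keeps only reads whose tag value is listed in the file; it is skipped when samtools or the BAM file is missing. The PAScall and PASmerge invocations are not shown because their options are not given in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail
workdir=$(mktemp -d)
cd "$workdir"

# --- Point 1: barcode list for --celllist ---
# Toy stand-in for Cell Ranger's filtered barcode file (made-up barcodes).
mkdir -p outs/filtered_feature_bc_matrix
printf '%s\n' AAACCTGAGAAACCAT-1 AAACCTGAGAAACCGC-1 AAACCTGAGAAACCTA-1 \
  | gzip > outs/filtered_feature_bc_matrix/barcodes.tsv.gz

# Same step as in the reply above (gzip -dc behaves like zcat).
gzip -dc outs/filtered_feature_bc_matrix/barcodes.tsv.gz > celllist.txt
echo "cells in celllist.txt: $(wc -l < celllist.txt | tr -d ' ')"

# --- Point 2: one barcode list per cell type ---
# Hypothetical annotation table: barcode <TAB> cell type, no header.
# The last barcode is not in celllist.txt and should be dropped.
printf '%s\t%s\n' \
  AAACCTGAGAAACCAT-1 Epithelial \
  AAACCTGAGAAACCGC-1 T_cell \
  AAACCTGAGAAACCTA-1 Epithelial \
  TTTGTCATCTTTCCTC-1 Epithelial > celltypes.tsv

# First file: remember the filtered barcodes. Second file: write each
# annotated barcode that passed filtering to barcodes.<celltype>.txt.
awk -F'\t' 'NR == FNR { keep[$1] = 1; next }
  ($1 in keep) {
    label = $2
    gsub(/[^A-Za-z0-9_.-]/, "_", label)   # make the label safe as a file name
    print $1 > ("barcodes." label ".txt")
  }' celllist.txt celltypes.tsv

# Optional: split the BAM by CB tag (placeholder file name; needs samtools >= 1.12).
bam=possorted_genome_bam.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
  for list in barcodes.*.txt; do
    label=${list#barcodes.}
    label=${label%.txt}
    samtools view -b -D "CB:$list" -o "$label.bam" "$bam"
    samtools index "$label.bam"
  done
fi
```

Each `barcodes.<celltype>.txt` can also serve as a per-cell-type barcode list. Intersecting with `celllist.txt` first keeps barcodes that Cell Ranger did not call as cells out of the per-type BAM files.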

Best Guo-Wei

ysbioinfo commented 2 years ago

Many thanks for your help, Guowei! Another question: I noticed that in your guideline, three files are generated in the PAScall step: intronic, exonic and 3primeextended. But when you select the positive PASs, only the intronic and exonic BED files are used:

`#Select PASs with positive prediction`

With input polyaDB:

`perl -alne '$,="\t";print @F[0..11] if $F[12] > 0 | $F[13] eq "positive";' PBMC_ALL.exonic.Integrated.bed PBMC_ALL.intronic.Integrated.bed > PBMC_ALL.PASquant.bed`

Or, without input polyaDB:

`perl -alne '$,="\t";print @F[0..11] if $F[13] eq "positive";' PBMC_ALL.exonic.Integrated.bed PBMC_ALL.intronic.Integrated.bed > PBMC_ALL.PASquant.bed`

Is it just a typo, or is 3primeextended not recommended for downstream analysis?

liguowei-CAS commented 2 years ago

PASs located downstream of the gene annotation were not evaluated in our study.

ysbioinfo commented 2 years ago

Got it. Thanks!