JiekaiLab / scTE

scTE returns empty csv files #60

Open CHoeltermann opened 1 year ago

CHoeltermann commented 1 year ago

Hi, and thanks for your pipeline.

I currently have the problem that scTE returns only the header and no counts.

I run the tool as follows:

scTE -i some_sorted.bam -o out -x hg38.exclusive.idx -p 10 --hdf5 False -CB CR -UMI UR

BED files are being generated and everything, and I get this output:

INFO : Calculating expression... 2023-06-02 11:54:35
INFO : Detect 0 cells expressed at least 200 genes, results output to out.csv
INFO : Finished calculating expression 2023-06-02 11:54:35
INFO : Done with 0d 0h 24m 19s

A sample of my BAM file:

| 1 | 4 | * | 0 | 255 | * | * | 0 | 98 | NAAAGAAACAGCAAGAAGGATACGAATCAACAGACAAACACTGCGGCACAACGCATCAAAGAGGCGAGGGCCTTCCGGAGGACGAGGACAGAGTCTCC | !--7-7<----7A--7A--7---7-------<---<-<-A---7---7-F<--7--7A7-A---7-7-----7--777----7--7-7----7----7 | on:Z:HF1_25773:2:1101:1550:1367@1:N:0:NAAACCCT | op:Z:!--7-7<----7A--7A--7---7-------<---<-<-A---7---7-F<--7--7A7-A---7-7-----7--777----7--7-7----7----7 | RX:Z:TTGGGCGGTG | QX:Z:JAJJJJJJ<J | CR:Z:TNTGAGAAGACTGGGT | CY:Z:A!AFFJFJJJAAJJJJ |
| 1 | 4 | * | 0 | 255 | * | * | 0 | 98 | NCTGGCATTGCCCACAACGACCACTATGTCAAGCTCATTTCCTGGTCAGACAACGAAAATGACGACAGCAACAGGATGGAGGACGTCGTGGCCCACAA | !7-7-7-7A------7AAJJ--77--77--FJFF-7<7-<JA7<-7---7-<FAAFA--77-<-<J--7AAAFF<--7--<-777J--7-7-77AF<- | on:Z:HF1_25773:3:1101:1306:1367@1:N:0:NCCGTATG | op:Z:!7-7-7-7A------7AAJJ--77--77--FJFF-7<7-<JA7<-7---7-<FAAFA--77-<-<J--7AAAFF<--7--<-777J--7-7-77AF<- | RX:Z:TATACGCAGT | QX:Z:A<AA-FJ-7A | CR:Z:ANGGGAGGTTCGGGCT | CY:Z:<!AAFJJJFF<JJJJF |
| 1 | 4 | * | 0 | 255 | * | * | 0 | 98 | NTTAGCTGGTCAGAATGGTGCACACCAGTGGTCACAGCTAGCCGAGAGGCTGAGATGAGAGGATCGCTGGAGCGTAGGAGGGTGATGGTGCAGAGAGC | !7--7A<----7--<----77FAJ<F-<-7A<7-7A-77A-777-7-AAF-77A-AF-77A77-A-A-777-7----7--A-A<--7<-777A-7A-- | on:Z:HF1_25773:4:1101:29701:1332@1:N:0:NCTCAGTG | op:Z:!7--7A<----7--<----77FAJ<F-<-7A<7-7A-77A-777-7-AAF-77A-AF-77A77-A-A-777-7----7--A-A<--7<-777A-7A-- | RX:Z:GATTCCGTTG | QX:Z:JJFJ<7JFJ- | CR:Z:ANGGGTCGTGACGCCT | CY:Z:A!A<--FAF7AF<JFJ |

I don't know what's going wrong here, but since the data I am using is a published dataset where counts should be plenty, I assume I am making some sort of mistake.

Cheers, Charlotte

jphe commented 1 year ago

scTE identifies UMIs through the UR/UB tags. The BAM file you provided seems to have no UMI information, so you need to set the -UMI parameter to False.

CHoeltermann commented 1 year ago

Sorry I forgot to mention!

Before analyzing with scTE, I used pysam (https://github.com/pysam-developers/pysam) to add a "UR:Z" tag to each read (I basically copied RX:Z, which holds the UMI).
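For anyone wanting to reproduce that tagging step, here is a minimal sketch of the idea. It is not the script used above: it works on plain SAM text lines (for example, the output of `samtools view`) with the standard library only, rather than through pysam. The function name `copy_tag` and the example record are made up for illustration, and the sketch assumes the source tag is of string type (`Z`).

```python
def copy_tag(sam_line: str, src: str = "RX", dst: str = "UR") -> str:
    """Return the SAM record with a dst:Z tag copied from src.

    The record is left unchanged if src is missing or dst already exists.
    Assumption: the source tag is a string-typed (Z) tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    optional = fields[11:]  # the first 11 columns are the mandatory SAM fields
    tags = {f[:2]: f for f in optional}
    if src in tags and dst not in tags:
        # A tag looks like "RX:Z:TTGGGCGGTG"; keep everything after the type.
        value = tags[src].split(":", 2)[2]
        fields.append(f"{dst}:Z:{value}")
    return "\t".join(fields)


# Hypothetical, shortened record; only the RX and CR values come from this thread.
record = "\t".join([
    "read1", "4", "*", "0", "255", "*", "*", "0", "0",
    "NAAAG", "!--7-", "RX:Z:TTGGGCGGTG", "CR:Z:TNTGAGAAGACTGGGT",
])
updated = copy_tag(record)
print(updated.split("\t")[-1])
```

Note that copying the tag only gives scTE a UMI to read; it does not change whether the read is aligned.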

Does scTE have some other requirements for the BAM file or some dependency on a certain format I don't recognize?

These published BAM files are supposed to be Cell Ranger output; however, the BAM file I downloaded differs from Cell Ranger's specification.

Thanks a lot & all the best, Charlotte

jphe commented 11 months ago

I have just noticed that the BAM file you pasted above has no chromosome or coordinate information.
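A quick way to check this on your own file is sketched below. It assumes you have SAM text lines at hand (for example from `samtools view`) and uses only the standard library. In the SAM format, FLAG bit 0x4 marks a read as unmapped, and a read without coordinates has `*` as reference name and 0 as position. The function names and both example records are hypothetical.

```python
def is_unmapped(sam_line: str) -> bool:
    """True if the record carries no usable chromosome/coordinate."""
    fields = sam_line.split("\t")
    flag = int(fields[1])                   # column 2: bitwise FLAG
    rname, pos = fields[2], int(fields[3])  # columns 3 and 4: RNAME, POS
    return bool(flag & 0x4) or rname == "*" or pos == 0


def count_unmapped(lines):
    """Return (unmapped, total) over alignment records, skipping '@' header lines."""
    records = [l for l in lines if l.strip() and not l.startswith("@")]
    return sum(is_unmapped(l) for l in records), len(records)


# Hypothetical records: the first mimics the pasted sample (flag 4, '*', 0),
# the second is a made-up aligned read for contrast.
sample = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t4\t*\t0\t255\t*\t*\t0\t0\tNAAAG\t!--7-\tCR:Z:TNTGAGAAGACTGGGT",
    "read2\t0\tchr1\t10468\t255\t5M\t*\t0\t0\tACGTA\tFFFFF\tCR:Z:ANGGGAGGTTCGGGCT",
]
unmapped, total = count_unmapped(sample)
print(f"{unmapped} of {total} records have no alignment")
```

If every record in a file comes back as unmapped, as the three pasted rows would, there are no alignments from which counts could be derived.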

CHoeltermann commented 11 months ago

True, thank you for pointing that out! I have reached out to the creators of the BAM files to try to solve this :)