JiekaiLab / scTE

MIT License

Error using bam file from CellRanger #37

Open BrunoGuillotin opened 2 years ago

BrunoGuillotin commented 2 years ago

Hi, while trying to use this very nice package, I ran into an error that seems to be caused by the BAM file I am using. I generated the BAM file with cellranger/5.0.1, and running scTE gives:
ERROR : The input file /scratch/Maize_Primary_Align/outs/possorted_genome_bam.bam has no cell barcodes information, plese make sure the aligner have add the cell barcode key, or set CB to False

I have made sure my options were set to --hdf5 True -CB CB -UMI UB.

I noticed that my BAM file has extra information compared to the example on the scTE GitHub page, in particular 4 additional tags between RG (RG:Z:Maize_Primary_Align:0:1:H5GLJDRX2:2) and RE (RE:A:E): TX:Z:Zm00001d027288_T001,+617,91M | GX:Z:Zm00001d027288 | GN:Z:Zm00001d027288 | fx:Z:Zm00001d027288

Do you think these could be causing the error I am getting?

Otherwise, all the expected information is there, including CB and UB, so the cell barcodes are in the BAM file.

What would be the best way to remove these additional tags? Thanks in advance to all users,

B

jphe commented 2 years ago

Hi,

The extra tags should have no influence.

But can you check whether the BAM file contains reads with no CB tag? Like this: https://github.com/JiekaiLab/scTE/issues/5

If you set -CB CB and some reads have no CB:Z tag, scTE will report this error.
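A minimal sketch of that check, assuming samtools is on PATH. The shell function below (count_missing_tag is a hypothetical helper, not part of scTE) reads SAM records on stdin and counts those that have no TAG:Z: optional field. A non-zero count for CB means the BAM still contains reads of the kind described above.

```shell
#!/bin/sh
# count_missing_tag TAG: read SAM records on stdin and print how many
# alignment records lack an optional field starting with "TAG:Z:".
count_missing_tag() {
  awk -F'\t' -v tag="$1" '
    /^@/ { next }                          # skip header lines, if any
    {
      found = 0
      for (i = 12; i <= NF; i++)           # SAM optional fields start at column 12
        if (index($i, tag ":Z:") == 1) { found = 1; break }
      if (!found) missing++
    }
    END { print missing + 0 }'             # print 0 when nothing is missing
}

# Usage (file name is a placeholder):
# samtools view possorted_genome_bam.bam | count_missing_tag CB
```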

BrunoGuillotin commented 2 years ago

Hi jphe, thanks for your fast reply. I tried to remove the reads without a CB:Z tag using your code:
samtools view new/D0_2Aligned.sortedByCoord.out.bam -h | awk '/^@/ || /CB:/' |samtools view -h -b > new/D0_2_CB_clean.bam
and then ran scTE again:
scTE -i D0_2_CB_clean.bam -o /scratch/ -x /scratch/Maize_scTE_Genome.exclusive.idx --hdf5 True -CB CB -UMI UB
but I still get the same error (see below):


DEBUG   : Creating converter from 7 to 5
DEBUG   : Creating converter from 5 to 7
DEBUG   : Creating converter from 7 to 5
DEBUG   : Creating converter from 5 to 7
INFO    : Parameter list:
Sample = /scratch/bg93/
Reference annotation index = /scratch/bg93/Maize_scTE_Genome.exclusive.idx
Minimum number of genes required = 200
Minimum number of counts required = None
Number of threads = 1
INFO    : Loading the genome annotation index... 2022-05-25 11:51:07
INFO    : Loaded '/scratch/bg93/Maize_scTE_Genome.exclusive.idx' binary file with 358608 items
INFO    : Finished loading the genome annotation index... 2022-05-25 11:51:10
INFO    : Processing BAM/SAM files ...2022-05-25 11:51:10
/bin/sh: 1: samtools: not found
ERROR   : The input file /scratch/bg93/D0_2_CB_clean.bam has no cell barcodes information, plese make sure the aligner have add the cell barcode key, or set CB to False
['1', '10', '2', '3', '4', '5', '6', '7', '8', '9']

Could the "/bin/sh: 1: samtools: not found" line be the cause?

Thank you very much in advance for your help, B.

BrunoGuillotin commented 2 years ago

Hi jphe,

The BAM file from Cell Ranger did need to have the reads with empty CB tags removed. The remaining problem was that samtools was not installed in the Singularity container and conda environment I had created to run scTE. Now it works like a charm.
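Since the log showed "/bin/sh: 1: samtools: not found" while scTE was processing the BAM file, it can help to confirm that samtools is reachable from the same container or environment before launching scTE. A minimal sketch; require_tool is a hypothetical helper, not something scTE provides:

```shell
#!/bin/sh
# require_tool NAME: succeed if NAME is an executable on PATH,
# otherwise print an error to stderr and return a non-zero status.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "ERROR: '$1' not found on PATH; install it in the environment that runs scTE" >&2
  return 1
}

# Usage, run inside the Singularity/conda environment (scTE arguments elided):
# require_tool samtools && scTE ...
```

With conda, samtools can be installed from the bioconda channel.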

Thanks for building this tool; I am really looking forward to analysing the results. B.

Jingheng-ZHANG-1999 commented 1 year ago

Hi jphe, thanks a lot for building this wonderful tool! After running
samtools view $in -h | awk '/^@/ || /CB:/' |samtools view -h -b > $out
on every BAM file I generated with CellRanger, scTE still reported the following for a few files: .......has no UB:Z information, plese make sure the aligner have add the UMI key, or set UMI to False
I assumed this was because some reads in the input BAM file lack a UMI tag, so I ran
samtools view $in -h | awk '/^@/ || /UB:/' |samtools view -h -b > $out
which worked. Posting it here for your reference.
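Instead of two separate passes, a single filter can keep only reads that carry both tags. The sketch below is a hypothetical helper (keep_tagged), not part of scTE or samtools. Unlike awk '/CB:/', which matches "CB:" anywhere on the line, it only inspects the SAM optional fields (column 12 onward) for a TAG:Z: prefix. It assumes samtools is on PATH, and the file names in the usage line are placeholders.

```shell
#!/bin/sh
# keep_tagged TAG...: pass SAM header lines through unchanged and keep only
# alignment records that carry every listed tag as an optional field (TAG:Z:...).
keep_tagged() {
  awk -F'\t' -v tags="$*" '
    BEGIN { n = split(tags, want, " ") }   # list of required tags
    /^@/ { print; next }                   # keep the header
    {
      for (j = 1; j <= n; j++) {
        ok = 0
        for (i = 12; i <= NF; i++)
          if (index($i, want[j] ":Z:") == 1) { ok = 1; break }
        if (!ok) next                      # drop reads missing any required tag
      }
      print
    }'
}

# Usage:
# samtools view -h possorted_genome_bam.bam | keep_tagged CB UB | samtools view -b -o clean.bam -
```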