cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data
Other
173 stars 28 forks source link

output of SingleCellGenotype is empty! #61

Closed dipingxian431 closed 2 months ago

dipingxian431 commented 3 months ago

Hello,

I am using SComatic to compute germline genotypes for known variants in 10x Multiome datasets. I can get the SingleCellGenotype of scRNA-seq data. when I run SingleCellGenotype.py on scATAC-seq data. I can get the file GCIY.single_cell_genotype.tsv, but the size is 118 bites and only has the title of each column.

#CHROM Start End REF ALT_expected Cell_type_expected Num_cells_expected CB Cell_type_observed Base_observed Num_reads

The code is attached below.

SCOMATIC=/data/dowload_data/SComatic
cell_type_bam=/data/dowload_data/SComatic/FL_wdata/ATACh_results/Step1_BamCellTypes/FL.GCIY.bam
META=$SCOMATIC/FL_wdata/cell_barcode_annotationsh.tsv
REF=$SCOMATIC/GRCh38.genome.fa
python $SCOMATIC/scripts/SingleCellGenotype/SingleCellGenotype.py --bam $cell_type_bam \
    --infile /data/dowload_data/SComatic/FL_wdata/ATACh_results/Step4_VariantCalling/FL.calling.step2.tsv \
    --nprocs 16 \
    --meta $META \
    --outfile GCIY.single_cell_genotype.tsv \
    --tmp_dir temp \
    --ref $REF

Can you help me to figure out why it can not work on scATAC-seq?

Thank you so much.

ArthurDondi commented 3 months ago

Hi, in the FAQ for scATAC-seq you should use --min_mq 30 (it's true for any step using a bam, DNA aligners most of the time have a MAPQ ranging up to 60).

dipingxian431 commented 3 months ago

Dear ArthurDondi, Thank you so much. It works now.

I still have a similar question.

I also have a mouse 10x Multiome sample. when I run SingleCellGenotype.py on both scRNA and scATAC data. I can get the file single_cell_genotype.tsv, but the size is 118 bites and only has the title of each column. #CHROM Start End REF ALT_expected Cell_type_expected Num_cells_expected CB Cell_type_observed Base_observed Num_reads

The code is attached below.

#Computing the genotype for each cell at the variant sites

SCOMATIC=/data/dowload_data/SComatic
META=$SCOMATIC/FL_wdata/cell_barcode_annotationsm.tsv 
sample=FL
REF=/data/Reference/SComatic/mm10.fa
output_dir=$SCOMATIC/FL_wdata/GEXm_results
output_dir1=$output_dir/Step1_BamCellTypes
output_dir4=$output_dir/Step4_VariantCalling
STEP4_2_pass=${output_dir4}/${sample}.calling.step2.tsv

output_dir7=$output_dir/SingleCellAlleles2
mkdir -p $output_dir7

for bam in $(ls -d $output_dir1/*bam);do  
    cell_type=$(basename $bam | awk -F'.' '{print $(NF-1)}')

    temp=$output_dir7/temp_${cell_type}
    mkdir -p $temp

    python $SCOMATIC/scripts/SingleCellGenotype/SingleCellGenotype.py --bam $bam  \
        --infile ${STEP4_2_pass}   \
        --nprocs 1   \
        --meta $META   \
        --outfile ${output_dir7}/${cell_type}.single_cell_genotype.tsv  \
        --tmp_dir $temp  \
        --ref $REF

    rm -rf $temp
done

Can you help me to figure out why it can not work?

Thank you so much.

dipingxian431 commented 3 months ago

Dear ArthurDondi,

I also added --min_mq 30 when I run SingleCellGenotype.py on mouse ATAC data. It also works.

So, now only mouse RNA data does not work.

Thanks. Mingchao

ArthurDondi commented 3 months ago

Do you have variants in ${STEP4_2_pass} ? Also there seem to be a typo there: META=$SCOMATIC/FL_wdata/cell_barcode_annotationsm.tsv

I would

  1. Check if the barcodes correspond to the bam file
  2. Check if the positions are covered in the bam files, e.g. for a mutation on chr1 position 1000: samtools view bamfile.bam chr1:1000-100
dipingxian431 commented 3 months ago

Dear ArthurDondi,

Thank you so much for the suggestion.

I found the barcodes do not correspond to the bam file.

Thank you!

Mingchao

Francesc-Muyas commented 2 months ago

Dear users, Thanks for the feedback. I hope the issue was solved. Let me know if not.

Thanks, Fran