cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data

When running scATAC data, No temporary files found in BaseCellCounter.py #29

Closed by nieyage 10 months ago

nieyage commented 10 months ago

Hello, I am currently running SComatic with scATAC data, but I encountered some issues when running the second step, BaseCellCounter.py. Here is my code and the output:

REF=/md01/nieyg/ref/hard-mask/mm10_hard_masked/fasta/genome.fa
SCOMATIC=/md01/nieyg/software/SComatic/
output_dir=/md01/nieyg/project/lineage_tracing/heart_regeneration/00_data/AR3_data/scATAC/AR3_C4_scATAC_add500G/SComatic/
output_dir1=$output_dir/01_SplitBamCellTypes
output_dir2=$output_dir/02_BaseCellCounts
# Cell type
cell_type="FB"
# Temp folder
temp=$output_dir2/temp_test_${cell_type}
# Command line to submit to cluster
python $SCOMATIC/scripts/BaseCellCounter/BaseCellCounter.py --bam ./01_SplitBamCellTypes/AR3_C4_scATAC.FB.bam \
    --ref $REF \
    --chrom all --min_dp 1 \
    --out_folder $output_dir2 \
    --min_bq 30 \
    --id AR3_C4_scATAC \
    --tmp_dir $temp \
    --nprocs 20

Output:

Outfile /md01/nieyg/project/lineage_tracing/heart_regeneration/00_data/AR3_data/scATAC/AR3_C4_scATAC_add500G/SComatic//02_BaseCellCounts/AR3_C4_scATAC.tsv
Directory  /md01/nieyg/project/lineage_tracing/heart_regeneration/00_data/AR3_data/scATAC/AR3_C4_scATAC_add500G/SComatic//02_BaseCellCounts/temp_test_FB  already exists
No temporary files found
Computation time: 314 seconds

2

I consulted this link for an answer; here are the first few alignments from my BAM file:

A00265:1193:H32KVDSX7:1:1578:22670:27853 163 chr1 3000022 60 103M47S = 3000068 196 AATTTGAGGAGAGTTGGAATTAGGTCTTCTTTGAAGGTCTGGTAGAACTCTGCATTAAACCCATCTGGTCCTGGGCTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTGGGATAATATTGATTAATGCCAAAATTTATTTTGGGTAAATGGG FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,,F:,,,:F,,,:FF,,F,F,,,:,,:,,:F,,:F:,FFF,FF,:,,, NM:i:1 MD:Z:1C101 AS:i:101 XS:i:88 XA:Z:chr15,-5036583,40S8M1D102M,4;chr7,-62865006,47S86M17S,1; CR:Z:AGATAGATCGATGCAT CY:Z:FFFFFFFFFFFFFFFF CB:Z:AGATAGATCGATGCAT-1 RG:Z:AR3_C4_scATAC_add500G:MissingLibrary:1:H32KVDSX7:1
A00265:1193:H32KVDSX7:1:1578:22670:27853 83 chr1 3000068 60 150M = 3000022 -196 AATCTTCATTAAACCCCTCTGGTACTGGGCTTTTTTTTTTTTTTTTTTTTTTTTTTGGTTGGGAGACTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTTTTAGTCCATGAATCTGATCCTGATTTAGCTTTGGTACCTGGT F,:,F,,F::FFFFF,:,,,,F:,,,F:F,,FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:5 MD:Z:1C3G10A6C34G91 AS:i:129 XS:i:83 CR:Z:AGATAGATCGATGCAT CY:Z:FFFFFFFFFFFFFFFF CB:Z:AGATAGATCGATGCAT-1 RG:Z:AR3_C4_scATAC_add500G:MissingLibrary:1:H32KVDSX7:1
A00265:1193:H32KVDSX7:1:1607:13268:19836 99 chr1 3000589 60 45M = 3000589 45 ATACTCTAGTTTCCTTTTGGAGGCACACAGGCCTGTGAGTTTTAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:45 AS:i:45 XS:i:25 CR:Z:GTTACGAAGAGGAAGT CY:Z:FFFFFFFFFFFFFFFF CB:Z:GTTACGAAGAGGAAGT-1 RG:Z:AR3_C4_scATAC_add500G:MissingLibrary:1:H32KVDSX7:1 TR:Z:CTGTCTCTTATACACATCTCCGAGCCCACTAGACACATTGGCATCTCGTATTCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG TQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FF,F,,:,,:,:,,F,,,F,F:,,,,F,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

I am sure that I am using the same reference genome, and there were no issues when running the example data. Can you provide me with some suggestions? Thank you very much!

Francesc-Muyas commented 10 months ago

Dear user,

Thanks for bringing up this question.

ATAC-seq is a DNA-based assay, so the BAM files need to be processed differently.

For instance, the mapping quality filters (among others) are different. Could you take a look at this FAQ section and let me know if you still have problems?

Cheers, Fran
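
To make the point about mapping quality filters concrete, here is a minimal sketch (not part of SComatic) that tallies the MAPQ column of SAM records and reports what fraction would survive a given minimum. The three records are the reads posted above, abbreviated by replacing SEQ and QUAL with `*`. The function names and the threshold of 255 are assumptions chosen for illustration: 255 is the value STAR assigns to uniquely mapped reads, and the threshold SComatic actually applies should be checked in its FAQ and help output.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Count MAPQ values (5th SAM column), skipping header and blank lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        counts[int(fields[4])] += 1
    return counts


def fraction_passing(counts, min_mq):
    """Fraction of alignments whose MAPQ is at least min_mq."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    passing = sum(n for mq, n in counts.items() if mq >= min_mq)
    return passing / total


# The three alignments from the issue; SEQ and QUAL are replaced by "*" for brevity.
records = [
    "A00265:1193:H32KVDSX7:1:1578:22670:27853 163 chr1 3000022 60 103M47S = 3000068 196 * *",
    "A00265:1193:H32KVDSX7:1:1578:22670:27853 83 chr1 3000068 60 150M = 3000022 -196 * *",
    "A00265:1193:H32KVDSX7:1:1607:13268:19836 99 chr1 3000589 60 45M = 3000589 45 * *",
]

hist = mapq_histogram(records)
print(dict(hist))
# Hypothetical threshold of 255 (STAR's MAPQ for unique alignments).
print(fraction_passing(hist, 255))
# A lower, DNA-aligner-style threshold for comparison.
print(fraction_passing(hist, 30))
```

All three reads carry MAPQ 60, so none of them would pass a filter that requires 255, while all of them pass a threshold of 30. If every read in a BAM file is discarded this way, a counting step would have nothing to write, which would be consistent with the "No temporary files found" message. The same functions can be fed the text output of `samtools view` on the full file.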