fumi-github / rat_singlecell_liver_ArchR

MIT License

How were the ArrowFiles created by createArrowFiles() from the BAM files? #1

Closed KOBE24DUNK closed 1 year ago

KOBE24DUNK commented 1 year ago

Hi Fumi,

Thanks for your research and datasets, which are exactly what I'm interested in. When I tried to run the analysis on the five sn-ATAC-seq datasets, I ran into problems creating the ArrowFiles from the BAM files. The five BAM samples are m154207, m154211, m167108, m168101, and m167203, but I'm not sure how to read them into R correctly to create ArrowFiles.

The files look like this:

samtools view SHR_m167108_sub.sorted.noDups.filt.noMT.bam | head -n 2

NB501731:604:HVTCJBGXC:3:13402:9280:11874 163 chr1 34 1 24M = 56 73 AGCAGAAGCTCATCTGAATATGCT AAAAAEEEEEEEEEEEEEEEEEEE MD:Z:24 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:0 YS:i:0 YT:Z:CP
NB501731:604:HVTCJBGXC:3:13402:9280:11874 83 chr1 56 1 51M = 34 -73 CTCAAGGATGCTGACATCAACATTTAATCATCTCCTCACTCATCCAGGAAG EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:51 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:0 YS:i:0 YT:Z:CP

I couldn't find the usual tag or header in the BAM files, so I don't know how to process them further, for example with one of the recommended standard workflows such as:

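library(Rsamtools)  # provides scanBam(), scanBamFlag(), ScanBamParam()

bamFile <- "test1_2_10000.bam"
outFile <- paste0("test1_2_10000", ".fragments.tsv.gz") #keep latter half
offsetPlus <- 4
offsetMinus <- -5
# Keep plus-strand reads that belong to properly paired fragments
bamFlag <- scanBamFlag(isMinusStrand = FALSE, isProperPair = TRUE)
scanFragments <- scanBam(bamFile,
      param = ScanBamParam(
        flag = bamFlag,
        what = c("rname", "pos", "isize"),
        tag = c("XB", "DB")  # remaining arguments omitted
      ))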

I'm sorry, I'm relatively new to this. If that is not the right approach, did you pass the BAM files directly to createArrowFiles(), like this?

inputFiles <- '/path/alignments.possorted.tagged.bap.bam'
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = 'sortedBam',
  filterTSS = 4,
  filterFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

It would be great if you could point me to the key information or share the code for this step. Thank you very much for your time!

fumi-github commented 1 year ago


We performed bulk and single-nucleus ATAC-seq in this study. The BAM file you mentioned is for bulk ATAC-seq.

Please download the sra files for the single-nucleus ATAC-seq from here: DRR394952 DRR394953 DRR394954 DRR394955 DRR394956

You can extract a SAM file from each sra file with the sam-dump command of sra-tools, and then convert it to a BAM file with samtools.
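
The conversion described above can be sketched roughly as follows. This is only an illustration, not the exact commands used in the study: the helper name sra_to_bam, the DRY_RUN switch, and the output file names are hypothetical, and it assumes that prefetch, sam-dump (both from sra-tools), and samtools are on your PATH and that the runs contain aligned reads. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Hypothetical helper: turn one SRA run accession into <run>.bam.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
sra_to_bam() {
  run="$1"
  out="${run}.bam"   # assumed output name, change as needed
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "prefetch ${run}"
    echo "sam-dump ${run} | samtools view -b -o ${out} -"
    return 0
  fi
  # Download the run, then stream SAM from sam-dump into samtools for BAM conversion
  prefetch "${run}" || return 1
  sam-dump "${run}" | samtools view -b -o "${out}" -
}

# The five single-nucleus ATAC-seq runs listed above
for run in DRR394952 DRR394953 DRR394954 DRR394955 DRR394956; do
  sra_to_bam "${run}"
done
```

Streaming sam-dump straight into samtools view avoids writing the large intermediate SAM file to disk. Once a BAM exists, samtools view -H on it shows the header, which helps when checking the files before handing them to createArrowFiles().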