fumi-github / rat_singlecell_liver_ArchR

MIT License

How were the ArrowFiles created by createArrowFiles() from BAM files? #1

Closed KOBE24DUNK closed 1 year ago

KOBE24DUNK commented 1 year ago

Hi Fumi,

Thanks for your research and datasets, which are of great interest to me. When I tried to run the analysis on the five sn-ATAC-seq datasets, I ran into problems creating the ArrowFiles from the BAM files. The five BAM samples are m154207, m154211, m167108, m168101, and m167203, but I'm not sure how to read them into R correctly to create ArrowFiles.

The file looks like this:

samtools view SHR_m167108_sub.sorted.noDups.filt.noMT.bam | head -n 2

NB501731:604:HVTCJBGXC:3:13402:9280:11874 163 chr1 34 1 24M = 56 73 AGCAGAAGCTCATCTGAATATGCT AAAAAEEEEEEEEEEEEEEEEEEE MD:Z:24 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:0 YS:i:0 YT:Z:CP
NB501731:604:HVTCJBGXC:3:13402:9280:11874 83 chr1 56 1 51M = 34 -73 CTCAAGGATGCTGACATCAACATTTAATCATCTCCTCACTCATCCAGGAAG EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:51 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:0 YS:i:0 YT:Z:CP

I couldn't find the usual tags or header in the BAM files, so I don't know how to process them next, for example with one of the recommended standard workflows such as:

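library(Rsamtools)  # provides scanBam(), scanBamFlag(), ScanBamParam()

bamFile <- "test1_2_10000.bam"
outFile <- paste0("test1_2_10000", ".fragments.tsv.gz") #keep latter half
offsetPlus <- 4
offsetMinus <- -5
# Keep only plus-strand reads from properly paired alignments
bamFlag <- scanBamFlag(isMinusStrand = FALSE, isProperPair = TRUE)
# Read the reference name, position, insert size, and the XB and DB tags
scanFragments <- scanBam(bamFile,
      param = ScanBamParam(
        flag = bamFlag,
        what = c("rname", "pos", "isize"),
        tag = c("XB", "DB")
      ))[[1]]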

Sorry, I'm relatively new to this. If that is not how you did it, did you pass the BAM files directly to createArrowFiles(), like this:

inputFiles = '/path/alignments.possorted.tagged.bap.bam'
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = 'sortedBam',
  filterTSS = 4, 
  filterFrags = 1000, 
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

It would be great if you could point me to the key information or share the code for this step. Thank you very much for your time!

fumi-github commented 1 year ago

Hi KOBE24DUNK,

We performed bulk and single-nucleus ATAC-seq in this study. The BAM file you mentioned is for bulk ATAC-seq.

Please download the SRA files for the single-nucleus ATAC-seq from here: DRR394952 DRR394953 DRR394954 DRR394955 DRR394956

You can extract a SAM file from each SRA file with the sam-dump command from sra-tools, and then convert it to a BAM file with samtools.
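
For readers following along, here is a minimal sketch of the SRA-to-BAM conversion for the five accessions. It is not the exact commands used in the study. It assumes sra-tools (`prefetch`, `sam-dump`) and samtools are on your PATH. The helper name `sra_to_bam_cmds` and the `<accession>.bam` output names are made up for illustration. The script only prints the commands so you can inspect them first.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for converting each snATAC-seq run to BAM.
# Assumptions (not from the original thread): output files are named
# <accession>.bam, and prefetch is used to download each run first.

accessions="DRR394952 DRR394953 DRR394954 DRR394955 DRR394956"

# Hypothetical helper: print the two commands needed for one accession.
sra_to_bam_cmds() {
  acc="$1"
  # Download the run from the archive
  echo "prefetch $acc"
  # Dump the run as SAM and convert the stream to BAM with samtools
  echo "sam-dump $acc | samtools view -b -o $acc.bam -"
}

for acc in $accessions; do
  sra_to_bam_cmds "$acc"
done
```

To actually run the conversion, pipe the script's output to `sh`. It is worth checking the first few records of each resulting BAM with `samtools view ... | head` to see where the cell barcode is stored before passing the files to createArrowFiles().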