PoolLab / generecovery

29 stars 4 forks source link

error in "Coerces GRanges object into a GAlignments objectusing", script 1_Annotation pre-processing.R. #8

Open DiegoSafian opened 1 year ago

DiegoSafian commented 1 year ago

Hi,

I got an error in the "Coerces GRanges object into a GAlignments objectusing" using the script 1_Annotation pre-processing.R.

> library("GenomicRanges")
> library("rtracklayer") # For coercing data to bed format
> library("IRanges")
> bamfile = file.path("/camp/home/my_data/possorted_genome_bam.bam") # Add the location of the Cell Ranger generated bam file containing transcriptome aligned reads with the unoptimized reference
> indexfile = file.path("/camp/home/my_data/possorted_genome_bam.bam") #Note, you don't have to specify ".bai" extension here.
> seq_data = readGAlignments(bamfile, index=indexfile, param = ScanBamParam(flag = scanBamFlag(isDuplicate = FALSE, isSecondaryAlignment = FALSE), tag = c("GN", "RE", "CB", "UB", "AN"), what = "flag", tagFilter = list("RE"=c("I", "E")))) # Extract all non-duplicate intergenic and exonic sequening reads with the following bam tags: "GN" - aligned gene; "RE" - read classification into E-exonic, N-intronic, I-intergenic; "CB" - corrected cellular barcode; "UB" - corrected UMI/molecular barcode, "AN" - antisense gene.
> seq_data = data.frame(seq_data)
> ## Keep only intergenic reads by removing all true exonic reads (i.e. remove exonic reads that lack antisense gene mapping: AN tag =NA)
> intergenic_reads = seq_data$RE=="I"
> false_exonic_reads = !is.na(seq_data$AN)
> all_intergenic_reads = as.logical(intergenic_reads + false_exonic_reads)
> seq_data = seq_data[all_intergenic_reads,] # remaining dataframe contains only intergenic reads. Note, that we are assuming that all false exonic reads are intergenic, which slightly overestimates intergenic read count. This is since some will likely also end up being intronic.
> library(stringr)
> seq_data$CB = str_sub(seq_data$CB, end=-3) # Remove last two elements of the cell barcode. This is an artifact ("-1") added by Cell Ranger software.
> seq_data$barcodes = paste(seq_data$CB, seq_data$UB, sep="") # Assemble the cell barcode / molecular barcode list. Each read included in the gene_cell matrix will have a unique index comprised of the two.
> a = nchar(seq_data$barcodes)==26 # logical vector for selecting reads with non-corrupt barcodes
> seq_data = seq_data[a,] # exclude all reads that don't have an intact full cellular and molecular barcodes
> length(unique(seq_data$barcodes)) # Determine # of unique intergenic reads
[1] 0
> seq_data = seq_data[!duplicated(seq_data$barcodes),] # exclude all duplicated intergenic reads
> ## Save extracted intergenic reads as a separate file
> gr_seq_data = makeGRangesFromDataFrame(seq_data) # coerce to granges object as that makes it possible to save it as a bedfile that bedtools can parse
> #### save.image("/camp/home/safiand/home/users/safiand/genome_annotation/ReferenceEnhancer/data.RData")
> ga_seq_data = as(gr_seq_data, "GAlignments") # Coerces GRanges object into a GAlignments object, that can be saved as a bed file. Required for bedtools to link reads to closest 3' gene end.
Error in validObject(.Object) : invalid class “GAlignments” object: 
    'x@cigar' is not parallel to 'x'

Could you help me to solve it?? Kind regards, Diego