gagneurlab / FRASER

FRASER - Find RAre Splicing Events in RNA-seq
MIT License
36 stars 20 forks source link

Error in `[[<-`(`*tmp*`, name, value = "Donor") #19

Closed deb0612 closed 2 years ago

deb0612 commented 3 years ago

When doing the count split read by using our own bam file. I use the command as follow:

settings <- FraserDataSet(colData=anno, workingDir="../") fds <- countRNAData(settings)

Tue Dec 1 06:50:58 2020: Start counting the split reads ... Tue Dec 1 06:50:58 2020: Count split reads for sample: RNA-JXY-001A Tue Dec 1 06:57:31 2020: Count split reads for sample: RNA-YCS-004A Tue Dec 1 07:04:45 2020: Count split reads for sample: RNA-CSW-006A Tue Dec 1 07:13:19 2020: Count split reads for sample: RNA-WCJ-007A Tue Dec 1 07:18:48 2020: Count split reads for sample: RNA-SYL-009A Tue Dec 1 07:25:06 2020 : count ranges need to be merged ... Tue Dec 1 07:25:06 2020: Create splice site indices ... Tue Dec 1 07:25:06 2020: Writing split counts to folder: ..//savedObjects/Data_Analysis/splitCounts Tue Dec 1 07:25:06 2020: Identifying introns with read count <= 20 in all samples... Tue Dec 1 07:25:06 2020: Start counting the non spliced reads ... Tue Dec 1 07:25:06 2020: In total 0 splice junctions are found.

Error in [[<-(*tmp*, name, value = "Donor") : 1 elements in value to replace 0 elements

It shows error like this.

c-mertes commented 3 years ago

This looks like your bam files do not have any splice junction in it as it detected 0 junctions.

Can you run this command to see if junctions are detected in one of your bam files?

library(GenomicAlignments)
# exchanges it with your own bam file
bamFile <- system.file("extdata/bam/sample1.bam", package = "FRASER")
# adapt it to your genome assembly. e.g. "chr19" vs "19"
param <- ScanBamParam(which=GRanges("chr19", IRanges(1, 298022430)))
ga <- readGAlignments(bamFile, param=param)
ga
junctions <- summarizeJunctions(ga)
junctions

Running this on the example data you should see this:

> junctions
GRanges object with 20 ranges and 3 metadata columns:
       seqnames          ranges strand |     score plus_score minus_score
          <Rle>       <IRanges>  <Rle> | <integer>  <integer>   <integer>
   [1]    chr19 7471938-7607808      * |         1          1           0
   [2]    chr19 7506652-7506797      * |         2          0           2
   [3]    chr19 7506842-7710070      * |         1          0           1
   [4]    chr19 7506843-7710782      * |         1          0           1
   [5]    chr19 7590053-7591324      * |        39         23          16
   ...      ...             ...    ... .       ...        ...         ...
  [16]    chr19 7594599-7595320      * |         1          1           0
  [17]    chr19 7595388-7598408      * |        25         11          14
  [18]    chr19 7598540-7598644      * |        23         11          12
  [19]    chr19 7607815-7607892      * |         1          1           0
  [20]    chr19 7607971-7614792      * |         1          1           0
  -------
  seqinfo: 3 sequences from an unspecified genome
c-mertes commented 2 years ago

Since there is no response to this issue I will close it. Please feel free to reopen it if needed.

liliiademianenko commented 1 year ago

Hello! I have the same error... how did you preprocess your bam files?

counts

pds <- countRNAData(settings, recount = TRUE)

Thu May 18 22:29:51 2023: Start counting the split reads ... Thu May 18 22:29:51 2023: Count split reads for sample: SRR5361431_control_short Thu May 18 22:30:07 2023: Count split reads for sample: SRR5361438_control_short Thu May 18 22:30:13 2023: Count split reads for sample: SRR5361440_schizoph_short Thu May 18 22:30:20 2023 : count ranges need to be merged ... Thu May 18 22:30:21 2023: Create splice site indices ... Thu May 18 22:30:21 2023: Writing split counts to folder: FRASER_output/savedObjects/Data_Analysis/splitCounts Thu May 18 22:30:21 2023: Identifying introns with read count <= 20 in all samples... Thu May 18 22:30:21 2023: Start counting the non spliced reads ... Thu May 18 22:30:21 2023: In total 0 splice junctions are found. Error in [[<-(*tmp*, name, value = "Donor") : 1 elements in value to replace 0 elements

ValentinV-GAD commented 5 months ago

**[EDIT : I think I found the solution to this. This is totally my fault : to be short there was indeed no junctions in my bam files, as the error message literally suggests. I will let it here for reference if I am indeed right :

Simply the bamfiles I used were passed in SplitNCigarReads from GATK, and I was unaware of this at first. This tool ruins any chance to use reads for detecting junctions. It cut reads at junctions and more specifically it eliminates among other things N in CIGARs which I think serves as marking junctions by the GenomicAlignment package. This latter package is used in FRASER to identify junctions if I am not mistaken. So there is a good chance that this is why I have no junctions detected during the construction of FraserDataSet objects using this particular bam files.

END OF EDIT]**

Greetings,

I also would be interested in the reopening of this subject, as I had the exact same error too during the counting:

Capture d’écran du 2024-03-13 10-49-47

And as you can see there are also 0 splice junctions found. So I did the test as indicated above with one of my samples. I do have a non-empty output for readGAlignemnts ("ga" object in your post):

Capture d’écran du 2024-03-13 10-46-28

but the object junctions is indeed empty : GRanges object with 0 ranges and 3 metadata columns: seqnames ranges strand | score plus_score minus_score

| seqinfo: 25 sequences from an unspecified genome I don't know yet why no junctions are detected in the bam files. I am looking into it but there seems to be soft clipped CIGAR reads but no N in CIGARs. so GenomicAlignments may need N in CIGARs to identify junctions, and this may be related to the way the alignment was performed... Thank you for your help.