davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0
201 stars 112 forks source link

intron coordinates for extracting adjacent exons #166

Open kwcurrin opened 3 years ago

kwcurrin commented 3 years ago

Hello,

I am attempting to retrieve the exons on either side of introns with significant splice QTLs. I am running into an issue where I have to subtract 1 from both the start and stop coordinates in order to retrieve exons. I am using intron coordinates derived from the leafcutter bed2junc.pl script. It unfortunately wouldn't be feasible to rerun the splice QTL analysis using the regtools scripts because too many downstream analyses have been performed.

I am using BEDTools closest to extract adjacent exons and then filtering to exons 1bp upstream/downstream of the introns: closestBed -io -D ref \ -a introns.bed -b ${gtf_file} \ | awk 'BEGIN {FS = "\t"}; {if(($14 == 1) || ($14 == -1)) print $0}'

I am using the GENCODE human v19 GTF. When I use the intron coordinates directly to make the introns.bed file, I get only 70 exons adjacent to the introns, 30 upstream, 40 downstream. This is out of over 6,000 introns. However, if I subtract 1 from both the start and stop positions when making the bed file, I retrieve both upstream and downstream exons for a majority (over 6,000) of the introns.

I experimented with adding or subtracting 1 from the start and stop coordinates separately, but only subtracting 1 from both coordinates worked.

Is this a known issue with the old junction scripts?

Thanks!

Kevin Currin

jackhump commented 3 years ago

yes this is a known issue. This is due to GTF and BED file formats having different specifications for genomic intervals being either 0-based or 1-based - I can never remember which way round that is.

It's an annoying aspect of bioinformatics I'm afraid.

kwcurrin commented 3 years ago

BEDTools correctly handles conversions between the 1-based GTF files and 0-based BED files. Is the start position of the intron (with the old leafcutter junction scripts) 0 or 1 based? I just want to make sure I am making the intron bed file correctly.

Thanks!

Kevin