LieberInstitute / SPEAQeasy

SPEAQeasy: portable LIBD RNA-seq pipeline using Nextflow. Check http://research.libd.org/SPEAQeasy-example/ for an example on how to use this pipeline and analyze the resulting output files.
http://lieberinstitute.github.io/SPEAQeasy
MIT License

junction start coordinate bug #84

Closed gpertea closed 2 years ago

gpertea commented 2 years ago

The start coordinate of junctions seems to be off by 2 (i.e. using start+2 seems to produce the "correct" coordinate). Junction counts are generated by regtools in a BED file, which is then parsed by bed_to_juncs.py.

This line seems relevant: https://github.com/LieberInstitute/SPEAQeasy/blob/6624edc08da38ef2ebf96175d8deff305c4facce/scripts/bed_to_juncs.py#L53

BED coordinates are 0-based and end-exclusive, so in general start+1 should be used to convert a BED interval to a "regular" 1-based, inclusive genomic interval. The linked line subtracts 1 instead, which might explain the -2 offset?
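
To make the arithmetic concrete, here is a minimal sketch of the generic BED-to-1-based conversion. It is not the code from bed_to_juncs.py. The function names are hypothetical, and it assumes a plain start/end interval, ignoring any extra BED columns that regtools may write:

```python
from typing import Tuple


def bed_to_one_based(bed_start: int, bed_end: int) -> Tuple[int, int]:
    """Convert a 0-based, end-exclusive BED interval to a 1-based, inclusive one.

    Hypothetical helper for illustration only; not part of SPEAQeasy.
    """
    if bed_start < 0 or bed_end < bed_start:
        raise ValueError("invalid BED interval")
    # Only the start shifts: the exclusive 0-based end is numerically
    # equal to the inclusive 1-based end.
    return bed_start + 1, bed_end


def start_offset_if_subtracting_one(bed_start: int) -> int:
    """Distance between the expected start (start + 1) and a start computed as start - 1."""
    expected = bed_start + 1
    shifted = bed_start - 1
    return expected - shifted


if __name__ == "__main__":
    print(bed_to_one_based(100, 200))            # (101, 200)
    print(start_offset_if_subtracting_one(100))  # 2
```

If the parser really does subtract 1 where it should add 1, the resulting start would be 2 below the expected value, which matches the observed offset.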

gpertea commented 2 years ago

#85 potentially solves this with a patched version of regtools (released here: https://github.com/gpertea/regtools/releases/tag/0.5.33g ), which can now directly generate the counts file with the proper start coordinates using the newly added -c option.