junction start coordinate bug

LieberInstitute / SPEAQeasy

SPEAQeasy: portable LIBD RNA-seq pipeline using Nextflow. Check http://research.libd.org/SPEAQeasy-example/ for an example on how to use this pipeline and analyze the resulting output files.

MIT License


The start coordinates of junctions seem to be off by 2 (i.e. using start+2 seems to produce the "correct" coordinate). Junction counts are generated by regtools in a BED file, which is then parsed by `bed_to_juncs.py`.

This line seems relevant: https://github.com/LieberInstitute/SPEAQeasy/blob/6624edc08da38ef2ebf96175d8deff305c4facce/scripts/bed_to_juncs.py#L53

BED coordinates are 0-based and end-exclusive, so in general start+1 should be used to convert a BED interval to a "regular" 1-based, inclusive genomic interval. The code above subtracts 1 instead; since start-1 is 2 less than the expected start+1, that might explain the -2 offset?
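To make the arithmetic concrete, here is a minimal sketch of the conversion described above. It is not the code from `bed_to_juncs.py`; the function name and the example coordinates are made up for illustration, and it only assumes the standard BED convention (0-based start, exclusive end).

```python
def bed_interval_to_one_based(bed_start: int, bed_end: int) -> tuple[int, int]:
    """Convert a 0-based, end-exclusive BED interval to a 1-based, inclusive one.

    Hypothetical helper for illustration only. The start shifts by +1;
    the end stays the same, because an exclusive 0-based end and an
    inclusive 1-based end are the same number.
    """
    return bed_start + 1, bed_end


# Made-up BED interval, used only to show the offset.
bed_start, bed_end = 100, 200

expected_start, expected_end = bed_interval_to_one_based(bed_start, bed_end)

# What a "subtract 1" conversion would produce for the start coordinate.
suspected_start = bed_start - 1

print(expected_start, expected_end)      # 101 200
print(expected_start - suspected_start)  # 2
```

If the script really does compute start-1 where start+1 is needed, the result would be 2 lower than expected, which matches the observed offset.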

junction start coordinate bug #84

#85 potentially solves this with a patched version of regtools (released here: https://github.com/gpertea/regtools/releases/tag/0.5.33g ), which can now directly generate the counts file with the proper start coordinates using the newly added `-c` option.
