SPEAQeasy: portable LIBD RNA-seq pipeline using Nextflow. See http://research.libd.org/SPEAQeasy-example/ for an example of how to use this pipeline and analyze the resulting output files.
The start coordinate of junctions seems to be off by 2 (i.e. using start+2 seems to produce the "correct" coordinate). Junction counts are generated by regtools in a BED file, which is then parsed by bed_to_juncs.py.
BED coordinates are 0-based and end-exclusive, so in general start+1 should be used to convert a BED interval to a "regular", inclusive, 1-based genomic interval; the relevant code in bed_to_juncs.py subtracts 1 instead, which might explain the -2 offset?
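To make the arithmetic concrete, here is a minimal sketch of the conversion described above. It is not the actual bed_to_juncs.py code: the function name `bed_to_one_based` and the example interval are made up for illustration, and it only covers the generic BED interval rule, not anything specific to how regtools lays out its junction records.

```python
from typing import Tuple


def bed_to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    """Convert a 0-based, end-exclusive BED interval to a 1-based, inclusive one.

    Hypothetical helper for illustration only.
    """
    # The start shifts by +1; the exclusive end already equals the inclusive 1-based end.
    return start0 + 1, end0


# Hypothetical BED interval covering bases 101..200 in 1-based coordinates.
bed_start, bed_end = 100, 200

correct_start, correct_end = bed_to_one_based(bed_start, bed_end)

# What you get if 1 is subtracted from the BED start instead of added.
suspected_start = bed_start - 1

# Difference between the suspected and the expected start coordinate.
offset = suspected_start - correct_start
print(correct_start, correct_end, offset)  # 101 200 -2
```

Under these assumptions, subtracting 1 where 1 should be added leaves the start exactly 2 below the expected value, which is consistent with start+2 giving the "correct" coordinate.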
#85 potentially solves this with a patched version of regtools (released at https://github.com/gpertea/regtools/releases/tag/0.5.33g), which can now directly generate the counts file with the proper start coordinates using the newly added -c option.
This line in bed_to_juncs.py seems relevant: https://github.com/LieberInstitute/SPEAQeasy/blob/6624edc08da38ef2ebf96175d8deff305c4facce/scripts/bed_to_juncs.py#L53