filter_by_tx_chain.py - use sort_introns_by_strand via pyranges.apply for multiprocessing speed up

Currently the introns are converted to a df, grouped by transcript, and then sorted by intron position / number within each group (1st in group = first intron).

A simple speed-up would be to keep the function the same but run it via pyranges apply, so it works on multiple dfs (different chr/strand) at once. pr.apply can return a dict of dfs (as_pyranges=False), which can be concatenated with pd.concat if a single df is wanted at the end (pr.concat expects PyRanges objects rather than dfs).

Alternatively, groupby may be the bottleneck... To get the same effect, a combination of pyranges.apply (which will split the dfs by chromosome and strand) and a two-factor sort_values(["transcript_id", "Start" or "End"]), with the second key chosen depending on strand, may be quicker.
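
A rough sketch of the two-factor sort idea, using plain pandas only. The function name sort_introns_one_strand and the toy data are hypothetical and not taken from filter_by_tx_chain.py. The pandas groupby on Chromosome/Strand just stands in for the per-chromosome/strand split that pyranges apply would do, and the sketch assumes each df passed to the function contains a single strand.

```python
import pandas as pd


def sort_introns_one_strand(df: pd.DataFrame) -> pd.DataFrame:
    """Sort introns 5'->3' within each transcript.

    Assumes df holds a single chromosome/strand (hypothetical helper,
    not the existing sort_introns_by_strand).
    """
    if df.empty:
        return df
    if df["Strand"].iloc[0] == "+":
        # Plus strand: the first intron has the smallest Start
        return df.sort_values(["transcript_id", "Start"], ascending=[True, True])
    # Minus strand: the first intron has the largest End
    return df.sort_values(["transcript_id", "End"], ascending=[True, False])


# Toy introns: tx1 on the plus strand, tx2 on the minus strand
introns = pd.DataFrame(
    {
        "Chromosome": ["chr1"] * 5,
        "Start": [500, 100, 300, 900, 700],
        "End": [600, 200, 400, 1000, 800],
        "Strand": ["+", "+", "+", "-", "-"],
        "transcript_id": ["tx1", "tx1", "tx1", "tx2", "tx2"],
    }
)

# Stand-in for the chromosome/strand split: one df per (Chromosome, Strand) key
per_group = {
    key: sort_introns_one_strand(sub)
    for key, sub in introns.groupby(["Chromosome", "Strand"])
}

# Combine the per-group results into a single df
sorted_introns = pd.concat(per_group.values(), ignore_index=True)
print(sorted_introns)
```

If this turns out quicker than the groupby version, the same function could be passed to pyranges apply in place of the dict comprehension. That is untested here.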

See how much speed-up with using pr.apply & concat with existing code
Test out pr.apply + sort_values

frattalab / PAPA
