Estimate of processing time

Hi, I have been splitting the input SAM into batches, running collapse on each, and merging the results as suggested. I have a couple more questions about this workflow and hope you can help me understand it better.

1) My input SAM has 36.3 million reads and comes from a target capture enrichment of select regions on a particular chromosome.
From the code, it looks like the splits are made per chromosome, so the 17 uneven splits make sense. The bulk of my reads are on one particular chromosome, and TAMA collapse runs into a memory error when processing that split, *R1_5.sam. Subsetting the BAM to this one chromosome wouldn't help, as I would still get only one split file for that chromosome. Is there a way I can process the reads in batches of a million and reliably put them back together with TAMA merge?

Sizes of split files:

ETA: I defined split regions based on my data (non-overlapping coordinates within my region of interest) and tried TAMA collapse on each. The splits are on the order of 200-500k reads each, but collapse is still time-consuming to run.
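
A minimal sketch of this kind of coordinate-based split, for reference. It assumes a coordinate-sorted SAM and cuts a batch only at a position that no earlier read overlaps, so reads from one locus stay together. The function names `read_end` and `split_at_gaps` and the `target_reads` parameter are hypothetical and are not part of TAMA.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def read_end(pos, cigar):
    """Return the 1-based inclusive reference end of an alignment."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + max(span, 1) - 1


def split_at_gaps(sam_lines, target_reads):
    """Group coordinate-sorted SAM records into batches of about target_reads.

    A batch is closed only when the next read starts past the end of every
    read seen so far (or on a new reference), so no read spans a boundary.
    Unmapped records are dropped in this sketch.
    """
    header, batches, current = [], [], []
    cur_chrom, cur_end = None, 0
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        fields = line.split("\t")
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        if chrom == "*" or pos == 0:
            continue  # unmapped
        at_gap = chrom != cur_chrom or pos > cur_end
        if current and at_gap and len(current) >= target_reads:
            batches.append(current)
            current = []
        current.append(line)
        end = read_end(pos, cigar)
        if chrom != cur_chrom:
            cur_chrom, cur_end = chrom, end
        else:
            cur_end = max(cur_end, end)
    if current:
        batches.append(current)
    return header, batches
```

Each batch would be written out with the header lines prepended before being passed to collapse. A batch can exceed `target_reads` when a densely covered region has no gap, which is the case in a deep target-capture locus.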

2) The TAMA merge process outputs an empty *_trans_read.bed. I don't see it mentioned as one of the outputs in the TAMA merge documentation either. Is there a way to obtain this file, which usually maps the read IDs to the transcript model IDs?
Example from column 4 of trans_read.bed: G2.3;59:1307|fcf8906e-166e-471c-9d5c-e3758e2e80c0
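
To illustrate the mapping I am after, here is a small sketch that reads column 4 of a trans_read.bed and groups read IDs by transcript model. It assumes the field has the form `transcript_id;read_id`, as the example above suggests. The function name `reads_per_transcript` is hypothetical and not part of TAMA.

```python
from collections import defaultdict


def reads_per_transcript(bed_lines):
    """Map each transcript model ID to the read IDs listed in column 4.

    Assumes column 4 looks like 'transcript_id;read_id' and that the
    transcript ID itself contains no semicolon.
    """
    mapping = defaultdict(list)
    for line in bed_lines:
        if not line.strip():
            continue  # skip blank lines
        name = line.rstrip("\n").split("\t")[3]
        transcript_id, read_id = name.split(";", 1)
        mapping[transcript_id].append(read_id)
    return dict(mapping)
```

With the example entry above, the key would be `G2.3` and the read ID would be `59:1307|fcf8906e-166e-471c-9d5c-e3758e2e80c0`.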

GenomeRIK / tama

Estimate of processing time #117