Caffeinated-Code opened 9 months ago
Hi, I have been splitting the input SAM into batches, running the collapse, and merging them as suggested. I do have a couple more questions regarding this workflow, hope you can help me understand it better.
1) My input SAM has 36.3 million reads and comes from a target capture enrichment of select regions on a particular chromosome.
From the code, it looks like the splits are made per chromosome, which explains the 17 uneven splits I see.
The bulk of my reads fall on one particular chromosome, and TAMA collapse runs into a memory error when processing that split, *R1_5.sam.
Subsetting the BAM to this one chromosome wouldn't help, as I would still end up with a single split file for that chromosome.
Is there a way I can process them in batches of a million and put them back together with TAMA merge reliably?
Sizes of split files:
ETA: I defined split regions based on my data (non-overlapping coordinates within my region of interest) and tried TAMA collapse on those. Even with splits on the order of 200-500k reads each, collapse is still time-consuming to run.
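For reference, the batching I have in mind is roughly the sketch below: split the SAM into fixed-size read chunks (repeating the header in each chunk so every file is a valid standalone SAM), then run collapse on each chunk and merge. The batch size and the output naming scheme here are my own placeholders, not TAMA conventions.

```python
def split_sam_into_batches(sam_path, batch_size, prefix):
    """Write alignment records into numbered batch files, repeating the
    header lines (those starting with '@') at the top of every batch so
    each file is a valid standalone SAM. Returns the batch file paths."""
    header = []
    batches = []
    count = 0
    out = None
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                header.append(line)
                continue
            # start a new batch file every batch_size reads
            if count % batch_size == 0:
                if out:
                    out.close()
                batch_path = f"{prefix}_batch{len(batches)}.sam"
                batches.append(batch_path)
                out = open(batch_path, "w")
                out.writelines(header)
            out.write(line)
            count += 1
    if out:
        out.close()
    return batches
```

Each resulting batch would then go through TAMA collapse independently before being combined with TAMA merge; my question is whether that round-trip is reliable.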
2) The TAMA merge process outputs an empty *_trans_read.bed, and I don't see this file mentioned among the outputs in the TAMA merge documentation either. Is there a way to obtain this file, which usually maps read IDs to transcript model IDs? Example from Col4 of trans_read.bed: G2.3;59:1307|fcf8906e-166e-471c-9d5c-e3758e2e80c0
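To make the mapping I'm after concrete, here is a minimal sketch of how I currently pull read-ID-to-model pairs out of a collapse-stage trans_read.bed. It assumes column 4 has the form `<model info>|<read id>` as in the example above, with the transcript model ID as the first `;`-separated field of the model-info part; that layout is my reading of the file, not something I've confirmed in the docs.

```python
def read_to_model(bed_line):
    """Parse one trans_read.bed line and return (read_id, transcript_id).
    Assumes column 4 looks like 'G2.3;59:1307|<read_id>'."""
    col4 = bed_line.rstrip("\n").split("\t")[3]
    model_info, read_id = col4.split("|", 1)
    transcript_id = model_info.split(";")[0]  # e.g. "G2.3"
    return read_id, transcript_id
```

If merge could emit the same file, this parsing would carry over unchanged.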
Hi, I am hoping to get an idea of typical processing times for TAMA collapse. My input data of ~3.5 M Nanopore reads has been processing for close to 24 hrs now. I'm wondering whether it usually takes this long and whether there are options to expedite it.
ETA: A downsample of ~400k reads took ~93 hrs to process. Are these processing times expected?
Best, Swathi