dieterich-lab / rp-bp

Rp-Bp is a Bayesian approach to predict, at base-pair resolution, ribosome occupancy and translation.
MIT License

STAR sortedBAM output #61

Closed bmmalone closed 1 year ago

bmmalone commented 7 years ago

From @tjakobi

STAR has two modes of output: write SAM (default) or write BAM files.

While STAR with SAM output runs very smoothly, even with multiple cores and distributed over all machines (I tested with 25 instances), STAR with sorted BAM output seems to be a BeeGFS killer.

When run with "--outSAMtype BAM SortedByCoordinate", STAR puts enormous stress on the BeeGFS storage servers, pushing them beyond the maximum number of writes they can do per second. This leads to heavy I/O wait and a nearly unusable console, since the BeeGFS servers also host the home directories (and the bash completion).

For now, my advice is: write out SAM and after the run convert to BAM. The samtools conversion does not seem to pose any problems to the cluster.

In principle, this should just entail an extra step after the call to STAR in the estimate-XXX-abundance scripts, as well as updates to the filenames.
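
As a rough sketch of what that extra step could look like (not the actual code in the estimate-XXX-abundance scripts), the helper below builds the three commands: STAR with SAM output, then samtools sort and samtools index. The function names, the default thread count, and the choice to reuse STAR's usual sorted-BAM filename are assumptions made for illustration.

```python
import subprocess


def build_star_sam_commands(genome_dir, reads, out_prefix, threads=4):
    """Return [star_cmd, sort_cmd, index_cmd] as argument lists.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # With --outSAMtype SAM, STAR writes <prefix>Aligned.out.sam
    sam = f"{out_prefix}Aligned.out.sam"
    # Assumption: reuse the name STAR would use for sorted BAM output,
    # so downstream steps see the filename they already expect.
    bam = f"{out_prefix}Aligned.sortedByCoord.out.bam"

    star_cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(reads),
        "--outFileNamePrefix", str(out_prefix),
        "--outSAMtype", "SAM",  # unsorted SAM instead of sorted BAM
    ]
    # samtools does the coordinate sort and BAM conversion after the run;
    # -@ sets the number of additional sorting/compression threads.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]
    index_cmd = ["samtools", "index", bam]
    return [star_cmd, sort_cmd, index_cmd]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_star_sam_commands("index/", "reads.fastq", "sample1."))` would run the three steps in sequence, assuming STAR and samtools are on the PATH; the paths here are placeholders. The intermediate SAM file is left in place and would need to be removed separately if disk space is a concern.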

This is exactly the same as Issue dieterich-lab/b-tea#41.