jonperdomo closed this pull request 2 weeks ago.
I tested with a GTEx RNA-seq file, GTEX-14BMU-0526-SM-5CA2F_rep.FAK93376.bam, and compared the results with RSeQC. RSeQC's tin.py has default parameters for minimum coverage and sample size, so I implemented both parameters to allow a direct comparison and to give users results that closely match RSeQC's. For transcripts, I downloaded the latest GENCODE v46 basic gene annotation for the GRCh38 reference chromosomes, gencode.v46.basic.annotation.bed, from https://www.gencodegenes.org/human/release_46.html
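For reference, RSeQC defines TIN from the Shannon entropy of read coverage over k sampled transcript positions: TIN = 100 * exp(H) / k, where H = -sum(p_i * ln p_i) and p_i is the fraction of the sampled coverage at position i. The sketch below is a simplified Python illustration of that formula only, not the tin.py or LongReadSum code. It assumes the per-position coverage has already been sampled (how tin.py picks the positions is omitted), and the function name `tin_score`, the `read_count` argument, and returning `None` for transcripts below the minimum coverage are hypothetical choices for illustration.

```python
import math
from typing import Optional, Sequence


def tin_score(coverage: Sequence[float], read_count: int,
              min_coverage: int = 2) -> Optional[float]:
    """Simplified TIN: 100 * exp(H) / k over k sampled positions (hypothetical helper)."""
    k = len(coverage)
    total = sum(coverage)
    # Hypothetical filter: transcripts with too few reads are not scored here.
    if k == 0 or total == 0 or read_count < min_coverage:
        return None
    # Shannon entropy (natural log); zero-coverage positions contribute nothing.
    entropy = -sum((c / total) * math.log(c / total) for c in coverage if c > 0)
    # exp(H) is the "effective" number of evenly covered positions, scaled to 0-100.
    return 100.0 * math.exp(entropy) / k


uniform = tin_score([5] * 100, read_count=50)       # even coverage everywhere
half = tin_score([1] * 50 + [0] * 50, read_count=50)  # half the positions covered
spike = tin_score([10] + [0] * 99, read_count=10)   # all coverage on one position
```

With coverage spread evenly over all 100 sampled positions the score is 100, covering only half the positions gives 50, and piling all coverage onto a single position gives 1, which is why TIN reads as a measure of how uniformly a transcript is covered.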
I set the minimum coverage to 2 and the sample size to 100.
RSeQC:
tin.py -i "${mod_bam}" -r "${bed_file}" -c 2 -n 100
Number of scores: 67069
Mean TIN: 67.089549182989
Median TIN: 74.25578864168884
Standard deviation of TIN: 26.001131242677577
LongReadSum:
longreadsum bam -i "${mod_bam}" -o "${output_dir}" -t 12 --genebed "${bed_file}" --min-coverage 2 --sample-size 100
Number of scores: 67069
Mean TIN: 67.0683
Median TIN: 74.25
Standard deviation of TIN: 26.0379
This PR also addresses the help text error from issue #57.
Updated results, reported at full precision:
RSeQC:
tin.py -i "${mod_bam}" -r "${bed_file}" -c 2 -n 100
Number of scores: 67069
Mean TIN: 67.089549182989
Median TIN: 74.25578864168884
Standard deviation of TIN: 26.001131242677577
LongReadSum:
longreadsum bam -i "${mod_bam}" -o "${output_dir}" -t 12 --genebed "${bed_file}" --min-coverage 2 --sample-size 100
Number of scores: 67069
Mean TIN: 67.06832655372376
Median TIN: 74.24996965188242
Standard deviation of TIN: 26.03788585287367
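To quantify how closely the two tools agree, the short sketch below computes the relative difference of each summary statistic, using only the full-precision numbers reported above. It also computes the factor sqrt(n / (n - 1)) that separates a sample standard deviation from a population one for n = 67069 scores, as a quick check on whether that definitional choice could matter here. The dictionary layout and names are illustrative, not part of either tool.

```python
import math

# Summary statistics copied from the full-precision results above.
rseqc = {"mean": 67.089549182989, "median": 74.25578864168884,
         "std": 26.001131242677577}
longreadsum = {"mean": 67.06832655372376, "median": 74.24996965188242,
               "std": 26.03788585287367}
N_SCORES = 67069  # both tools report this many scores


def relative_differences(reference, candidate):
    """Return (candidate - reference) / reference for each shared statistic."""
    return {key: (candidate[key] - reference[key]) / reference[key]
            for key in reference}


diffs = relative_differences(rseqc, longreadsum)
for key, value in diffs.items():
    print(f"{key}: {value:+.4%}")

# Ratio of sample (n - 1) to population (n) standard deviation for this n.
ddof_ratio = math.sqrt(N_SCORES / (N_SCORES - 1))
print(f"sample/population std ratio: {ddof_ratio:.7f}")
```

All three statistics differ by less than 0.2%. The sample-vs-population factor for this n is about 1.0000075, so that choice alone cannot account for the roughly 0.14% gap in standard deviation.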
Job resource usage (8 cores and 50 GB of memory allocated per job):
RSeQC:
Nodes: 1
Cores per node: 8
CPU Utilized: 07:55:21
CPU Efficiency: 12.45% of 2-15:39:12 core-walltime
Job Wall-clock time: 07:57:24
Memory Utilized: 166.25 MB
Memory Efficiency: 0.32% of 50.00 GB
LongReadSum:
Nodes: 1
Cores per node: 8
CPU Utilized: 02:48:34
CPU Efficiency: 12.67% of 22:10:56 core-walltime
Job Wall-clock time: 02:46:22
Memory Utilized: 5.91 GB
Memory Efficiency: 11.83% of 50.00 GB
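The job statistics above can be cross-checked with a little arithmetic: CPU efficiency is CPU time divided by (cores x wall-clock time), and the speedup is the ratio of wall-clock times. The sketch below assumes the [D-]HH:MM:SS duration format shown in the output; `to_seconds` is a hypothetical helper, not part of any tool.

```python
def to_seconds(duration: str) -> int:
    """Convert a duration such as '07:57:24' or '2-15:39:12' to seconds."""
    days = 0
    if "-" in duration:
        day_part, duration = duration.split("-", 1)
        days = int(day_part)
    parts = [int(x) for x in duration.split(":")]
    while len(parts) < 3:  # tolerate MM:SS or SS by padding missing fields
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


CORES = 8
jobs = {
    "RSeQC": {"cpu": "07:55:21", "wall": "07:57:24"},
    "LongReadSum": {"cpu": "02:48:34", "wall": "02:46:22"},
}

for name, job in jobs.items():
    cpu, wall = to_seconds(job["cpu"]), to_seconds(job["wall"])
    efficiency = cpu / (CORES * wall)
    print(f"{name}: CPU efficiency {efficiency:.2%}, "
          f"~{cpu / wall:.2f} cores busy on average")

speedup = to_seconds(jobs["RSeQC"]["wall"]) / to_seconds(jobs["LongReadSum"]["wall"])
print(f"Wall-clock speedup: {speedup:.2f}x")
```

This reproduces the reported efficiencies (12.45% and 12.67%). Both jobs keep roughly one of the eight allocated cores busy on average, and LongReadSum finishes about 2.87x faster in wall-clock time while using more memory (5.91 GB vs 166.25 MB).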
Add a unit test to complete this PR.
This PR adds a new feature that calculates TIN scores, outputs the scores and their summary statistics in TSV format, and adds the summary to the HTML report.
Add TIN values for RNA-Seq QC from BAM files, including unit tests.