BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA
Other
208 stars 71 forks source link

The results of my test_output are not consisitent with the oferred test_expected. Why? #288

Closed qingwangchen closed 9 months ago

qingwangchen commented 1 year ago

Hello, I tested FLAIR v1.7.0 using the fasta files in https://github.com/BrooksLabUCSC/flair/tree/master/test/test_input, but the quantitative results and the results of the differentially alternative splicing events I got are not exactly the same as the results in the https://github.com/BrooksLabUCSC/flair/tree/master/test/test_expected file, and I'd like to ask what this was caused by? @Jeltje

In detail: test.quantify.tpm.tsv

ID | A1 | A2 | A3 | B1 | B2 | B3 -- | -- | -- | -- | -- | -- | -- 2 | ENST00000225792.10_ENSG00000108654.15 | 14492.753623188406 | 17543.859649122805 | 0.0 | 8620.689655172413 | 18348.623853211007 | 0.0 3 | ENST00000256078.9_ENSG00000133703.12 | 0.0 | 0.0 | 0.0 | 17241.379310344826 | 18348.623853211007 | 0.0 4 | ENST00000311936.8_ENSG00000133703.12 | 72463.76811594203 | 52631.57894736842 | 37037.03703703704 | 103448.27586206897 | 55045.87155963302 | 37037.03703703704 5 | ENST00000348547.6_ENSG00000125991.19 | 0.0 | 35087.71929824561 | 12345.679012345678 | 0.0 | 0.0 | 12345.679012345678 6 | ENST00000451605.1_ENSG00000125991.19 | 144927.53623188406 | 263157.8947368421 | 86419.75308641975 | 163793.10344827586 | 183486.2385321101 | 86419.75308641975 7 | ENST00000557334.5_ENSG00000133703.12 | 43478.260869565216 | 0.0 | 24691.358024691355 | 25862.068965517243 | 64220.18348623853 | 24691.358024691355 8 | HISEQ:1287:HKCG7BCX3:1:1101:14093:55880_ENSG00000133703.12 | 0.0 | 0.0 | 0.0 | 0.0 | 18348.623853211007 | 0.0 9 | HISEQ:1287:HKCG7BCX3:1:1101:1676:64370_chr12:25206000 | 57971.014492753624 | 52631.57894736842 | 37037.03703703704 | 25862.068965517243 | 9174.311926605504 | 37037.03703703704 10 | HISEQ:1287:HKCG7BCX3:1:1101:17977:51378_ENSG00000108654.15 | 14492.753623188406 | 52631.57894736842 | 12345.679012345678 | 34482.75862068965 | 27522.93577981651 | 12345.679012345678 11 | HISEQ:1287:HKCG7BCX3:1:1101:9136:21962_chr12:25205000 | 43478.260869565216 | 35087.71929824561 | 24691.358024691355 | 25862.068965517243 | 27522.93577981651 | 24691.358024691355 12 | HISEQ:1287:HKCG7BCX3:1:1101:9578:94516_ENSG00000108654.15 | 0.0 | 0.0 | 24691.358024691355 | 8620.689655172413 | 9174.311926605504 | 24691.358024691355 13 | HISEQ:1287:HKCG7BCX3:1:1102:12733:84633_ENSG00000133703.12 | 0.0 | 0.0 | 0.0 | 0.0 | 27522.93577981651 | 0.0 14 | HISEQ:1287:HKCG7BCX3:1:1102:1600:74434_chr12:25207000 | 72463.76811594203 | 122807.01754385965 | 185185.18518518517 | 181034.4827586207 | 110091.74311926604 | 185185.18518518517 15 | HISEQ:1287:HKCG7BCX3:1:1104:13255:38089_ENSG00000125991.19 | 14492.753623188406 | 0.0 | 12345.679012345678 | 25862.068965517243 | 9174.311926605504 | 12345.679012345678 16 | HISEQ:1287:HKCG7BCX3:1:1104:2444:51920_chr12:25205000 | 43478.260869565216 | 35087.71929824561 | 49382.71604938271 | 25862.068965517243 | 9174.311926605504 | 49382.71604938271 17 | HISEQ:1287:HKCG7BCX3:1:1105:10547:49865_ENSG00000108654.15 | 0.0 | 0.0 | 0.0 | 8620.689655172413 | 0.0 | 0.0 18 | HISEQ:1287:HKCG7BCX3:1:1105:13763:58932_chr12:25233000 | 14492.753623188406 | 0.0 | 0.0 | 8620.689655172413 | 9174.311926605504 | 0.0 19 | HISEQ:1287:HKCG7BCX3:1:1106:1294:43972_chr12:25234000 | 0.0 | 0.0 | 0.0 | 8620.689655172413 | 9174.311926605504 | 0.0 20 | HISEQ:1287:HKCG7BCX3:1:1106:14150:54273_chr12:25206000 | 0.0 | 17543.859649122805 | 37037.03703703704 | 8620.689655172413 | 27522.93577981651 | 37037.03703703704 21 | HISEQ:1287:HKCG7BCX3:1:1106:14873:50359_ENSG00000108654.15 | 28985.507246376812 | 52631.57894736842 | 74074.07407407407 | 17241.379310344826 | 27522.93577981651 | 74074.07407407407 22 | HISEQ:1287:HKCG7BCX3:1:1107:13669:8305_ENSG00000108654.15 | 86956.52173913043 | 70175.43859649122 | 49382.71604938271 | 25862.068965517243 | 36697.247706422015 | 49382.71604938271 23 | HISEQ:1287:HKCG7BCX3:1:1107:19821:7487_ENSG00000108654.15 | 14492.753623188406 | 17543.859649122805 | 49382.71604938271 | 25862.068965517243 | 9174.311926605504 | 49382.71604938271 24 | HISEQ:1287:HKCG7BCX3:1:1107:7610:49233_chr12:25206000 | 57971.014492753624 | 17543.859649122805 | 12345.679012345678 | 43103.448275862065 | 36697.247706422015 | 12345.679012345678 25 | HISEQ:1287:HKCG7BCX3:1:1108:10984:29952_ENSG00000108654.15 | 0.0 | 0.0 | 24691.358024691355 | 17241.379310344826 | 0.0 | 24691.358024691355 26 | HISEQ:1287:HKCG7BCX3:1:1110:2082:85316_chr12:25205000 | 14492.753623188406 | 35087.71929824561 | 12345.679012345678 | 8620.689655172413 | 36697.247706422015 | 12345.679012345678 27 | HISEQ:1287:HKCG7BCX3:1:1111:11268:57313_ENSG00000108654.15 | 43478.260869565216 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 28 | HISEQ:1287:HKCG7BCX3:1:1112:11090:59856_chr12:25207000 | 14492.753623188406 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 29 | HISEQ:1287:HKCG7BCX3:1:1112:20526:86503_chr12:25234000 | 14492.753623188406 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 30 | HISEQ:1287:HKCG7BCX3:1:1112:7385:16672_ENSG00000108654.15 | 43478.260869565216 | 0.0 | 61728.39506172839 | 8620.689655172413 | 18348.623853211007 | 61728.39506172839 31 | HISEQ:1287:HKCG7BCX3:1:1112:9533:55818_chr12:25205000 | 28985.507246376812 | 17543.859649122805 | 61728.39506172839 | 25862.068965517243 | 36697.247706422015 | 61728.39506172839 32 | HISEQ:1287:HKCG7BCX3:1:1113:18513:98562_chr12:25234000 | 0.0 | 0.0 | 0.0 | 25862.068965517243 | 18348.623853211007 | 0.0 33 | HISEQ:1287:HKCG7BCX3:1:1113:19423:71141_chr12:25206000 | 43478.260869565216 | 17543.859649122805 | 37037.03703703704 | 8620.689655172413 | 64220.18348623853 | 37037.03703703704 34 | HISEQ:1287:HKCG7BCX3:1:1113:9124:44339_chr12:25205000 | 28985.507246376812 | 17543.859649122805 | 12345.679012345678 | 17241.379310344826 | 9174.311926605504 | 12345.679012345678 35 | HISEQ:1287:HKCG7BCX3:1:1114:18197:37779_chr12:25207000 | 43478.260869565216 | 70175.43859649122 | 61728.39506172839 | 94827.58620689655 | 73394.49541284403 | 61728.39506172839

My output: [ABs6.flair.minimap2.sorted.only_primary.quantify.tpm.txt] (https://github.com/BrooksLabUCSC/flair/files/12492266/ABs6.flair.minimap2.sorted.only_primary.quantify.tpm.txt)

test.diffsplice.alt3.events.quant.tsv is also not consisitent with my result. diffsplice.alt3.events.quant.txt

I am running the software with the following parameters


# minimap2_test.pbs
# this script should be run in the origin dir.

#!/bin/sh
set -eo pipefail

source /public/software/profile.d/apps_singularity-3.5.2.sh  
imgDIR=/public/group_share_data/PGx/qwachen/images

cd /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna

date
echo "minimap begins"

start_time=$(date +%s)

#final command 
## minimap2
singularity exec $imgDIR/flair_latest.sif \
                 minimap2 -t 8 \
                 -ax splice -uf -k14 \
                 --junc-bed /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/test.align.bed \
                 /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/reference/genome/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
                 ${fasta} \
                 -o /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.aligned.all.sam

## samtools
##  sam2bam
singularity exec $imgDIR/flair_latest.sif \
                 samtools view -hSb \
                 /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.aligned.all.sam > /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.aligned.all.bam

## 
singularity exec $imgDIR/flair_latest.sif \
                 samtools view -q 40 -F 2304 \
                 -hb /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.aligned.all.bam > /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.aligned.only_primary.bam

## 
singularity exec $imgDIR/flair_latest.sif \
                 samtools sort /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.aligned.only_primary.bam \
                 -o /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.sorted.only_primary.bam

singularity exec $imgDIR/flair_latest.sif \
                 samtools index /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/${sample_id}.minimap2.sorted.only_primary.bam

end_time=$(date +%s)
execution_time=$((end_time - start_time))

date
echo "minimap2 ends"

echo "Time:$execution_time s"

**bam_merge_ABs6.pbs**
```# bam_merge_ABs6.pbs
# this script should be run in the origin dir.

#!/bin/sh
set -eo pipefail

source /public/software/profile.d/apps_singularity-3.5.2.sh  
imgDIR=/public/group_share_data/PGx/qwachen/images

cd /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna

date
echo "samtools begins"

start_time=$(date +%s)

#final command 
singularity exec $imgDIR/flair_latest.sif \
                 samtools merge /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/ABs6.minimap2.sorted.only_primary.bam \
                                /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/A1.minimap2.sorted.only_primary.bam \
                                /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/A2.minimap2.sorted.only_primary.bam \
                                /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/A3.minimap2.sorted.only_primary.bam \
                                /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/B1.minimap2.sorted.only_primary.bam \
                                /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/B2.minimap2.sorted.only_primary.bam \
                                /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/B3.minimap2.sorted.only_primary.bam

singularity exec $imgDIR/flair_latest.sif \
                 samtools index /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/ABs6.minimap2.sorted.only_primary.bam

## bam2bed 
singularity exec $imgDIR/flair_latest.sif \
                 bam2Bed12 -i /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/minimap2/ABs6.minimap2.sorted.only_primary.bam > /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/ABs6.minimap2.sorted.only_primary.bed12

end_time=$(date +%s)
execution_time=$((end_time - start_time))

date
echo "samtools ends"

echo "Time:$execution_time s"

```# flair_merge_ABs6.pbs
# this script should be run in the origin dir.

#!/bin/sh
set -eo pipefail

source /public/software/profile.d/apps_singularity-3.5.2.sh  
imgDIR=/public/group_share_data/PGx/qwachen/images

cd /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna

date
echo "flair_single begins"

start_time=$(date +%s)

#final command singularity
## flair correct
singularity exec $imgDIR/flair_latest.sif \
                 flair correct -q /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/ABs6.minimap2.sorted.only_primary.bed12 \
                               -f /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/annotation.gtf \
                               -t 8 \
                               -g /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/reference/genome/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
                               -o /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/correct/ABs6.flair.minimap2.sorted.only_primary

## flair collapse
singularity exec $imgDIR/flair_latest.sif \
                 flair collapse -g /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/reference/genome/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
                                -q /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/correct/ABs6.flair.minimap2.sorted.only_primary_all_corrected.bed \
                                -r /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/reads.1.fa,/public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/reads.2.fa,/public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/reads.3.fa,/public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/reads.4.fa,/public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/reads.5.fa,/public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/reads.6.fa \
                                -f /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/data/test/test_input/annotation.gtf \
                                -t 8 \
                                -o /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/collapse/ABs6.flair.minimap2.sorted.only_primary.collapse

## flair quantify 
singularity exec $imgDIR/flair_latest.sif \
                 flair quantify -r /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/collapse/reads_manifest-ABs6.tsv \
                                -i /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/collapse/ABs6.flair.minimap2.sorted.only_primary.collapse.isoforms.fa \
                                --tpm \
                                -t 8 \
                                -o /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/quantify/ABs6.flair.minimap2.sorted.only_primary.quantify

## flair diffSplice 
singularity exec $imgDIR/flair_latest.sif \
                 flair diffSplice -i /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/collapse/ABs6.flair.minimap2.sorted.only_primary.collapse.isoforms.bed \
                 -q /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/quantify/ABs6.flair.minimap2.sorted.only_primary.quantify.counts.tsv \
                 -t 8 \
                 -o /public/group_share_data/PGx/qwachen/HCM/AS/ont_pcr_cdna/flair/diffSplice/diffsplice_ABs6/

end_time=$(date +%s)
execution_time=$((end_time - start_time))

date
echo "flair_single ends"

echo "Time:$execution_time s"
Jeltje commented 9 months ago

You are testing version 1.7.0 with the test files of 2.0.0, this is generally not a good idea because outputs may change between versions. However, I have been trying to repeat this experiment and I do get the same results with both test inputs.

The most probable reason that you see differences between your output and the test output is that the test output was generated with the --check_splice --stringent flags, which are not in your command. When I remove them I do indeed get output similar to yours.

If this does not answer your question, please reopen this ticket.