BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

flair 123 appears not to use splice junctions #271

Closed diekhans closed 11 months ago

diekhans commented 11 months ago

A collaborator ran FLAIR in the combined 123 mode (align, correct, collapse) with splice junction support (--shortread) and got exactly the same results as when not using splice junctions.

Running it with the individual modules succeeded.

I have not had time to debug this, so it is possible there is human error. However, the cost of this problem is very high, so I am filing a ticket anyway.

Copy and paste the exact command you tried to run

#!/bin/bash
#$ -q rg-el7,long-sl7,short-sl7
#$ -N flair_module123
#$ -e /users/project/gencode_006070_no_backup/scarbonell/TFM/long_reads/FLAIR/logs/e.flair_module123FAST_$JOB_ID.log
#$ -o /users/project/gencode_006070_no_backup/scarbonell/TFM/long_reads/FLAIR/logs/o.flair_module123FAST_$JOB_ID.log
#$ -pe smp 12
#$ -l virtual_free=96G
#$ -l h_rt=72:00:00

set -x
module load Python/3.8.2-GCCcore-9.3.0
module load BEDTools/2.29.2-GCC-9.3.0
module load SAMtools/1.11-GCC-9.3.0

#flair 12346 -r reads.fa -g genome.fa -f annotation.gtf -o flair.output --temp_dir temp_flair [optional arguments]
#(module numbers: align=1, correct=2, collapse=3, collapse-range=3.5, quantify=4, diffExp=5, diffSplice=6)

genome="/nfs/users/rg/projects/references/Genome/H.sapiens/GRCh38/GRCh38.p13.primary_assembly.genome.fa"
minimap_path="/users/rg/scarbonell/bin/minimap2/"
annotation="/nfs/users/rg/projects/references/Annotation/H.sapiens/gencode43/gencode.v43.primary_assembly.annotation.gtf"
shortread_bed="/nfs/users/project/gencode_006070_no_backup/scarbonell/TFM/long_reads/FLAIR/all/data/junctions_from_sam_junctions.bed"

main_dir="/users/project/gencode_006070_no_backup/scarbonell/TFM/long_reads/FLAIR/all"
data_SRsupport="$main_dir/data_SRsupport"

mkdir -p $data_SRsupport
cd $data_SRsupport/

flair 123 -g $genome -f $annotation --shortread $shortread_bed --threads ${NSLOTS} -r /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210715/20210713_HS_BLaER1_PCRcDNA110_H0C2/20210713_HS_BLaER1_PCRcDNA110_H0C2/20210713_1407_MN24456_FAP82103_6d0f8d57/20210713_HS_BLaER1_PCRcDNA110_H0C2.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210701/20210628_HS_BLaER1_PCRcDNA110_H0N2/20210628_HS_BLaER1_PCRcDNA110_H0N2/20210628_1623_MN24456_FAP84999_e7c35d5c/20210628_HS_BLaER1_PCRcDNA110_H0N2.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210223/reads/20210222_HS_BLaER1_PCRcDNA110_H0T3/20210222_HS_BLaER1_PCRcDNA110_H0T3/20210222_HS_BLaER1_PCRcDNA110_H0T3/20210222_1723_MN26202_FAP51075_d4666859/20210222_HS_BLaER1_PCRcDNA110_H0T3.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210319/reads/20210316_HS_BLaER1_PCRcDNA110_H03C/20210316_HS_BLaER1_PCRcDNA110_H03C/20210316_HS_BLaER1_PCRcDNA110_H03C/20210316_1818_MN24456_FAP06181_4f7727ad/20210316_HS_BLaER1_PCRcDNA110_H03C.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210322/reads/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS/20210319_1412_MN24456_FAP06181_48d651f9/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210319/reads/20210316_HS_BLaER1_PCRcDNA110_H03N/20210316_HS_BLaER1_PCRcDNA110_H03N/20210316_HS_BLaER1_PCRcDNA110_H03N/20210316_1819_MN26202_FAP50933_42375c2c/20210316_HS_BLaER1_PCRcDNA110_H03N.guppy.v6.0.1-gpu.fastq.gz 
/nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210322/reads/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2/20210319_1509_MN26202_FAP50933_8689e433/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210628/20210622_HS_BLaER1_PCRcDNA110_H0T2/20210622_HS_BLaER1_PCRcDNA110_H0T2/20210622_1202_MN26202_FAP46789_a0c99f09/20210622_HS_BLaER1_PCRcDNA110_H0T2.guppy.v6.0.1-gpu.fastq.gz -m $minimap_path

How did you install Flair? bioconda

What happened?

The results are identical to those from running without --shortread $shortread_bed.
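
One way to check the "identical results" claim is to compare checksums of the sorted outputs from the two runs. This is only a minimal sketch: the helper name same_output and the file paths passed to it are hypothetical and do not come from this report.

```shell
# Hypothetical helper: report whether two output files hold the same lines,
# ignoring row order. Paths are placeholders supplied by the caller.
same_output() {
    # $1: output of the run with --shortread
    # $2: output of the run without --shortread
    # Sort with a fixed locale so ordering is stable, then checksum each file.
    a=$(LC_ALL=C sort "$1" | cksum)
    b=$(LC_ALL=C sort "$2" | cksum)
    if [ "$a" = "$b" ]; then
        echo "identical"
    else
        echo "different"
    fi
}
```

For example, it could be pointed at the flair_all_corrected.bed files from two runs kept in separate directories (the directory layout is an assumption): same_output with_sr/flair_all_corrected.bed without_sr/flair_all_corrected.bed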

What else do we need to know?

Running the modules independently seems to produce correct, or at least different, results:

#!/bin/bash -e
#$ -q rg-el7,long-sl7,short-sl7
#$ -N flair_module123
#$ -e /users/project/gencode_006070_no_backup/scarbonell/TFM/long_reads/FLAIR/logs/e.flair_module123FAST_$JOB_ID.log
#$ -o /users/project/gencode_006070_no_backup/scarbonell/TFM/long_reads/FLAIR/logs/o.flair_module123FAST_$JOB_ID.log
#$ -pe smp 12
#$ -l virtual_free=96G
#$ -l h_rt=72:00:00

set -x
module load Python/3.8.2-GCCcore-9.3.0
module load BEDTools/2.29.2-GCC-9.3.0
module load SAMtools/1.11-GCC-9.3.0

NSLOTS=64

top_dir="/hive/users/markd/projs/silvia/projs/flair/long+short-errors"
refs_dir="${top_dir}/refs"
genome="${refs_dir}/GRCh38.primary_assembly.genome.fa"
annotation="${refs_dir}/gencode.v43.primary_assembly.annotation.gtf"
shortread_bed="${refs_dir}/junctions_from_sam_junctions.bed"

run_dir=${top_dir}/sr-debug

../flair/flair.py correct --query ${run_dir}/flair.aligned.bed --genome $genome --gtf $annotation \
                  --shortread $shortread_bed --threads ${NSLOTS} --print_check

../flair/flair.py collapse --query ${run_dir}/flair_all_corrected.bed --genome $genome --gtf $annotation \
                  --reads ${run_dir}/flair.fastq.gz --threads ${NSLOTS}
diekhans commented 11 months ago

flair 123 was not actually the issue, since the same behavior occurs with flair correct. I will file a separate ticket once the problem is understood.