BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

flair collapse creates invalid isoform names that cause GTF conversion to fail #280

Closed diekhans closed 1 year ago

diekhans commented 1 year ago

FLAIR collapse creates isoform IDs that lack the underscore separating the isoform ID from the gene ID, which causes bed_to_gtf to fail. For example:

d3380a06-a1bc-4689-87de-6299d205d24b/a34b3c2cdc097afcadab320f573a6cc04fefcddcGL000221.1:4000
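To locate such names before running the GTF conversion, one option is to scan the name column of the collapse output. The following is a minimal sketch, not part of FLAIR: the helper name `find_ids_missing_separator` is hypothetical, it assumes the output is tab-separated BED with the isoform name in column 4, and it assumes a valid name always contains at least one underscore. The coordinates and the second record in the sample data are made up for illustration; only the first name comes from this report.

```python
import io


def find_ids_missing_separator(bed_lines, sep="_"):
    """Return BED name fields (column 4) that do not contain `sep`.

    Assumption: a well-formed name looks like <isoform id>_<gene id>,
    so a name without any underscore is treated as invalid.
    """
    bad = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column to check
        name = fields[3]
        if sep not in name:
            bad.append(name)
    return bad


# Illustrative input: coordinates and the second record are invented.
example = io.StringIO(
    "GL000221.1\t4000\t5000\t"
    "d3380a06-a1bc-4689-87de-6299d205d24b/"
    "a34b3c2cdc097afcadab320f573a6cc04fefcddcGL000221.1:4000\t0\t+\n"
    "chr1\t100\t200\tread1_ENSG00000000001.1\t0\t+\n"
)
for name in find_ids_missing_separator(example):
    print(name)
```

With a real file, the same function can be given an open file handle, for example `find_ids_missing_separator(open(path))`, where `path` points at the BED file produced by collapse. A check on the underscore alone is deliberately loose: it flags names like the one above but does not verify that the part after the underscore is a real gene ID.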

Copy and paste the exact command you tried to run

${HOME}/bin/flair/flair.py collapse --query flair_all_corrected.bed -g $genome -f $annotation  --threads ${NSLOTS} -r /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210715/20210713_HS_BLaER1_PCRcDNA110_H0C2/20210713_HS_BLaER1_PCRcDNA110_H0C2/20210713_1407_MN24456_FAP82103_6d0f8d57/20210713_HS_BLaER1_PCRcDNA110_H0C2.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210701/20210628_HS_BLaER1_PCRcDNA110_H0N2/20210628_HS_BLaER1_PCRcDNA110_H0N2/20210628_1623_MN24456_FAP84999_e7c35d5c/20210628_HS_BLaER1_PCRcDNA110_H0N2.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210223/reads/20210222_HS_BLaER1_PCRcDNA110_H0T3/20210222_HS_BLaER1_PCRcDNA110_H0T3/20210222_HS_BLaER1_PCRcDNA110_H0T3/20210222_1723_MN26202_FAP51075_d4666859/20210222_HS_BLaER1_PCRcDNA110_H0T3.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210319/reads/20210316_HS_BLaER1_PCRcDNA110_H03C/20210316_HS_BLaER1_PCRcDNA110_H03C/20210316_HS_BLaER1_PCRcDNA110_H03C/20210316_1818_MN24456_FAP06181_4f7727ad/20210316_HS_BLaER1_PCRcDNA110_H03C.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210322/reads/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS/20210319_1412_MN24456_FAP06181_48d651f9/20210319_HS_BLaER1_PCRcDNA110_H03C_BIS.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210319/reads/20210316_HS_BLaER1_PCRcDNA110_H03N/20210316_HS_BLaER1_PCRcDNA110_H03N/20210316_HS_BLaER1_PCRcDNA110_H03N/20210316_1819_MN26202_FAP50933_42375c2c/20210316_HS_BLaER1_PCRcDNA110_H03N.guppy.v6.0.1-gpu.fastq.gz 
/nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210322/reads/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2/20210319_1509_MN26202_FAP50933_8689e433/20210319_HS_BLaER1_PCRcDNA110_H03N_BIS2.guppy.v6.0.1-gpu.fastq.gz /nfs/users/project/gencode_006070/jlagarde/nanopore/sequencing_runs/runs/20210628/20210622_HS_BLaER1_PCRcDNA110_H0T2/20210622_HS_BLaER1_PCRcDNA110_H0T2/20210622_1202_MN26202_FAP46789_a0c99f09/20210622_HS_BLaER1_PCRcDNA110_H0T2.guppy.v6.0.1-gpu.fastq.gz

How did you install Flair? (We'd prefer it if you used one of the top two because they are the least likely to have package compatibility problems.)

  1. bioconda (e.g. conda create -n flair -c conda-forge -c bioconda flair)

What happened?

See attached error log

e.flair_module23FAST_91272191.log

What else do we need to know?

This run created 93 invalid IDs.

bad.ids.txt

An example showing files containing one of the invalid ids

hits.txt