BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

ValueError in filter_isoforms_by_proportion_of_gene_expr.py #338

Open byee4 opened 1 month ago

byee4 commented 1 month ago

Copy and paste the exact command you tried to run

flair collapse -g /home/bay001/annotations/mm39/GRCm39.primary_assembly.genome.fa -q output/aligned/synapse10M_all_corrected.bed -r sample69.fastq.gz sample70.fastq.gz sample71.fastq.gz sample72.fastq.gz sample2_69.fastq.gz sample2_70.fastq.gz sample2_71.fastq.gz sample2_72.fastq.gz --output output_10M.0 -f /home/bay001/annotations/mm39/gencode_vM35/gencode.vM35.annotation.gtf --support 0 --threads 24 --check_splice --generate_map --annotation_reliant generate --stringent;

How did you install Flair? (We'd prefer it if you used one of the top two because they are the least likely to have package compatibility problems.)

  1. bioconda (e.g. conda create -n flair -c conda-forge -c bioconda flair)

conda create -n flair-2.0.0 python=3.8
conda install -c bioconda -c conda-forge flair minimap2 samtools bedtools

pipfreeze.txt

What happened?

The program errored out after ~4 hours of running flair collapse:

Annotated ends extracted from GTF
Read data extracted
Single-exon genes grouped, collapsing
Traceback (most recent call last):
  File "/home/bay001/miniconda_tscc2/envs/flair-2.0.0/lib/python3.8/site-packages/flair/filter_isoforms_by_proportion_of_gene_expr.py", line 61, in <module>
    if float(iso[0][-1])/gene_total >= s:
ValueError: could not convert string to float: 'X'
Writing temporary files to /tmp/tmpjivpz4da/
Making transcript fasta using annotated gtf and genome sequence
Aligning reads to reference transcripts
Counting supporting reads for annotated transcripts
Setting up unassigned reads for flair-collapse novel isoform detection
Renaming isoforms using gtf
Traceback (most recent call last):
  File "/home/bay001/miniconda_tscc2/envs/flair-2.0.0/bin/flair", line 10, in <module>
    sys.exit(main())
  File "/home/bay001/miniconda_tscc2/envs/flair-2.0.0/lib/python3.8/site-packages/flair/flair.py", line 1035, in main
    status = collapse()
  File "/home/bay001/miniconda_tscc2/envs/flair-2.0.0/lib/python3.8/site-packages/flair/flair.py", line 600, in collapse
    subprocess.check_call([sys.executable, path+'filter_isoforms_by_proportion_of_gene_expr.py',
  File "/home/bay001/miniconda_tscc2/envs/flair-2.0.0/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/bay001/miniconda_tscc2/envs/flair-2.0.0/bin/python', '/home/bay001/miniconda_tscc2/envs/flair-2.0.0/lib/python3.8/site-packages/flair/filter_isoforms_by_proportion_of_gene_expr.py', 'output_10M.0.firstpass.bed', '0.0', 'output_10M.0.firstpass.bed']' returned non-zero exit status 1.

[pipfreeze.txt](https://github.com/BrooksLabUCSC/flair/files/15492487/pipfreeze.txt)
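The two tracebacks in the log appear to describe a single failure: the helper script raised the ValueError and exited with a non-zero status, and `subprocess.check_call` in the parent process then reported that exit as a CalledProcessError. A minimal sketch of that relationship, using a stand-in child process (an assumption for illustration, not FLAIR's actual script):

```python
import subprocess
import sys

# Stand-in for the helper script: it fails with the same kind of ValueError.
child_code = "float('X')"

try:
    # check_call raises CalledProcessError when the child exits non-zero.
    subprocess.check_call(
        [sys.executable, "-c", child_code],
        stderr=subprocess.DEVNULL,  # hide the child's own traceback
    )
    returncode = 0
except subprocess.CalledProcessError as err:
    returncode = err.returncode

# An uncaught exception makes the Python child exit with status 1.
print(returncode)
```

If that reading is right, the ValueError is the error to chase; the CalledProcessError is only its echo in the parent.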

What else do we need to know? Go ahead, every bit helps.

I'm trying to run FLAIR on some rather large PacBio datasets (~10M reads per sample, 8 samples) with short-read support. I'm running on an HPC cluster and giving the job 32 GB of memory and 24 threads. I do see a number of very large SAM files, but I don't believe I'm running out of disk space yet. The same command worked previously on a smaller run (~4M reads per sample).
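One way to narrow this down might be to look for the rows that trigger the failure. The sketch below assumes, based only on the `float(iso[0][-1])` line in the traceback, that the script expects the last tab-separated field of each row to be a numeric count. The function name `find_bad_count_rows` and the demo rows are hypothetical and are not part of FLAIR.

```python
import io


def find_bad_count_rows(lines, max_report=10):
    """Return (line_number, last_field, row) for each row whose last
    tab-separated field cannot be parsed by float()."""
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        row = raw.rstrip("\n").split("\t")
        try:
            float(row[-1])
        except ValueError:
            bad.append((lineno, row[-1], row))
            if len(bad) >= max_report:
                break
    return bad


# Invented rows for illustration only; real rows will look different.
demo = io.StringIO(
    "chr1\t100\t200\tiso1_geneA\t12\n"
    "chrX\t300\t400\tiso2_geneB\tX\n"
)
bad_rows = find_bad_count_rows(demo)
for lineno, field, row in bad_rows:
    print(f"line {lineno}: last field {field!r}")
```

To check the real data, the same function could be given an open file handle, for example `with open("output_10M.0.firstpass.bed") as fh: find_bad_count_rows(fh)`, using the file name from the failing command above.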