BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Error at the FLAIR collapse step (no data in output files) #259

Closed pothepanda09 closed 1 year ago

pothepanda09 commented 1 year ago

Hello,

I'm pretty new to coding, and I'm trying to run the FLAIR collapse step of the workflow. However, the output file it creates is empty. I am using FLAIR to look at alternative splicing events in sequencing data generated from nanopore cDNA reads. The command I am currently running is:

flair 123 -r fastq.gz -g /path/to/hg38.fa.gz -f /path/to/gencode.v43.annotation.gtf -o flair.output --trust_ends --temp_dir temp_flair

I get output files from the align and correct steps, and these progress messages show up after running the command:

Annotated ends extracted from GTF
Read data extracted
Single-exon genes grouped, collapsing
Renaming isoforms using gtf
[M::mm_idx_gen::0.003*2.63] collected minimizers
[M::mm_idx_gen::0.004*2.11] sorted minimizers
[M::main::0.004*2.11] loaded/built the index for 0 target sequences.
[M::mm_mapopt_update::0.004*2.10] mid_occ = 100000
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.004*2.09] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing; -nan; total length: 0
[M::worker_pipeline::3.486*1.32] mapped 121040 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -a -t 4 -N 4 flair.output.firstpass.fa.combined.fastq.gz
[M::main] Real time: 3.497 sec; CPU: 4.625 sec; Peak RSS: 0.231 GB
Filtering isoforms by read coverage
(END)

I noticed that the log says it built the index for 0 target sequences. Did I make an error here?

Thank you!

Jeltje commented 1 year ago

Good catch, something's not right with the genome file. Those log messages are from minimap2, which indexes your genome file before aligning. I checked, and it does in fact accept a gzipped file, so that's not the problem here.
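
For longer logs, the same check can be automated. The following is a minimal Python sketch, not part of FLAIR or minimap2: the function names are hypothetical, and the pattern is an assumption based on the wording of the index message in the log quoted above.

```python
import re

# Assumption: the index summary message reads
# "loaded/built the index for N target sequence...", as in the log above.
INDEX_LINE = re.compile(r"loaded/built the index for (\d+) target sequence")


def indexed_target_counts(log_text):
    """Return the sequence count from every index summary line in the log."""
    return [int(n) for n in INDEX_LINE.findall(log_text)]


def has_empty_index(log_text):
    """Return True if any index in the log was built from zero sequences."""
    return any(count == 0 for count in indexed_target_counts(log_text))


if __name__ == "__main__":
    sample = "loaded/built the index for 0 target sequences."
    print(has_empty_index(sample))
```

A True result means the file passed to minimap2 as the alignment target contained no sequences, so that file is the one to inspect first.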

Minimap2 builds the index automatically if it cannot find the .mmi file, so all FLAIR does is call it with the alignment input, including the genome file. You can create the index separately by running minimap2 -d genome.mmi genome.fa(.gz)

Have a look at your genome file (zcat hg38.fa.gz | head -c 200). Does it look like a correctly formatted FASTA file, like this?

>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
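
If looking at the first 200 bytes is not conclusive, a slightly more thorough check is to count the records in the file. This is only a sketch in Python under stated assumptions: the function names are hypothetical, the file is assumed to be plain or gzip-compressed text, and it tests nothing beyond the presence of ">" header lines followed by sequence lines.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_summary(path):
    """Count header lines and sequence characters in a FASTA file."""
    records = 0
    bases = 0
    starts_with_header = None  # unknown until the first non-empty line
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if starts_with_header is None:
                starts_with_header = line.startswith(">")
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line)
    return {
        "records": records,
        "bases": bases,
        "starts_with_header": bool(starts_with_header),
    }
```

Calling fasta_summary("/path/to/hg38.fa.gz") on a healthy genome file should report a non-zero number of records and bases; zero records would be consistent with an index built for 0 target sequences.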

Jeltje commented 1 year ago

Please reopen this if you want to continue the discussion.