Closed pothepanda09 closed 1 year ago
Good catch, something's not right with the genome file. Those log messages come from minimap2, which indexes your genome file before aligning. I checked, and it does accept a gzipped file, so that's not the problem here.
minimap2 indexes the genome automatically if it is not given an .mmi file, so all FLAIR does is call it with the alignment input, including the genome file. You can create the index separately by running: minimap2 -d genome.mmi genome.fa(.gz)
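If you build the index separately, it can help to save minimap2's messages (it writes them to stderr) and check how many target sequences it reports. Here is a rough sketch; index_target_count is a made-up helper name, and the pattern it looks for is based only on the "loaded/built the index for N target sequences" line in your log, so adjust it if your minimap2 version words that line differently.

```shell
# Hypothetical helper: report how many target sequences minimap2 says it indexed.
# Assumes the log contains lines like
#   [M::main::0.004*2.11] loaded/built the index for 0 target sequences.
index_target_count() {
    log="$1"
    # Pull the number out of every matching line, then sum the values,
    # since a large genome may be indexed in several parts.
    sed -n 's/.*loaded\/built the index for \([0-9][0-9]*\) target sequence.*/\1/p' "$log" |
        awk '{ total += $1 } END { print total + 0 }'
}
```

For example, after running minimap2 -d genome.mmi genome.fa.gz 2> index.log, calling index_target_count index.log should print a number greater than 0 for a healthy genome file. A result of 0 points at the input file rather than at FLAIR.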
Have a look at the start of your genome file (zcat hg38.fa.gz | head -c 200). Does it look like a correctly formatted FASTA file, like this?
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
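If you would rather not eyeball the file, the same check can be scripted. This is only a sketch: check_fasta_gz is a hypothetical helper, and it tests just three things (the gzip stream is intact, the first line starts with ">", and there is at least one header line). It does not validate the sequence lines themselves.

```shell
# Hypothetical helper: basic sanity check for a gzipped FASTA file.
# Prints the number of records (header lines) on success; returns non-zero otherwise.
check_fasta_gz() {
    fasta="$1"

    # 1. Is the gzip stream itself intact (e.g. not a truncated download)?
    gzip -t "$fasta" 2>/dev/null || {
        echo "not a valid gzip file: $fasta" >&2
        return 1
    }

    # 2. One pass over the decompressed text: the first line must be a header,
    #    and we count every line that starts with '>'.
    n_records=$(gzip -dc "$fasta" | awk '
        NR == 1 && !/^>/ { bad = 1 }
        /^>/             { n++ }
        END {
            if (bad || n == 0) exit 1
            print n
        }') || {
        echo "no FASTA records found in: $fasta" >&2
        return 1
    }

    echo "$n_records"
}
```

Running check_fasta_gz hg38.fa.gz reads the whole file, so it takes a little while on a full genome. If it fails, or prints a surprisingly small number, the genome file is the likely cause of the empty index.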
Please reopen this if you want to continue the discussion.
Hello,
I'm pretty new to coding, but I'm trying to run the collapse step of the FLAIR workflow. However, there is no actual output; the output file it creates is blank. I am using FLAIR to look at alternative splicing events in sequencing data generated from nanopore cDNA reads. The command I am currently running is:
flair 123 -r fastq.gz -g /path/to/hg38.fa.gz -f /path/to/gencode.v43.annotation.gtf -o flair.output --trust_ends --temp_dir temp_flair
I get output files from the align and correct steps, and there are progress steps that show up after running the code:
Annotated ends extracted from GTF
Read data extracted
Single-exon genes grouped, collapsing
Renaming isoforms using gtf
[M::mm_idx_gen::0.003*2.63] collected minimizers
[M::mm_idx_gen::0.004*2.11] sorted minimizers
[M::main::0.004*2.11] loaded/built the index for 0 target sequences.
[M::mm_mapopt_update::0.004*2.10] mid_occ = 100000
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.004*2.09] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing; -nan; total length: 0
[M::worker_pipeline::3.486*1.32] mapped 121040 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -a -t 4 -N 4 flair.output.firstpass.fa.combined.fastq.gz
[M::main] Real time: 3.497 sec; CPU: 4.625 sec; Peak RSS: 0.231 GB
Filtering isoforms by read coverage
(END)
I noticed that it said that it built the index for 0 target sequences. Did I make an error here?
Thank you!