dcjones / isolator

Rapid and robust analysis of RNA-Seq experiments.
MIT License

Couldn't read sequence #8

Open er0080808 opened 7 years ago

er0080808 commented 7 years ago

I do not know how to post an issue, so I am pasting the command and its output here. I hope you can help me.

>isolator analyze -o xxx.hdf5 -g mm10.fa -p 4 RefSeq_Genes_mm10.gtf xxx.bam

The BAM file above is the output of STAR and has been sorted. The following is the stdout:

  _           _       _
 (_)___  ___ | | __ _| |_ ___  _ __
 | / __|/ _ \| |/ _` | __/ _ \| '__|
 | \__ \ (_) | | (_| | || (_) | |
 |_|___/\___/|_|\__,_|\__\___/|_|

 Version: 0.0.2-102-g24bafc0
 Instruction set: AVX
 [09:29:35] 3827 cassette exons
 [09:29:38] 436 retained introns
 [09:29:39] 224966 consensus exons
 [09:33:02] Too few paired-end reads to estimate fragment length distribution.
 [09:35:12] 3' bias: 1.698e-06
 [10:08:32] Couldn't read sequence GL456210.1

 Estimating fragment weights (xxx.bam):
 [==========================================================> ] 98.4%    1:31 ETA
>

The line "Couldn't read sequence GL456210.1" is shown in red, and the output file xxx.hdf5 is about 400 kB.

I don't know what I should do next.

dcjones commented 7 years ago

This error occurs because a transcript in your GTF file is on the "GL456210.1" sequence, which is an unplaced contig in the mm10 assembly, and that sequence is not present in mm10.fa.

Isolator is a little strict about that, so you have to either use a more complete reference sequence or filter out entries from the GTF file that are on such sequences.
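For the second option, the filtering could look roughly like the sketch below. It is not part of isolator; the function names `fasta_sequence_names` and `filter_gtf` are made up for illustration. It assumes an uncompressed FASTA whose sequence name is the first word of each `>` header line, and a tab-separated GTF whose first column is the sequence name. Note that it compares names exactly, so a naming mismatch between the two files (for example "chr1" versus "1") would also cause entries to be dropped.

```python
def fasta_sequence_names(fasta_path):
    """Return the set of sequence names found in a FASTA file.

    Assumption: the name is the first whitespace-delimited word
    after '>' on each header line.
    """
    names = set()
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def filter_gtf(gtf_path, out_path, known_names):
    """Copy GTF entries whose sequence name is in known_names.

    Comment lines and blank lines are copied unchanged.
    Returns the set of sequence names whose entries were dropped.
    """
    dropped = set()
    with open(gtf_path) as gtf, open(out_path, "w") as out:
        for line in gtf:
            if line.startswith("#") or not line.strip():
                out.write(line)
                continue
            # Column 1 of a GTF record is the sequence name.
            seqname = line.split("\t", 1)[0]
            if seqname in known_names:
                out.write(line)
            else:
                dropped.add(seqname)
    return dropped
```

Calling `filter_gtf("RefSeq_Genes_mm10.gtf", "filtered.gtf", fasta_sequence_names("mm10.fa"))` would write the kept entries to `filtered.gtf` (a hypothetical output name) and return the dropped sequence names, which is worth checking before rerunning the analysis with the filtered file.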

er0080808 commented 7 years ago

Thank you so much for your reply.