SciLifeLab / NGI-RNAseq

Nextflow RNA-Seq Best Practice analysis pipeline, used at the SciLifeLab National Genomics Infrastructure.
https://ngisweden.scilifelab.se/
MIT License

no exon lines in the GTF file #229

Closed Zeinab5 closed 6 years ago

Zeinab5 commented 6 years ago

Hello,

I got the GTF file from CGD (http://www.candidagenome.org/download/gff/C_albicans_SC5314/Assembly22/), but I got this error message:

Fatal INPUT FILE error, no exon lines in the GTF file: albicans.gtf
Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column.
      Make sure the GTF file is unzipped.
      If exons are marked with a different word, use --sjdbGTFfeatureExon

Is it possible to make STAR accept this GTF file format? Or how can I modify this GTF file so that STAR accepts it?

Thanks, Zeinab Hefny

ewels commented 6 years ago

Hi @Zeinab5,

This is a little strange, http://www.candidagenome.org/download/gff/ suggests that the GTF files should use the term exon.

Exactly which GTF file did you use from your link? There are many in that folder.

Phil

Zeinab5 commented 6 years ago

This is the one that I used http://www.candidagenome.org/download/gff/C_albicans_SC5314/Assembly22/C_albicans_SC5314_A22_current_features.gtf

varemo commented 6 years ago

The corresponding GFF file (http://www.candidagenome.org/download/gff/C_albicans_SC5314/Assembly22/C_albicans_SC5314_A22_current_features.gff) seems to contain the required fields, as opposed to the GTF, which only contains "CDS". Is it possible to supply a GFF file directly to the pipeline?
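
A quick way to see which feature types a GTF or GFF file actually contains is to tally the third column. The snippet below is only a minimal sketch: the function name `count_feature_types` and the example records are hypothetical and are not taken from the CGD files.

```python
from collections import Counter


def count_feature_types(lines):
    """Count the values in column 3 (feature type) of GTF/GFF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Lines without at least 3 tab-separated fields (e.g. embedded
        # FASTA sequence) are ignored.
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "# comment line\n",
    'chrA\tsrc\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n',
    'chrA\tsrc\tCDS\t300\t400\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n',
    'chrA\tsrc\texon\t100\t400\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]
counts = count_feature_types(example)
print(counts)
```

To run it on a real file, pass an open file handle, e.g. `with open("albicans.gtf") as fh: print(count_feature_types(fh))`. If `exon` is missing from the result, STAR will report the "no exon lines" error quoted above.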

ewels commented 6 years ago

I think it should be fine to run with a GFF instead of GTF. The format is pretty similar right? Certainly easy enough to try... Just use --gtf [path]

Zeinab5 commented 6 years ago

Hi, I solved the issue by converting the GFF file from CGD to a GTF file using gffread, and it worked. I couldn't use the GFF file directly with the pipeline because gtfToGenePred can't convert GFF to genePred, which I need to get the BED12 file. It also didn't work when I didn't give the pipeline the path to the BED12 file.
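
The workaround described here (GFF to GTF with gffread, then GTF to genePred to BED12) can be sketched as a small command chain. This is only an illustration under assumptions: the helper names `conversion_commands` and `run_conversion` and the output file names are hypothetical, the `gffread ... -T -o ...` form is the one used in this thread, and `gtfToGenePred` / `genePredToBed` are assumed to be the UCSC utilities, called with plain input and output paths.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(gff_path, workdir="."):
    """Build the gffread -> gtfToGenePred -> genePredToBed command chain.

    Output names are derived from the input file name (an assumption
    made for this sketch, not a convention of the pipeline).
    """
    stem = Path(gff_path).stem
    out = Path(workdir)
    gtf = out / f"{stem}.gtf"
    genepred = out / f"{stem}.genePred"
    bed12 = out / f"{stem}.bed"
    return [
        ["gffread", str(gff_path), "-T", "-o", str(gtf)],
        ["gtfToGenePred", str(gtf), str(genepred)],
        ["genePredToBed", str(genepred), str(bed12)],
    ]


def run_conversion(gff_path, workdir="."):
    """Run each step in order, stopping at the first failure."""
    for cmd in conversion_commands(gff_path, workdir):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


# Print the commands without executing them.
for cmd in conversion_commands("my.gff3"):
    print(" ".join(cmd))
```

Calling `run_conversion("my.gff3")` would execute the three steps, provided the tools are installed; the resulting GTF and BED12 files could then be passed to the pipeline with `--gtf` and `--bed12`.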

ewels commented 6 years ago

> I solved the issue by converting the GFF file from the CGD to a GTF file by using gffread and it worked.

Ok great! You mean this tool, right? Maybe we can bundle that with the pipeline and add an additional -gff option...

> It also didn't work when I didn't give the path of the Bed12 to the pipeline

You mean something didn't work with --bed12? What was the error?

Zeinab5 commented 6 years ago

Yes, this is the tool that I used.

./gffread my.gff3 -T -o my.gtf

I thought that was the reason. I've attached the error message.

[Screenshot: screen shot 2018-04-20 at 15 24 42]

ewels commented 6 years ago

Hmm, "Non existent bind point" usually means that you don't have overlays enabled in Singularity. This is to do with your Singularity installation: if you have an old version of the Linux kernel, then enabling them may be impossible. If this is the case, then the Singularity container we provide won't work for you.

The rest of the error is not so clear. But you don't get any errors now, running with this GTF file?

Zeinab5 commented 6 years ago

Yes, it's working fine with this GTF file now.

ewels commented 6 years ago

Ok great - then maybe the bind point thing was a false alarm. Strange, I'll give it another test here to be sure.

I've created a new issue for GFF support, but over at nf-core/RNAseq (we are porting our pipelines to this new nf-core organisation - see http://nf-co.re/). Issue here: https://github.com/nf-core/RNAseq/issues/15

Zeinab5 commented 6 years ago

OK, nice. Thanks for the links.