Open winni2k opened 6 years ago
According to conda, I have version 1.3.0 installed:
$ conda list | grep rsem
rsem 1.3.0 boost1.64_3 bioconda
Version 1.2.28 runs without throwing an exception, and the output file is not empty.
The same happens to me when running:
rsem-gff3-to-gtf GCF_000001405.38_GRCh38.p12_genomic.gff human_refseq.gtf
[...]
Loaded 3500000 lines
Loaded 3600000 lines
Cannot recognize transcript rna3's parent rna2, a gene feature might be missing.
[...]
failed! Plase check if you provide correct parameters/options for the pipeline!
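The message names two IDs, so one way to dig in is to look up what kind of feature each of them is. The helper below is only a sketch and not part of RSEM: the function name gff3_show_feature is made up for illustration, and it assumes a tab-separated GFF3 file with ID= and Parent= attributes in column 9. It prints the ID, the feature type from column 3, and the parent ID ("-" if there is none).

```shell
# Hypothetical helper (not an RSEM tool): show type and parent of one GFF3 feature.
# Usage: gff3_show_feature FILE.gff FEATURE_ID
gff3_show_feature() {
    awk -F '\t' -v id="$2" '
        !/^#/ && NF >= 9 {
            # Column 9 holds ";"-separated key=value attributes.
            n = split($9, attrs, ";")
            fid = ""; parent = "-"
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     fid = substr(attrs[i], 4)
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
            }
            # Print "ID <TAB> feature type <TAB> parent ID" for the requested ID.
            if (fid == id) printf "%s\t%s\t%s\n", fid, $3, parent
        }' "$1"
}
```

Running it for rna3 and then for rna2 on GCF_000001405.38_GRCh38.p12_genomic.gff should show whether rna2 is a gene feature or some other transcript-level feature.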
I tried gffread as an alternative, but RSEM then reports the resulting GTF as corrupted when building the reference:
The GTF file might be corrupted!
Stop at line : NC_000001.11 BestRefSeq exon 17369 17391 . - . transcript_id "rna3";
Error Message: Cannot find gene_id!
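To see how many records are affected before handing a GTF to RSEM, a quick scan for lines without a gene_id attribute can help. This is a minimal sketch, assuming a tab-separated GTF with the attributes in column 9; the function name gtf_lines_without_gene_id is hypothetical.

```shell
# Hypothetical helper: list GTF records whose attribute column lacks gene_id.
# Usage: gtf_lines_without_gene_id FILE.gtf
gtf_lines_without_gene_id() {
    # Skip comment lines; print "line number: record" for each offending line.
    awk -F '\t' '!/^#/ && NF >= 9 && $9 !~ /gene_id "/ { print FNR ": " $0 }' "$1"
}
```

Piping the output through `wc -l` gives the number of records that would trigger the error above.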
I'm not sure this helps, but I have given up on converting GFFs to GTFs. Instead, I now just use the files provided by Ensembl at ftp://ftp.ensembl.org/pub/release-95 (they come with versions!).
Thanks for your quick reply @winni2k ... I think you are right, it will be best to use the Ensembl genome and GTF directly. I am a little frustrated by this issue, since I wanted to use the latest NCBI human reference, GRCh38.p12 (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/reference/GCF_000001405.38_GRCh38.p12), though I guess there shouldn't be much difference.
In case it helps somebody: RSEM could successfully use the GTF produced by gffread when I specified the -C flag (coding only: discard mRNAs that have no CDS features) while converting the NCBI GFF:
gffread GCF_000001405.38_GRCh38.p12_genomic.gff -E -T -C -o GCF_000001405.38_GRCh38.p12_genomic.gtf
Alas, the mismatching reference genome is an issue. I did end up switching to the Ensembl reference genome as well for my analyses :/
It now seems that I have managed to use the NCBI GFF and reference without problems. It just needed the flag --gff3-RNA-patterns mRNA,rRNA.
This was my command:
rsem-prepare-reference --gff3 GCF_000001405.38_GRCh38.p12_genomic.gff --gff3-RNA-patterns mRNA,rRNA --trusted-sources BestRefSeq,Curated\ Genomic --star --star-path /SeqTools/STAR-2.6.0a/bin/Linux_x86_64 GCF_000001405.38_GRCh38.p12primassembly.fna ./human_refseq
I hope it works for you as well.
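When choosing values for --gff3-RNA-patterns and --trusted-sources, it can help to see which feature types the GFF3 file actually contains. The sketch below tallies column 3 of a tab-separated GFF3 file; the function name gff3_feature_types is made up for illustration, and the idea that the patterns correspond to these column 3 types is an assumption based on the mRNA,rRNA example above, not something confirmed here.

```shell
# Hypothetical helper: count how often each feature type (GFF3 column 3) occurs.
# Usage: gff3_feature_types FILE.gff
gff3_feature_types() {
    # Skip comment/directive lines, tally column 3, print "count<TAB>type",
    # then sort by descending count and by type name for ties.
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                 END { for (t in n) printf "%d\t%s\n", n[t], t }' "$1" |
        sort -k1,1nr -k2,2
}
```

Changing `$3` to `$2` in the awk program tallies the source column instead, which is where values such as BestRefSeq appear in the error output quoted earlier in the thread.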
I would like some help understanding why rsem-gff3-to-gtf is failing.
When I run
then I get the error message
I downloaded the GFF3 file from NCBI at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/Saccharomyces_cerevisiae/latest_assembly_versions/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz
Here is the header of that file:
When I run
Then I get this output: