cathoderaymission closed this issue 2 years ago
Hi cathoderaymission,
The transcript.fasta file should contain cDNA (and/or ncRNA) with transcript_ids that match those that are in the nanopolish/eventalign.txt. Creating transcript sequences from dna.primary_assembly and a corresponding gff3 file may not give you a compatible fasta file.
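A quick way to check whether a fasta file is compatible is to compare its sequence IDs against the reference names in the eventalign output. The following is only a minimal sketch, not part of any tool mentioned here: the function names are hypothetical, and it assumes that eventalign.txt is tab-separated with a header row containing a `contig` column, and that the transcript ID is the first whitespace-delimited token of each fasta header.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_ids(path):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def eventalign_contigs(path, max_rows=None):
    """Collect distinct values of the 'contig' column (assumed column name)."""
    contigs = set()
    with open_text(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        col = header.index("contig")  # raises ValueError if the column is absent
        for n, line in enumerate(handle):
            if max_rows is not None and n >= max_rows:
                break  # eventalign files are large; a sample is often enough
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) > col:
                contigs.add(fields[col])
    return contigs


def compare_ids(fasta_path, eventalign_path, max_rows=None):
    """Report which eventalign reference names are present in the fasta."""
    fa = fasta_ids(fasta_path)
    ev = eventalign_contigs(eventalign_path, max_rows)
    return {"shared": ev & fa, "missing_from_fasta": ev - fa}
```

Calling `compare_ids("transcript.fasta", "eventalign.txt", max_rows=100000)` (both paths are placeholders) returns two sets. If `shared` is empty, or `missing_from_fasta` contains chromosome names or IDs that differ only in a version suffix, the two files were most likely not built from the same reference.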
The gtf file you are using is from ENSEMBL release-91, which can also be found here:
http://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/Homo_sapiens.GRCh38.91.gtf.gz
You can find the compatible version of the ENSEMBL transcript.fasta file here:
http://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
ENSEMBL also provides transcript.fa and gtf reference sequence and annotation files for other organisms such as Saccharomyces cerevisiae:
transcript.fa: http://ftp.ensembl.org/pub/release-91/fasta/saccharomyces_cerevisiae/cdna/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz
gtf: http://ftp.ensembl.org/pub/release-91/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.91.gtf.gz
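To keep the cDNA fasta and the gtf on the same release, it can help to build both URLs from a single release number. The sketch below only generalises the pattern visible in the four links in this reply. The helper name `ensembl_urls` is hypothetical, and whether the same layout holds for every species and release is an assumption that should be checked against the FTP site.

```python
BASE = "http://ftp.ensembl.org/pub"


def ensembl_urls(species, assembly, release):
    """Build matching cDNA fasta and gtf URLs for one ENSEMBL release.

    species  : lowercase directory name, e.g. "homo_sapiens"
    assembly : assembly label used in the file name, e.g. "GRCh38"
    release  : ENSEMBL release number, e.g. 91
    """
    # File names capitalise the first letter of the species directory name.
    name = species.capitalize()
    cdna = (
        f"{BASE}/release-{release}/fasta/{species}/cdna/"
        f"{name}.{assembly}.cdna.all.fa.gz"
    )
    gtf = (
        f"{BASE}/release-{release}/gtf/{species}/"
        f"{name}.{assembly}.{release}.gtf.gz"
    )
    return {"cdna": cdna, "gtf": gtf}
```

For example, `ensembl_urls("homo_sapiens", "GRCh38", 91)` reproduces the two human links given in this reply, and `ensembl_urls("saccharomyces_cerevisiae", "R64-1-1", 91)` reproduces the two yeast links.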
Thanks!
Best wishes, Yuk Kei
Using hek293 data from your paper, I run the following command after running nanopolish as suggested in the documentation, and receive an empty data.json file.
The transcript fasta is made via: