GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

xpore dataprep empty data.json #86

Closed: cathoderaymission closed this issue 2 years ago

cathoderaymission commented 2 years ago

Using the HEK293 data from your paper, I ran the following command after running nanopolish as suggested in the documentation, and received an empty data.json file.

xpore dataprep \
--eventalign KO-rep1/eventalign.txt \
--gtf_path_or_url Homo_sapiens.GRCh38.91.gtf \
--transcript_fasta_paths_or_urls GRCh38.19.transcript.fa \
--out_dir KO-rep1/dataprep \
--genome --n_processes 32

The transcript fasta is made via:

gffread Homo_sapiens.GRCh38.91.gff3 -W -O -g Homo_sapiens.GRCh38.dna.primary_assembly.fa -w GRCh38.transcript.fa

yuukiiwa commented 2 years ago

Hi cathoderaymission,

The transcript.fasta file should contain cDNA (and/or ncRNA) sequences whose transcript_ids match those in the nanopolish eventalign.txt. Creating transcript sequences from dna.primary_assembly and a corresponding gff3 file may not give you a compatible fasta file. The gtf file you are using is from ENSEMBL release-91, which can also be found here: http://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/Homo_sapiens.GRCh38.91.gtf.gz

You can find the compatible version of the ENSEMBL transcript.fasta file here: http://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
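One way to check the ID match described above before re-running dataprep is to compare the sequence names in the fasta file with the reference names in eventalign.txt. The following is only a minimal sketch, not part of xpore: the function names are hypothetical, it assumes the eventalign header contains a `contig` column (as in standard nanopolish eventalign output), and it treats the first whitespace-delimited token of each fasta header as the transcript ID. It compares IDs verbatim, so any normalisation that xpore itself applies is not reproduced here.

```python
import gzip
from itertools import islice


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fasta_ids(path):
    """Collect the first token of every '>' header line in a FASTA file."""
    ids = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def eventalign_contigs(path, max_rows=1_000_000):
    """Collect reference names from the 'contig' column of eventalign.txt.

    Only the first max_rows data rows are read, since the file can be very large.
    Assumption: the file is tab-separated with a header row naming a 'contig' column.
    """
    contigs = set()
    with _open_text(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        col = header.index("contig")
        for line in islice(handle, max_rows):
            if not line.strip():
                continue  # skip blank lines
            contigs.add(line.rstrip("\n").split("\t")[col])
    return contigs


def compare_ids(fasta_path, eventalign_path, max_rows=1_000_000):
    """Summarise how many eventalign reference names also appear in the FASTA."""
    ref = fasta_ids(fasta_path)
    seen = eventalign_contigs(eventalign_path, max_rows)
    missing = seen - ref
    return {
        "fasta": len(ref),
        "eventalign": len(seen),
        "shared": len(seen & ref),
        "missing_examples": sorted(missing)[:5],
    }
```

For the files in this issue, the call would be something like `compare_ids("GRCh38.19.transcript.fa", "KO-rep1/eventalign.txt")`. A `shared` count of 0, or names in `missing_examples` that differ from the fasta headers only by a prefix or version suffix, would point to the kind of incompatibility described above.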

ENSEMBL also provides transcript.fa and gtf reference sequence and annotation files for other organisms, such as Saccharomyces cerevisiae:
transcript.fa: http://ftp.ensembl.org/pub/release-91/fasta/saccharomyces_cerevisiae/cdna/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz
gtf: http://ftp.ensembl.org/pub/release-91/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.91.gtf.gz

Thanks!

Best wishes, Yuk Kei