Wardale24 closed this issue 2 years ago.
Hi @Wardale24,
I think this error might come from the minimap2 alignment. For xpore, you have to align the Danio rerio reads to the transcriptome, then rerun nanopolish and xpore-dataprep.
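As a rough sketch of that realignment step, the minimap2 call could be assembled as below. It assumes minimap2 is on PATH. The file names and thread count are placeholders, and the flag set (`-ax map-ont -uf --secondary=no`) mirrors commonly used direct RNA examples rather than anything confirmed in this thread. `minimap2_transcriptome_cmd` is a hypothetical helper.

```python
import shlex
from typing import List


def minimap2_transcriptome_cmd(transcriptome_fa: str, reads_fastq: str,
                               threads: int = 8) -> List[str]:
    """Build a minimap2 argv that aligns reads to the transcriptome, not the genome.

    Hypothetical helper; file names and thread count are placeholders.
      -a             : output SAM
      -x map-ont     : preset for Oxford Nanopore reads
      -uf            : look for GT-AG splice signals on the transcript strand only
                       (copied from common direct RNA examples; mainly relevant in splice mode)
      --secondary=no : do not output secondary alignments
    """
    return [
        "minimap2", "-ax", "map-ont", "-uf",
        "-t", str(threads), "--secondary=no",
        transcriptome_fa, reads_fastq,
    ]


if __name__ == "__main__":
    cmd = minimap2_transcriptome_cmd("Danio_rerio.GRCz11.cdna.all.fa", "zf.fastq")
    # Print a copy-pasteable shell line; sort and index the output with samtools afterwards.
    print(shlex.join(cmd) + " > zf.sam")
```

The resulting SAM still needs to be converted to a sorted, indexed BAM before it is passed to nanopolish eventalign with the same transcriptome FASTA.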
Hello @ploy-np
Thank you for the fast reply. I mapped the data again, this time to the transcriptome, and then ran nanopolish eventalign with --genome zebrafish.cdna.fa. Is this correct? Sorry for the basic question; I just find it strange to pass the transcriptome file to an option called --genome. When I tried nanopolish eventalign with the primary assembly genome as --genome, it didn't recognize the scaffolds.
I then ran xpore-dataprep using the zebrafish GTF and cDNA files. The resulting .json file is now 8 GB.
Summary:
nanopolish eventalign --reads zf.fastq \
    --bam zf.bam \
    --genome zf_cdna.fa \
    --signal-index \
    --scale-events \
    --summary summary.txt \
    --threads 32 > eventalign.txt
xpore dataprep \
    --eventalign eventalign.txt \
    --gtf_path_or_url zf.gtf \
    --transcript_fasta_paths_or_urls zf_cdna.fa \
    --out_dir dataprep \
    --genome
Could you confirm that this is the right procedure?
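Before running dataprep on a large eventalign.txt, it may be worth checking that its header contains the columns the eventalign flags are expected to produce. The column list in this sketch is an assumption based on nanopolish's default output (`read_index` rather than `read_name`, since `--print-read-names` is not used) plus `start_idx`/`end_idx` from `--signal-index`. `missing_columns` is a hypothetical helper.

```python
from typing import List

# Assumed column names; start_idx/end_idx should only appear when --signal-index is used.
EXPECTED_COLUMNS = [
    "contig", "position", "reference_kmer", "read_index",
    "event_level_mean", "event_stdv", "event_length",
    "start_idx", "end_idx",
]


def missing_columns(header_line: str, expected: List[str] = EXPECTED_COLUMNS) -> List[str]:
    """Return the expected column names that are absent from a tab-separated header line."""
    present = set(header_line.rstrip("\n").split("\t"))
    return [name for name in expected if name not in present]


# Usage (reads only the first line of the file):
# with open("eventalign.txt") as fh:
#     print(missing_columns(fh.readline()) or "all expected columns present")
```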
Hi @Wardale24,
Yes, it does sound strange. I think --genome in nanopolish just means the reference sequence the reads were aligned to, so what you did seems correct.
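One way to check that reading of `--genome` is to confirm that every reference name in the BAM header also appears as a sequence ID in the FASTA passed to `--genome`. If the BAM was made against the transcriptome, a genome FASTA will not contain those names, and vice versa. A minimal sketch, assuming the header was dumped with `samtools view -H zf.bam > header.sam`; the helper names are hypothetical:

```python
from typing import Iterable, List, Set


def sam_reference_names(header_lines: Iterable[str]) -> Set[str]:
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """First whitespace-delimited token of each '>' header line (the name aligners use)."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def names_missing_from_reference(bam_names: Set[str], ref_ids: Set[str]) -> List[str]:
    """BAM reference names that the --genome FASTA does not contain, sorted."""
    return sorted(bam_names - ref_ids)


# Usage:
# with open("header.sam") as h, open("zf_cdna.fa") as f:
#     missing = names_missing_from_reference(sam_reference_names(h), fasta_ids(f))
#     print(len(missing), missing[:10])
```

An empty result means the FASTA covers every sequence the reads were aligned to.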
After reading the open and closed issues, I see this is a common problem, but I want to add that I'm working with Danio rerio and am unable to complete xpore dataprep. The warnings I got on separate runs are as follows:
/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False.
  options.func(options)

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (12) have mixed types.Specify dtype option on import or set low_memory=False.
  options.func(options)

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (14) have mixed types.Specify dtype option on import or set low_memory=False.
  options.func(options)

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
  options.func(options)
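These are pandas `DtypeWarning`s rather than errors. pandas emits them when, with its default chunked (`low_memory`) parsing, a column is inferred as different types in different chunks, typically because it mixes number-like and non-number-like values; the message itself suggests passing `dtype` or `low_memory=False`. Which file xpore was reading when they appeared isn't stated, so the sketch below assumes it was the eventalign output with columns in nanopolish's default order (column 0 would then be `contig`). It lists example values of each kind for the flagged columns. For instance, genome-aligned contig names such as `1` and `MT` would show up as mixed in column 0; that is an inference, not something confirmed here. `mixed_type_report` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def _is_number(value: str) -> bool:
    """True if the string parses as a float (e.g. '83.2', '1'); False for 'MT' or ''."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def mixed_type_report(lines: Iterable[str], column_indices: List[int],
                      max_examples: int = 3) -> Dict[int, Dict[str, List[Tuple[int, str]]]]:
    """For each 0-based column, keep up to `max_examples` (line_no, value) pairs that
    parse as numbers and up to `max_examples` that do not. A column with entries in
    both lists is 'mixed' in the sense of the pandas DtypeWarning.
    The first line is treated as the header and skipped.
    """
    report = {i: {"numeric": [], "text": []} for i in column_indices}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    for line_no, row in enumerate(reader, start=2):
        for i in column_indices:
            if i >= len(row):
                continue  # short/truncated row: not classified here
            kind = "numeric" if _is_number(row[i]) else "text"
            if len(report[i][kind]) < max_examples:
                report[i][kind].append((line_no, row[i]))
    return report


# Usage on the first 5 million lines only (the file may be very large):
# from itertools import islice
# with open("zebrafish_eventalign.txt") as fh:
#     report = mixed_type_report(islice(fh, 5_000_000), [0, 7, 12, 14])
# for col, kinds in report.items():
#     if kinds["numeric"] and kinds["text"]:
#         print(col, kinds)
```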
Command used:
xpore dataprep \
    --eventalign zebrafish_eventalign.txt \
    --gtf_path_or_url mapping/Danio_rerio.GRCz11.104.chr.gtf \
    --transcript_fasta_paths_or_urls mapping/Danio_rerio.GRCz11.cdna.all.fa \
    --out_dir dataprep_zf \
    --genome
The .json file is empty, and the .index and .log files contain only their headers.
Both the GTF and transcriptome files were downloaded from Ensembl. In addition, the initial minimap2 mapping was done against the Danio rerio primary assembly genome from Ensembl.
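Because this minimap2 run was against the primary assembly, the eventalign `contig` column will hold chromosome or scaffold names rather than transcript IDs, which is the mismatch the first reply in this thread pointed to. A quick way to see which kind of names an eventalign file contains, as a sketch that assumes `contig` is the first column and that Ensembl zebrafish transcript IDs look like `ENSDART` followed by 11 digits, optionally with a `.version` suffix (the helper names are hypothetical):

```python
import re
from itertools import islice
from typing import Dict, Iterable, Set

# Assumed pattern for Ensembl zebrafish transcript IDs,
# e.g. ENSDART00000000004 or ENSDART00000000004.5
ZF_TRANSCRIPT_ID = re.compile(r"ENSDART\d{11}(\.\d+)?")


def distinct_contigs(lines: Iterable[str], max_rows: int = 1_000_000) -> Set[str]:
    """Distinct first-column values from an eventalign file, skipping the header line."""
    rows = islice(lines, 1, max_rows + 1)  # line 0 is the header
    return {line.split("\t", 1)[0].strip() for line in rows if line.strip()}


def contig_kinds(contigs: Iterable[str]) -> Dict[str, int]:
    """Count names that look like zebrafish transcript IDs versus anything else."""
    counts = {"transcript": 0, "other": 0}
    for name in contigs:
        counts["transcript" if ZF_TRANSCRIPT_ID.fullmatch(name) else "other"] += 1
    return counts


# Usage:
# with open("zebrafish_eventalign.txt") as fh:
#     print(contig_kinds(distinct_contigs(fh)))
```

Mostly `other` would suggest the file was produced from the genome alignment and needs to be regenerated from a transcriptome alignment.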
From what I can gather from other issues, this is probably related to a lack of compatibility between the reference files. Nevertheless, I am unsure how to continue and would appreciate any assistance on this matter.
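For the reference-compatibility question, one concrete thing to compare is the transcript IDs in the cDNA FASTA (which, after a transcriptome alignment, are also what end up in the eventalign `contig` column) against the `transcript_id` values in the GTF. Ensembl cDNA FASTA headers typically include a version suffix (e.g. `.5`) while Ensembl GTFs typically keep it in a separate `transcript_version` attribute; a `.chr.gtf` only covers the reference chromosomes, so some cDNA entries may have no GTF match; and files from different Ensembl releases can also disagree. The sketch only measures the overlap and makes no assumption about how xpore matches the two. The helpers are hypothetical.

```python
import re
from typing import Dict, Iterable, Set

TRANSCRIPT_ID_ATTR = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """transcript_id values from a GTF; '#' header lines are skipped."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        match = TRANSCRIPT_ID_ATTR.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """First whitespace-delimited token of each '>' header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def strip_version(tx_id: str) -> str:
    """'ENSDART00000000004.5' -> 'ENSDART00000000004'; unchanged if there is no '.'."""
    return tx_id.split(".", 1)[0]


def compare_ids(fasta: Set[str], gtf: Set[str]) -> Dict[str, int]:
    """Overlap counts, both exact and after dropping version suffixes."""
    fasta_base = {strip_version(t) for t in fasta}
    gtf_base = {strip_version(t) for t in gtf}
    return {
        "exact_overlap": len(fasta & gtf),
        "overlap_ignoring_version": len(fasta_base & gtf_base),
        "fasta_only": len(fasta_base - gtf_base),
        "gtf_only": len(gtf_base - fasta_base),
    }


# Usage:
# with open("mapping/Danio_rerio.GRCz11.cdna.all.fa") as f, \
#         open("mapping/Danio_rerio.GRCz11.104.chr.gtf") as g:
#     print(compare_ids(fasta_ids(f), gtf_transcript_ids(g)))
```

A near-zero exact overlap combined with a large version-insensitive overlap would point at the version suffix; a large `fasta_only` count would point at the annotation subset or release.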
Please let me know if I can add any information.
Kind regards,
Alex