GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

xpore dataprep error addition #89

Status: closed (Wardale24 closed this issue 2 years ago)

Wardale24 commented 2 years ago

After reading all the open and closed issues, I can see this is a common problem, but I want to add that I'm working with Danio rerio and am unable to run xpore dataprep. On separate runs, I got the following warnings:

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False. options.func(options)

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (12) have mixed types.Specify dtype option on import or set low_memory=False. options.func(options)

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (14) have mixed types.Specify dtype option on import or set low_memory=False. options.func(options)

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/xpore.py:67: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False. options.func(options)
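DtypeWarning is a pandas warning: with low_memory=True, pandas parses a large file in chunks, and it warns when the same column comes out numeric in some chunks and as strings in others, so the column ends up with object dtype. A DtypeWarning does not by itself stop the run. Assuming these come from xpore reading the eventalign table with pandas (the warning class suggests so, but this is not confirmed here), the minimal stdlib sketch below shows the kind of data that produces it: a contig column holding both numeric chromosome names and a name like "MT". The helpers `classify` and `mixed_type_columns` are hypothetical, the rows are made up, and unlike pandas the sketch ignores chunking entirely.

```python
import csv
import io


def classify(value: str) -> str:
    """Return 'int', 'float' or 'str' for one TSV field."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "str"


def mixed_type_columns(tsv_text: str) -> dict:
    """Map column index -> sorted inferred types, for columns that mix
    numeric and non-numeric values (the mix that forces an object column).
    Unlike pandas, this looks at all rows at once and ignores chunking."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    seen = {}
    for row in reader:
        for i, field in enumerate(row):
            seen.setdefault(i, set()).add(classify(field))
    return {i: sorted(kinds) for i, kinds in seen.items()
            if "str" in kinds and len(kinds) > 1}


# Made-up rows in the eventalign column layout (contig, position, reference_kmer):
# a numeric chromosome name and "MT" share column 0.
sample = "contig\tposition\treference_kmer\n1\t100\tAAACA\nMT\t200\tGGACT\n"
print(mixed_type_columns(sample))  # {0: ['int', 'str']}
```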

Command used:

xpore dataprep --eventalign zebrafish_eventalign.txt --gtf_path_or_url mapping/Danio_rerio.GRCz11.104.chr.gtf --transcript_fasta_paths_or_urls mapping/Danio_rerio.GRCz11.cdna.all.fa --out_dir dataprep_zf --genome

The output .json file is empty, and the .index and .log files contain only their headers.

Both the GTF and the transcriptome files were downloaded from Ensembl. The initial mapping with minimap2 was carried out against the Danio rerio primary assembly genome, also from Ensembl.

From what I can gather from other issues, this is probably caused by an incompatibility between the reference files. Nevertheless, I am unsure how to proceed and would appreciate any help.
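One way to test the incompatibility hypothesis before rerunning anything is to compare the contig names in the eventalign output with the transcript IDs in the cDNA FASTA. This is a minimal sketch under two assumptions: that the first tab-separated column of eventalign.txt is `contig` (as in nanopolish's header), and that xpore dataprep can only use reads whose contig is a transcript ID in the FASTA. The helper names are hypothetical and the data is made up. If the reads were aligned to the genome, the contigs will be chromosome or scaffold names and the overlap should be empty, which would be consistent with an empty .json.

```python
import itertools


def fasta_ids(lines):
    """Yield the first whitespace-delimited token of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def eventalign_contigs(lines, max_rows=100_000):
    """Distinct values of the first (contig) column, skipping the header.
    Only the first max_rows data rows are read, because eventalign.txt is large."""
    rows = itertools.islice(lines, 1, max_rows + 1)
    return {line.split("\t", 1)[0] for line in rows if line.strip()}


def overlap_report(eventalign_lines, fasta_lines):
    """Summarise how many eventalign contigs are transcript IDs in the FASTA."""
    contigs = eventalign_contigs(eventalign_lines)
    transcripts = set(fasta_ids(fasta_lines))
    return {
        "contigs_seen": len(contigs),
        "shared_with_fasta": len(contigs & transcripts),
        "examples_missing": sorted(contigs - transcripts)[:5],
    }


# Made-up data: a genome-aligned eventalign excerpt vs. a tiny cDNA FASTA.
genome_aligned = ["contig\tposition\treference_kmer\n", "1\t100\tAAACA\n", "MT\t200\tGGACT\n"]
cdna = [">tx1 cdna\n", "ACGTACGT\n", ">tx2 cdna\n", "GGCCTTAA\n"]
print(overlap_report(genome_aligned, cdna))
# {'contigs_seen': 2, 'shared_with_fasta': 0, 'examples_missing': ['1', 'MT']}
```

With real files the same call works on open file handles, e.g. `overlap_report(open("zebrafish_eventalign.txt"), open("mapping/Danio_rerio.GRCz11.cdna.all.fa"))`; zero shared contigs would point at the alignment reference rather than at the GTF.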

Please let me know if I can add any information.

Kind regards,

Alex

ploy-np commented 2 years ago

Hi @Wardale24,

I think this error may come from the minimap2 alignment. For xpore, you have to align the Danio rerio reads to the transcriptome, then rerun nanopolish and xpore dataprep.
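The transcriptome alignment step can be sketched as below. The minimap2 options (`-ax map-ont -uf --secondary=no`) mirror the recipe commonly shown for xpore data preparation, and the samtools sort/index steps produce the sorted, indexed BAM that nanopolish eventalign reads. This is a sketch, not the xpore authors' script: the helpers `alignment_commands` and `run`, the output prefix and the thread count are hypothetical, and minimap2 and samtools are assumed to be on PATH. The demo only prints the equivalent shell commands.

```python
import shlex
import subprocess


def alignment_commands(transcriptome_fa, reads_fastq, prefix="zf", threads=8):
    """Return (argv, stdout_path) pairs for aligning reads to the transcriptome.
    Nothing is executed here; stdout_path is None when no redirect is needed."""
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    t = str(threads)
    return [
        # Align to the cDNA FASTA rather than the genome; minimap2 writes SAM to stdout.
        (["minimap2", "-ax", "map-ont", "-uf", "-t", t, "--secondary=no",
          transcriptome_fa, reads_fastq], sam),
        # samtools sort accepts SAM input and writes a coordinate-sorted BAM.
        (["samtools", "sort", "-@", t, "-o", bam, sam], None),
        # nanopolish eventalign expects a sorted, indexed BAM.
        (["samtools", "index", bam], None),
    ]


def run(steps):
    """Execute each step in order, stopping at the first non-zero exit status."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Print the equivalent shell commands instead of running them.
for argv, out in alignment_commands("Danio_rerio.GRCz11.cdna.all.fa", "zf.fastq"):
    print(shlex.join(argv) + (f" > {out}" if out else ""))
```

Calling `run(alignment_commands(...))` would execute the steps; keeping the command builder separate from execution makes it easy to inspect the exact arguments first.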

Wardale24 commented 2 years ago

Hello @ploy-np

Thank you for the fast reply. I mapped the data again, this time to the transcriptome, and then ran nanopolish eventalign with --genome zebrafish.cdna.fa. Is this correct? Sorry for the basic question; I just find it strange to pass the transcriptome file to an option called --genome. When I tried running nanopolish eventalign with --genome pointing to the primary assembly genome, it didn't recognize the scaffolds.

I then ran xpore dataprep using the zebrafish GTF and cDNA files. The .json file is now 8 GB.

Summary:

nanopolish eventalign --reads zf.fastq \
    --bam zf.bam \
    --genome zf_cdna.fa \
    --signal-index \
    --scale-events \
    --summary summary.txt \
    --threads 32 > eventalign.txt

xpore dataprep --eventalign eventalign.txt \
    --gtf_path_or_url zf.gtf \
    --transcript_fasta_paths_or_urls zf_cdna.fa \
    --out_dir dataprep \
    --genome
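The two steps above can also be scripted. Building each command as an argument list sidesteps shell line-continuation and stray-space mistakes, and makes the `> eventalign.txt` redirection explicit. A minimal sketch, assuming the xpore version in use accepts the `--gtf_path_or_url` / `--transcript_fasta_paths_or_urls` flag names shown in this thread (other xpore versions may name them differently); the helper functions are hypothetical, and the demo only prints the commands.

```python
import shlex
import subprocess


def eventalign_cmd(fastq, bam, reference_fa, summary="summary.txt", threads=32):
    # --genome must be the same FASTA the BAM was aligned to (here, the cDNA file).
    return ["nanopolish", "eventalign", "--reads", fastq, "--bam", bam,
            "--genome", reference_fa, "--signal-index", "--scale-events",
            "--summary", summary, "--threads", str(threads)]


def dataprep_cmd(eventalign_txt, gtf, transcriptome_fa, out_dir="dataprep"):
    # Flag names exactly as used in this thread; other xpore versions may differ.
    return ["xpore", "dataprep", "--eventalign", eventalign_txt,
            "--gtf_path_or_url", gtf,
            "--transcript_fasta_paths_or_urls", transcriptome_fa,
            "--out_dir", out_dir, "--genome"]


def run_pipeline(fastq, bam, transcriptome_fa, gtf, eventalign_txt="eventalign.txt"):
    """Run eventalign (stdout redirected to a file), then dataprep on that file."""
    with open(eventalign_txt, "w") as out:
        subprocess.run(eventalign_cmd(fastq, bam, transcriptome_fa), check=True, stdout=out)
    subprocess.run(dataprep_cmd(eventalign_txt, gtf, transcriptome_fa), check=True)


print(shlex.join(eventalign_cmd("zf.fastq", "zf.bam", "zf_cdna.fa")) + " > eventalign.txt")
print(shlex.join(dataprep_cmd("eventalign.txt", "zf.gtf", "zf_cdna.fa")))
```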

Could you confirm whether this is the right procedure?

ploy-np commented 2 years ago

Hi @Wardale24 ,

Yes, it does sound strange; I guess --genome in nanopolish just means the reference sequence. What you did seems correct.
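Since --genome is simply the reference the BAM was aligned to, a quick pre-flight check is to confirm that every reference name in the BAM header appears in the FASTA passed to --genome; a transcriptome-aligned BAM checked against the genome FASTA would fail this, which fits the "didn't recognize the scaffolds" symptom. A minimal stdlib sketch; the helper names are hypothetical and the header and FASTA lines are made up.

```python
def fasta_ids(lines):
    """Yield the first whitespace-delimited token of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def bam_header_refs(header_text):
    """Reference names (SN: tags) from the @SQ lines of `samtools view -H` output."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_from_reference(header_text, fasta_lines):
    """BAM reference names that the candidate --genome FASTA does not contain."""
    available = set(fasta_ids(fasta_lines))
    return [name for name in bam_header_refs(header_text) if name not in available]


# Made-up example: a transcriptome-aligned BAM header checked against genome FASTA headers.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:tx1\tLN:1200\n@SQ\tSN:tx2\tLN:800\n"
genome_fa = [">1 dna:chromosome\n", "ACGT\n", ">MT dna:chromosome\n", "ACGT\n"]
print(missing_from_reference(header, genome_fa))  # ['tx1', 'tx2']
print(missing_from_reference(header, [">tx1\n", "A\n", ">tx2\n", "C\n"]))  # []
```

The header text for a real BAM can be obtained with `subprocess.run(["samtools", "view", "-H", "zf.bam"], capture_output=True, text=True, check=True).stdout`; an empty result means every reference in the BAM is present in the FASTA.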