GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

diffmod output files empty #169

Closed anaesco closed 8 months ago

anaesco commented 1 year ago

Hi there,

I have successfully run the data preparation from raw reads and the xpore preprocess steps, but when I run xpore diffmod, the output files are empty. This is the command and its output:

$ xpore diffmod --config IGF2_SCRAM_cell_config_nofilt.yml
Using the signal of unmodified RNA from /home/carter-balaj/miniconda3/lib/python3.9/site-packages/xpore/diffmod/model_kmer.csv
0 ids to be testing ...

Any guidance is greatly appreciated, thanks!!

Also attached are the log files from the preprocess step.

data.log

data.log

yuukiiwa commented 1 year ago

Hi @anaesco,

Your first data.log uses the genome reference instead of the transcriptome reference, while your second data.log uses the transcriptome reference, which is correct. However, xpore dataprep cannot match the reads to the reference because the first column of the eventalign.txt file may look like the following:

ENST00000416931.1|ENSG00000225972.1|OTTHUMG00000002338.1|OTTHUMT00000006720.1|MTND1P23-201|MTND1P23|372|unprocessed_pseudogene|

while the reference transcript id looks like the following:

ENST00000416931.1

You will have to modify the first column of the eventalign.txt file to match the reference. Here is an example Python script (not tested):

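import os
import sys

# Path to the eventalign.txt file whose first column needs trimming
fn = sys.argv[1]
# Write next to the input, prefixing only the file name with 'new'
out_fn = os.path.join(os.path.dirname(fn), 'new' + os.path.basename(fn))

with open(fn) as infile, open(out_fn, 'w') as outfile:
    for ln in infile:
        fields = ln.split('\t')
        # Keep only the transcript id before the first '|'
        fields[0] = fields[0].split('|')[0]
        outfile.write('\t'.join(fields))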
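
Before rerunning dataprep, it may be worth checking whether the contig names in the first column now match the reference ids exactly. The following is only a sketch and is not part of xpore. It assumes the reference ids can be read from the headers of a transcriptome FASTA file (the text after '>' up to the first whitespace). The function names and the file paths passed to check_files are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Collect sequence ids from FASTA headers (text after '>' up to the first whitespace)."""
    ids = set()
    for line in fasta_lines:
        if line.startswith('>'):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def eventalign_contigs(eventalign_lines):
    """Collect distinct values of the first (contig) column, skipping the header row."""
    contigs = set()
    for i, line in enumerate(eventalign_lines):
        first_col = line.rstrip('\n').split('\t')[0]
        if i == 0 and first_col == 'contig':
            continue  # header row of the eventalign table
        if first_col:
            contigs.add(first_col)
    return contigs


def unmatched_contigs(eventalign_lines, fasta_lines):
    """Return contig names from eventalign with no exact match among the FASTA ids."""
    return eventalign_contigs(eventalign_lines) - fasta_ids(fasta_lines)


def check_files(eventalign_path, fasta_path):
    """Print how many contig names are missing from the reference (paths are supplied by the caller)."""
    with open(eventalign_path) as ev, open(fasta_path) as fa:
        missing = unmatched_contigs(ev, fa)
    print(f'{len(missing)} contig names not found in the reference')
    for name in sorted(missing)[:5]:
        print(name)
```

If check_files reports unmatched names that still contain '|', the first column has not been trimmed yet.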

Thanks!

Best wishes, Yuk Kei

anaesco commented 1 year ago

Hi there!

Thanks so much for responding. I figured it was the first column in the eventalign file since I was reading the closed issues section and someone had a similar issue.

The file is now edited and I was able to run diffmod! Thank you so much for your help.

Here is the output of the new file:

contig             position  reference_kmer  read_index  strand  event_index  event_level_mean  event_stdv  event_length  model_kmer  model_mean  model_stdv  standardized_level  start_idx  end_idx
ENST00000416931.1  20        AAGGG           9           t       217          119.94            11.215      0.01062       AAGGG       113.12      7.84        0.73                20790      20822
ENST00000416931.1  20        AAGGG           9           t       218          132.64            3.769       0.00232       AAGGG       113.12      7.84        2.09                20783      20790
ENST00000416931.1  21        AGGGA           9           t       219          106.53            2.761       0.00199       AGGGA       115.88      4.05        -1.94               20777      20783

anaesco commented 1 year ago

As a final question, is there any documentation on which code was used to produce the results (data visualization)? Any guidance would be great! Thank you!!

yuukiiwa commented 1 year ago

Hi @anaesco,

We have a tutorial that will be included in ONT's EPI2ME-LABS soon, and it includes some visualization code. It is linked below: https://drive.google.com/file/d/1xEddN-1mFXeKsaxqCmLt4q-lVvMhL_A4/view?usp=share_link

Thanks!

Best wishes, Yuk Kei