GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

diffmod.table and eventalign.index contain only header #172

Closed. dietvin closed this issue 1 year ago

dietvin commented 1 year ago

Hello,

when running diffmod on my data, I get a diffmod.table output that contains only the header. Both diffmod.log and the dataprep data.log show that the processes finished successfully.

While looking through the files I noticed that the eventalign.index only contains the header as well. All other files from the dataprep look as expected when compared to the test data.
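A quick way to confirm this symptom across several outputs is to count the data rows that follow the header line. The following is a minimal Python sketch, not part of xpore; `count_data_rows` and `report_header_only` are hypothetical helper names, and the paths in the usage note are placeholders.

```python
def count_data_rows(path):
    """Return the number of non-empty lines after the first (header) line."""
    with open(path, "r", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line; an empty file yields 0 rows
        return sum(1 for line in handle if line.strip())


def report_header_only(paths):
    """Map each path to True when the file holds no data rows."""
    return {str(p): count_data_rows(p) == 0 for p in paths}
```

For example, `report_header_only(["[DATAPREP-DIR]/eventalign.index", "[DIFFMOD-DIR]/diffmod.table"])` would flag both files in the situation described here. A header-only eventalign.index suggests the problem lies upstream of diffmod, in the input that dataprep actually read.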

I used the following commands:

Nanopolish:

    nanopolish eventalign -t 32 --scale-events --signal-index --reads [FASTQ] --bam [BAM] --genome [FASTA] > [EVENTALIGN]

xpore dataprep:

    xpore dataprep --eventalign [EVENTALIGN] --out_dir [DATAPREP-DIR] --n_processes 32

xpore diffmod:

    xpore diffmod --config $config --n_processes 32

While running dataprep I got the warnings below, but no error message:

    /.../anaconda/envs/xpore/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
      pos_end += eventalign_result.loc[index]['linelength'].sum()

    /.../anaconda/envs/xpore/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      chunk_split['linelength'] = np.array(lines)

It would be great if you can help me out. Please let me know if you need any more information.

Thanks in advance,
Vincent

yuukiiwa commented 1 year ago

Hi Vincent (@dietvin),

Could you please show the first 10 lines of your eventalign.txt file (for example, with head eventalign.txt)? Thanks!

Best wishes, Yuk Kei

dietvin commented 1 year ago

The eventalign.txt looks like this:

contig  position    reference_kmer  read_index  strand  event_index event_level_mean    event_stdv  event_length    model_kmer  model_mean  model_stdv  standardized_level  start_idx   end_idx
IRESeGFP5-complete  0   GGGCG   0   t   1   118.61  5.756   0.00465 GGGCG   108.23  5.31    1.70    26376   26390
IRESeGFP5-complete  0   GGGCG   0   t   2   106.47  6.813   0.00764 GGGCG   108.23  5.31    -0.29   26353   26376
IRESeGFP5-complete  1   GGCGA   0   t   3   106.97  9.815   0.02457 GGCGA   92.44   8.39    1.50    26279   26353
IRESeGFP5-complete  2   GCGAA   0   t   4   93.18   3.578   0.00896 GCGAA   92.59   3.99    0.13    26252   26279
IRESeGFP5-complete  3   CGAAT   0   t   5   119.27  5.203   0.00365 CGAAT   115.65  5.56    0.57    26241   26252
IRESeGFP5-complete  4   GAATT   0   t   6   114.70  1.150   0.00266 GAATT   112.11  3.11    0.73    26233   26241
IRESeGFP5-complete  4   GAATT   0   t   7   113.83  6.624   0.00664 GAATT   112.11  3.11    0.48    26213   26233
IRESeGFP5-complete  5   AATTG   0   t   8   102.80  5.963   0.00830 AATTG   100.78  5.53    0.32    26188   26213
IRESeGFP5-complete  6   ATTGG   0   t   9   84.47   1.255   0.00465 ATTGG   86.04   2.65    -0.52   26174   26188

yuukiiwa commented 1 year ago

Hi Vincent (@dietvin),

I ran xpore dataprep on the 10 lines you provided above, and it wrote the following to eventalign.index:

transcript_id,read_index,pos_start,pos_end
IRESeGFP5-complete,0,172,970
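
The offsets in this index appear to be byte positions into eventalign.txt: 172 equals the length of the tab-separated header line plus its newline, and 970 - 172 = 798 equals the combined length of the nine data rows posted earlier in this thread (assuming tab separators and Unix line endings). The Python sketch below reproduces that bookkeeping for illustration only. It is not xpore's implementation, and `build_byte_index` is a hypothetical helper name.

```python
import io


def build_byte_index(eventalign_bytes):
    """Return (contig, read_index, pos_start, pos_end) byte spans.

    One span is emitted per consecutive run of rows that share the same
    contig and read_index. Assumes a tab-separated file with a header line
    containing 'contig' and 'read_index' columns.
    """
    stream = io.BytesIO(eventalign_bytes)
    header = stream.readline()
    columns = header.rstrip(b"\r\n").split(b"\t")
    contig_col = columns.index(b"contig")
    read_col = columns.index(b"read_index")

    offset = len(header)  # data starts right after the header bytes
    spans = []
    current = None  # [contig, read_index, pos_start, pos_end]
    for line in stream:
        if not line.strip():
            offset += len(line)  # skip blank lines but keep offsets aligned
            continue
        fields = line.rstrip(b"\r\n").split(b"\t")
        key = (fields[contig_col].decode(), fields[read_col].decode())
        if current is not None and (current[0], current[1]) == key:
            current[3] = offset + len(line)  # extend the running span
        else:
            if current is not None:
                spans.append(tuple(current))
            current = [key[0], key[1], offset, offset + len(line)]
        offset += len(line)
    if current is not None:
        spans.append(tuple(current))
    return spans
```

If a file passed to a routine like this contains only a header, the returned list is empty, which corresponds to an index file with no rows below its own header.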

I do have a question. Is IRESeGFP5-complete the only contig in your eventalign.txt file?

Thanks!

Best wishes, Yuk Kei

dietvin commented 1 year ago

Seeing that it works for you, I rechecked my workflow and realized that I had mixed up the names of the eventalign files and the dataprep folders. After fixing that, everything works fine.

I'm sorry for the extra work this caused, and thank you very much for your help! I really appreciate the fast and helpful replies.

Best,
Vincent