GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

Diffmod_error #142

Closed rania-o closed 2 years ago

rania-o commented 2 years ago

Hi,

I've already run the dataprep command and got results. This is what my read count file contains:

idx,n_reads
dystro-oligo,998

I have more than 20,000 reads mapped to my reference. Is there a coverage filter or something similar that makes the number of reads drop to 998?

Also, when I run diffmod with the correct dataprep results, I get the error message below. The terminal then freezes at this step, and I have to press Ctrl+C to exit xpore and get my prompt back:

xpore diffmod --config config.yml

1 ids to be testing ...
Process Consumer-1:
Traceback (most recent call last):
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'GAA'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/xpore-2.1-py3.6.egg/xpore/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/usr/local/lib/python3.6/dist-packages/xpore-2.1-py3.6.egg/xpore/scripts/diffmod.py", line 23, in execute
    kmer_signal = {'mean':model_kmer.loc[kmer,'model_mean'],'std':model_kmer.loc[kmer,'model_stdv']}
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 873, in __getitem__
    return self._getitem_tuple(key)
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1044, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 786, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1059, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 3493, in xs
    loc = self.index.get_loc(key)
  File "/home/rania/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'GAA'

Thanks, Rania

yuukiiwa commented 2 years ago

Hi @rania-o,

Sorry for the delayed reply! The default --readcount_min is 1 for xpore dataprep. If the kmer and model_kmer columns of a site in the eventalign.txt from nanopolish do not match, xpore drops that site, which may explain the drop from 20k to 998.
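To get a rough idea of how many rows are affected, the two k-mer columns can be compared directly. The following is a minimal sketch, not xpore's own filtering code: it assumes a tab-separated eventalign file with the `reference_kmer` and `model_kmer` column names from nanopolish, and the function name `count_kmer_mismatches` and the sample rows are made up for illustration.

```python
import csv
import io

def count_kmer_mismatches(handle):
    """Count eventalign rows whose reference_kmer and model_kmer differ.

    Assumes a tab-separated file with a header row (hypothetical helper,
    not part of xpore or nanopolish).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    mismatched = 0
    for row in reader:
        total += 1
        if row["reference_kmer"] != row["model_kmer"]:
            mismatched += 1
    return total, mismatched

# Tiny in-memory example with invented rows.
sample = (
    "contig\tposition\treference_kmer\tmodel_kmer\n"
    "tx1\t0\tGCCAA\tGCCAA\n"
    "tx1\t1\tCCAAG\tNNNNN\n"
)
print(count_kmer_mismatches(io.StringIO(sample)))
```

For a real file, pass an open file handle, for example `open("eventalign.txt", newline="")`, instead of the `io.StringIO` sample.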

One potential reason for the KeyError: 'GAA' from xpore is that your eventalign.txt is truncated. You can check this by running tail eventalign.txt to see whether all columns of the last line are present.
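The same check can be scripted. This is a sketch under the assumption that the file is tab-separated with a header line; `last_line_is_complete` is a hypothetical helper name, and it only reads the end of the file because eventalign output can be very large.

```python
import os

def last_line_is_complete(path, sep=b"\t", tail_bytes=65536):
    """Return True if the last non-empty line has as many fields as the header."""
    with open(path, "rb") as fh:
        header = fh.readline().rstrip(b"\r\n")
        n_cols = len(header.split(sep))
        # Jump close to the end instead of reading the whole file.
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - tail_bytes))
        lines = [ln for ln in fh.read().splitlines() if ln.strip()]
    if not lines:
        return False
    return len(lines[-1].split(sep)) == n_cols
```

Note that this only detects a line that was cut off mid-write. A row whose fields are present but empty still has the full number of tab-separated columns and would pass this check.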

Thanks!

Best wishes, Yuk Kei

rania-o commented 2 years ago

Hello @yuukiiwa

Thank you for your answer. I've checked my eventalign file, and indeed the kmer and model_kmer columns are missing for some lines. Here is an example:

contig position reference_kmer read_index strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx
dystro-oligo 0 GCCAA 0 t 4 78.27 1.380 0.01328 GCCAA 73.26 2.11 1.75 28133 28173
dystro-oligo 1 CCAA 0 t 5 92.55 2.484 0.00465 CCAA 87.19 3.02 1.31 28119 28133
dystro-oligo 1 CCAA 0 t 6 95.85 1.139 0.00465 CCAA 87.19 3.02 2.12 28105 28119
dystro-oligo 2 CAA 0 t 7 98.24 2.083 0.00564 CAA 105.72 2.68 -2.06 28088 28105
dystro-oligo 2 CAA 0 t 8 96.94 3.260 0.00598 CAA 105.72 2.68 -2.42 28070 28088
dystro-oligo 3 AA 0 t 9 122.06 1.616 0.00299 AA 108.90 2.68 3.63 28061 28070
dystro-oligo 4 A 0 t 10 119.61 3.024 0.01295 A 108.90 2.68 2.95 28022 28061
dystro-oligo 4 A 0 t 11 101.06 3.389 0.00996 A 108.90 2.68 -2.16 27992 28022
dystro-oligo 5 0 t 12 98.53 1.871 0.00730 108.90 2.68 -2.86 27970 27992

How can I fix this, please?

Rania
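Rows like these, where the k-mer is shorter than expected or empty, can be located with a short script. This is only a sketch: it assumes a tab-separated eventalign file and a 5-mer pore model (matching the first row of the sample), and `find_short_kmers` is a hypothetical helper, not an xpore or nanopolish function.

```python
import csv
import io

KMER_LENGTH = 5  # assumption: 5-mer pore model, as in the first sample row

def find_short_kmers(handle, k=KMER_LENGTH):
    """Return (line_no, contig, position, reference_kmer) for rows whose
    reference_kmer or model_kmer is not exactly k characters long."""
    reader = csv.DictReader(handle, delimiter="\t")
    bad = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        ref = row.get("reference_kmer") or ""
        model = row.get("model_kmer") or ""
        if len(ref) != k or len(model) != k:
            bad.append((line_no, row["contig"], row["position"], ref))
    return bad

# Three rows reduced to the relevant columns, with values from the example.
sample = (
    "contig\tposition\treference_kmer\tmodel_kmer\n"
    "dystro-oligo\t0\tGCCAA\tGCCAA\n"
    "dystro-oligo\t2\tCAA\tCAA\n"
    "dystro-oligo\t5\t\t\n"
)
for hit in find_short_kmers(io.StringIO(sample)):
    print(hit)
```

A 3-mer such as CAA would not be found in a table indexed by 5-mers, which is consistent with the kind of lookup that fails with KeyError: 'GAA' in the traceback.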

rania-o commented 2 years ago

Hi @yuukiiwa

It turns out my eventalign file was truncated as you said. I fixed it by converting U to T in the reference sequence, and it worked.
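For anyone hitting the same problem, the U-to-T conversion can be sketched as below. The function name `rna_to_dna_fasta` and the example record are hypothetical; the sketch assumes a plain FASTA file and leaves header lines untouched so that sequence names containing a U are not altered.

```python
def rna_to_dna_fasta(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            # Header line: keep the sequence name exactly as it is.
            yield line
        else:
            yield line.replace("U", "T").replace("u", "t")

# Invented example record, not the actual reference from this issue.
example = [">example-oligo\n", "GCCAAUGAAu\n"]
print("".join(rna_to_dna_fasta(example)), end="")
```

To convert a file, iterate over the open input file and write each yielded line to a new file. The steps that depend on the reference would presumably need to be rerun against the converted sequence afterwards.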

Thank you for your help.