GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

error in xpore diffmod #219

Open lyj95618 opened 4 months ago

lyj95618 commented 4 months ago

Hi,

I got the following error when I ran xpore diffmod:

Loading python/3.9.13
  Loading requirement: gcc/7.2.0 readline/8.1 curl/7.74.0 libxml2/2.9.1
    pcre/8.44.utf8 libpng/1.2.59 sqlite/3.35.3 geos/3.4.2 libtiff/4.0.9
    proj/7.2.0 tcltk/8.6.11 CpG-tools/1.1.0
Using the signal of unmodified RNA from /hpf/largeprojects/ccmbio/yliang/long_read_RNA/nanopore_brian/python_venv/lib/python3.9/site-packages/xpore/diffmod/model_kmer.csv
Process Consumer-11:
Traceback (most recent call last):
  File "/hpf/largeprojects/ccmbio/yliang/long_read_RNA/nanopore_brian/python_venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'GCTATGCTC'

This is my input YAML:

data:
    MIA:
        rep1: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/8327-M2/dataprep
        rep2: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/8327-M3/dataprep
        rep3: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/Sample1/dataprep
    PBS:
        rep1: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/4147-M1/dataprep
        rep2: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/4147-M2/dataprep
        rep3: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/Sample2/dataprep

out: /hpf/largeprojects/ccmbio/acelik_files/kalish/nanopore/nanopore/lauren_test/debug_nanopolish/new_xpore/diffmod_output

Sample1 and Sample2 were from an R10 flowcell and the rest were from R9 flowcells. Since nanopolish doesn't support R10 data, I processed all the data with f5c, which supports both R10 and R9 and does the same thing as nanopolish. I noticed some differences in the eventalign.txt output: the two R10 samples have 9-mers, while the R9 samples have 5-mers.

R10 data eventalign output:

contig  position    reference_kmer  read_index  strand  event_index event_level_mean    event_stdv  event_length    model_kmer  model_mean  model_stdv  standardized_level  start_idx   end_idx
ENSMUST00000103679.2    4   GATAAGGAT   0   t   995 102.35  3.626   0.00350 GATAAGGAT   97.12   3.70    1.23    49450   49464
ENSMUST00000103679.2    5   ATAAGGATT   0   t   996 116.55  6.388   0.00350 ATAAGGATT   111.40  3.22    1.40    49436   49450

R9 data:

contig  position    reference_kmer  read_index  strand  event_index event_level_mean    event_stdv  event_length    model_kmer  model_mean  model_stdv  standardized_level  start_idx   end_idx
ENSMUST00000181768.2    21  AGGTG   0   t   1   108.57  6.003   0.00400 AGGTG   117.25  3.37    -2.28   126162  126174

Could this difference be why xpore outputs this error?
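
One quick way to confirm which k-mer length each eventalign file uses is to tally the length of the `reference_kmer` column over the first rows. The sketch below is only an illustration: the helper name `kmer_lengths` is hypothetical, and it reads small in-memory tables built from the rows quoted above rather than real eventalign.txt files.

```python
import csv
import io
from collections import Counter


def kmer_lengths(handle, column="reference_kmer", max_rows=1000):
    """Count the k-mer lengths seen in the first max_rows rows of a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        counts[len(row[column])] += 1
    return counts


# Tiny in-memory stand-ins for the two eventalign outputs quoted above
# (only the first three columns are kept, for brevity).
r10_text = (
    "contig\tposition\treference_kmer\n"
    "ENSMUST00000103679.2\t4\tGATAAGGAT\n"
    "ENSMUST00000103679.2\t5\tATAAGGATT\n"
)
r9_text = (
    "contig\tposition\treference_kmer\n"
    "ENSMUST00000181768.2\t21\tAGGTG\n"
)

r10_counts = kmer_lengths(io.StringIO(r10_text))
r9_counts = kmer_lengths(io.StringIO(r9_text))
print(dict(r10_counts))
print(dict(r9_counts))
```

For a real file, the same function could be called on `open(path, newline="")` in place of the `io.StringIO` objects.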

Thanks for the help! Laur

yuukiiwa commented 3 months ago

Hi Laur (tagging you here @lyj95618),

xpore only supports 5-mer comparison for now, so 9-mers don't work.
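
The `KeyError: 'GCTATGCTC'` in the traceback is consistent with this: a 9-mer is being looked up in a table keyed by 5-mers. The sketch below only mimics that situation with a plain dictionary. It is not xpore's code, the table contents are illustrative values rather than xpore's actual model, and `lookup` is a hypothetical helper.

```python
# Hypothetical 5-mer model table: k-mer -> (mean, stdv).
# The values are illustrative only, not taken from xpore's model_kmer.csv.
model_5mer = {"AGGTG": (117.25, 3.37)}


def lookup(kmer, model):
    """Return (mean, stdv) for a k-mer, or None when the table has no such key."""
    try:
        return model[kmer]
    except KeyError:
        # A 9-mer can never match a key in a table that only holds 5-mers.
        return None


found = lookup("AGGTG", model_5mer)
missing = lookup("GCTATGCTC", model_5mer)
print(found, missing)
```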

Thanks!

Best wishes, Yuk Kei

lyj95618 commented 3 months ago

Thank you for your reply and the suggestion in another thread about changing the 9mer to 5mer!

I have one more question about the xPore comparison analysis. Since I am combining data from R9 and R10 (rna004) flowcells, is there a way for xPore to adjust for the potential batch effect?

My comparison condition:

Sample A (R9), Sample B (R9), Sample1 (rna004) vs. Sample D (R9), Sample E (R9), Sample2 (rna004)

Thanks, Laur