GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License

m6Anet dataprep error #103

Closed rania-o closed 1 year ago

rania-o commented 1 year ago

Hi, I tried to use m6anet dataprep but got this error:

Traceback (most recent call last):
  File "/home/rania/miniconda3/envs/m6anet/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'read_index'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rania/miniconda3/envs/m6anet/bin/m6anet-dataprep", line 8, in <module>
    sys.exit(main())
  File "/home/rania/miniconda3/envs/m6anet/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 352, in main
    parallel_index(eventalign_filepath,chunk_size,out_dir,n_processes)
  File "/home/rania.ouazahrou/miniconda3/envs/m6anet/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 132, in parallel_index
    chunk_complete = chunk[chunk['read_index'] != chunk.iloc[-1]['read_index']]
  File "/home/rania/miniconda3/envs/m6anet/lib/python3.8/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/rania/miniconda3/envs/m6anet/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'read_index'
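
For context, the failing line in the traceback keeps only the rows whose `read_index` differs from the last row's. The following is a minimal plain-Python sketch of that same lookup, not m6anet's own code; the function name `drop_last_read` and the sample rows are hypothetical. It illustrates how a table that lacks the `read_index` column would surface as the `KeyError` shown above.

```python
def drop_last_read(rows, key="read_index"):
    """Keep rows whose `key` value differs from the last row's value.

    Mirrors the shape of the filter in the traceback, using dicts
    instead of a pandas DataFrame (an assumption made for illustration).
    """
    last = rows[-1][key]  # raises KeyError if the column is missing
    return [row for row in rows if row[key] != last]


# Hypothetical rows; only the column names matter here.
rows_default = [{"read_index": 0}, {"read_index": 0}, {"read_index": 1}]
rows_named = [{"read_name": "read-a"}, {"read_name": "read-b"}]

print(drop_last_read(rows_default))

try:
    drop_last_read(rows_named)
except KeyError as err:
    print("KeyError:", err)
```

The second call fails because the rows carry `read_name` instead of `read_index`, which is the same mismatch the traceback reports.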

Here is the command I used for nanopolish:

nanopolish eventalign -t 8 --reads non_mod_rep1_extract_nanopolish_sed.fastq --bam mapping/non_mod_rep1_extract_nanopolish_sorted.bam --genome /data2/ref/ref.fasta --scale-events --signal-index --samples --print-read-names > non_mod_eventalign.txt

And this is the head of my eventalign file:

contig  position    reference_kmer  read_name   strand  event_index event_level_mean    event_stdv  event_length    model_kmer  model_mean  model_stdv  standardized_level  start_idx   end_idx samples
cc6m2244t7ecorv 46  CGATG   b6bad146-a8f5-4179-9564-0ce5829ecf2a    t   39  105.14  9.179   0.00498 CGATG   114.13  5.10    -1.53   122885  122900  105.612,90.1133,124.119,92.8217,110.577,116.295,122.464,96.8844,103.806,100.797,102.151,99.7433,103.505,106.213,102
cc6m2244t7ecorv 47  GATGA   b6bad146-a8f5-4179-9564-0ce5829ecf2a    t   40  75.88   3.617   0.01195 GATGA   79.44   3.87    -0.80   122849  122885  78.8281,77.4739,77.173,74.615,81.5366,76.4206,75.0664,78.5272,74.7655,72.358,75.2169,68.1448,72.5084,79.2795,74.615,75.5178,76.872,72.8094,74.1636,77.173,74.1636,78.9786,74.1636,75.2169,73.5617,74.3141,80.3328,74.0131,71.7561,77.4739,72.358,75.3673,75.6683,73.4112,91.4675,76.2701

Do you have any idea what this is about and how to fix this error? Thanks, Rania

chrishendra93 commented 1 year ago

Hi @rania-o, it looks like your eventalign file does not have the read_index column. I have never tried running nanopolish with the --samples and --print-read-names options, but with my version of nanopolish (0.13.3) you don't need them, and the m6anet documentation does not require them either. Could it be that the --samples option results in the read index not being saved?
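
A quick way to confirm this before running dataprep is to inspect the header line of the eventalign file. The sketch below is only an illustration: it assumes the file is tab-separated with a single header row, the helper names `eventalign_columns` and `has_read_index` are hypothetical, and the two sample headers are shortened stand-ins for real files. The header posted above lists `read_name` and no `read_index`, which matches the first case.

```python
import csv
import io

REQUIRED_COLUMN = "read_index"  # the column named in the KeyError


def eventalign_columns(handle):
    """Return the column names from the first line of a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    return next(reader, [])  # empty list if the file has no lines


def has_read_index(handle):
    """True if the header contains the read_index column."""
    return REQUIRED_COLUMN in eventalign_columns(handle)


# Hypothetical, shortened headers standing in for real eventalign files.
header_with_names = "contig\tposition\treference_kmer\tread_name\tstrand\n"
header_with_index = "contig\tposition\treference_kmer\tread_index\tstrand\n"

print(has_read_index(io.StringIO(header_with_names)))
print(has_read_index(io.StringIO(header_with_index)))
```

On a real file, the same check could be run with `with open("non_mod_eventalign.txt") as fh: has_read_index(fh)`.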

rania-o commented 1 year ago

Hi @chrishendra93

Thanks for your reply. I used the other options because I'm running other tools that use nanopolish, so I merged all the options required in one command to avoid having several eventalign files.

I tried to run it again only with the options that you recommend, and it worked. But I have a question about the output. In your doc you mentioned two files (read level and site level), but in my case I got only one file (data.result.csv).

Here is the command I used:

m6anet-run_inference --input_dir dataprep/ --out_dir m6anet_mod_results --n_processes 30 --num_iterations 1000

Rania

rania-o commented 1 year ago

I figured it out while scrolling the other issues. I had to install the latest version. Thanks.