In this case, I omit --genome even though the documentation says to add it, because I believe it is only required when the reads have been mapped to a genome. I was unsure about this, as both the nanopolish documentation and some of the documentation here say that direct RNA reads currently have to be mapped to the transcriptome. What is the function of the --genome flag in this case? Is xpore also suitable for gDNA reads?
Running xpore-dataprep produces the following error:
Process Consumer-1:
Traceback (most recent call last):
  File "/home/callum/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end_idx'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/local/pyenv/versions/3.7.2/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/callum/.local/lib/python3.7/site-packages/xpore/scripts/helper.py", line 110, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/callum/.local/lib/python3.7/site-packages/xpore/scripts/dataprep.py", line 47, in combine
    eventalign_result['length'] = pd.to_numeric(eventalign_result['end_idx'])-pd.to_numeric(eventalign_result['start_idx'])
  File "/home/callum/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/callum/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'end_idx'
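The failing line in dataprep.py (line 47 in the traceback) computes each event's length as end_idx minus start_idx. pandas therefore raises KeyError: 'end_idx' as soon as that column is absent from the eventalign table. Below is a minimal reproduction with made-up two-row tables. The helper name add_event_length is hypothetical; only the expression inside it is copied from the traceback.

```python
import io

import pandas as pd


def add_event_length(eventalign_result: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around the expression from xpore's dataprep.py, line 47."""
    eventalign_result['length'] = (
        pd.to_numeric(eventalign_result['end_idx'])
        - pd.to_numeric(eventalign_result['start_idx'])
    )
    return eventalign_result


# Made-up tables: one without the start_idx/end_idx columns, one with them.
without_idx = pd.read_csv(io.StringIO("contig\tposition\nT1\t10\nT1\t11\n"), sep="\t")
with_idx = pd.read_csv(
    io.StringIO("contig\tposition\tstart_idx\tend_idx\nT1\t10\t0\t7\nT1\t11\t7\t20\n"),
    sep="\t",
)

try:
    add_event_length(without_idx)
except KeyError as err:
    # 'end_idx' is looked up first, so it is the name reported, as in the traceback.
    print("KeyError:", err)  # KeyError: 'end_idx'

print(add_event_length(with_idx)['length'].tolist())  # [7, 13]
```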
I also have to kill the process as it does not stop by itself.
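I have not checked how xpore's helper.py coordinates its Consumer processes, so this is only a guess. In queue-based worker pools, a worker can die on an exception before calling task_done(). When that happens, a join() on the task queue never returns, which would look like a process that does not stop by itself. The sketch below illustrates the generic pattern with hypothetical names, using threading and queue.Queue (which have the same task_done()/join() contract as multiprocessing.JoinableQueue). The error is recorded and task_done() is always called, so join() still returns.

```python
import queue
import threading


def worker(tasks, errors):
    """Consume rows until a None sentinel arrives; always mark each item as done."""
    while True:
        row = tasks.get()
        try:
            if row is None:
                return  # the finally clause still calls task_done() for the sentinel
            # Stand-in for per-read work: raises KeyError if 'end_idx' is missing.
            row["length"] = int(row["end_idx"]) - int(row["start_idx"])
        except KeyError as exc:
            errors.append(exc)  # record the failure instead of dying silently
        finally:
            tasks.task_done()  # without this, tasks.join() below would block forever


def run(rows):
    """Feed rows to one worker thread and return the errors it recorded."""
    tasks = queue.Queue()
    errors = []
    thread = threading.Thread(target=worker, args=(tasks, errors))
    thread.start()
    for row in rows:
        tasks.put(row)
    tasks.put(None)
    tasks.join()  # returns once every get() has been matched by task_done()
    thread.join()
    return errors


rows = [{"start_idx": "0", "end_idx": "7"}, {"start_idx": "7"}]
print(run(rows))  # [KeyError('end_idx')]
print(rows[0]["length"])  # 7
```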
I think this issue is related to https://github.com/tleonardi/nanocompore/issues/153. There, the advice was to add the --samples flag when preparing the data with nanopolish, so that the eventalign output includes all the necessary columns.
I will try rerunning nanopolish eventalign with the --samples flag.
I ran xpore on my own data and ran into an issue. These are the steps I ran:
nanopolish index -d /path/to/raw/fast5 /path/to/fastq
This created the index and index.readdb files, as well as the .fai and other index files.
nanopolish eventalign --reads /path/to/fastq --bam /path/to/alignments --scale-events --summary output.txt --threads 12 > eventalign.txt
This creates eventalign.txt and the summary file as expected. However, when I compared its header with the demo data eventalign file, I realised it is missing the start_idx and end_idx columns.