GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

run dataprep error #37

Closed q1134269149 closed 3 years ago

q1134269149 commented 3 years ago

Hi, I ran xpore-dataprep with the following commands:

    pyensembl install --release 99 --species homo_sapiens

    xpore-dataprep --eventalign HEK_eventalign_reads_xpore.txt --summary HEK_eventalign_summary.txt --out_dir HEK_dataprep --ensembl 99 --species homo_sapiens --genome

However, I got this error in the log:

    nohup: ignoring input
    2020-10-30 11:35:08,135 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=99, species='homo_sapiens')
    2020-10-30 11:35:08,914 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.cdna.all.fa.gz.pickle
    2020-10-30 11:35:09,119 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.ncrna.fa.gz.pickle
    2020-10-30 11:35:09,282 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.pep.all.fa.gz.pickle
    Process Consumer-1:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
        return self._engine.get_loc(casted_key)
      File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'end_idx'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 50, in combine
        eventalign_result['length'] = pd.to_numeric(eventalign_result['end_idx'])-pd.to_numeric(eventalign_result['start_idx'])
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/pandas/core/frame.py", line 2902, in __getitem__
        indexer = self.columns.get_loc(key)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
        raise KeyError(key) from err
    KeyError: 'end_idx'

I am using xpore 0.5.6. Surprisingly, although the log shows an error, the program kept running and produced no output. What can I do? Thanks, hqin

ploy-np commented 3 years ago

Hi @q1134269149,

The error occurs because there is no 'end_idx' in the output from nanopolish eventalign. So, you need to rerun nanopolish eventalign with --signal-index.
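Before rerunning xpore-dataprep, it can help to confirm that the regenerated eventalign file really has the signal index columns. The following is a minimal sketch, assuming the eventalign output is tab-separated with a header row; the column names `start_idx` and `end_idx` come from the traceback above, while the helper name `missing_signal_index_columns` and the sample headers are made up for illustration.

```python
import csv
import io

# Columns that the traceback shows dataprep.py reading.
REQUIRED_COLUMNS = {"start_idx", "end_idx"}


def missing_signal_index_columns(handle):
    """Return the required columns absent from the header of a TSV file.

    `handle` is any open text file object positioned at the header line.
    """
    header = next(csv.reader(handle, delimiter="\t"), [])
    return sorted(REQUIRED_COLUMNS - set(header))


# Hypothetical, shortened headers used only to demonstrate the check.
without_index = io.StringIO("contig\tposition\tevent_index\n")
with_index = io.StringIO("contig\tposition\tevent_index\tstart_idx\tend_idx\n")

print(missing_signal_index_columns(without_index))
print(missing_signal_index_columns(with_index))
```

For a real file, pass `open("HEK_eventalign_reads_xpore.txt", newline="")` instead of the `io.StringIO` objects. Only the first line is read, so the check is cheap even on very large eventalign files.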

q1134269149 commented 3 years ago

I re-ran nanopolish and started xpore-dataprep again, but the log now shows these errors:

    nohup: ignoring input
    2020-10-31 15:19:05,559 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=99, species='homo_sapiens')
    2020-10-31 15:19:06,199 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.cdna.all.fa.gz.pickle
    2020-10-31 15:19:06,330 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.ncrna.fa.gz.pickle
    2020-10-31 15:19:06,458 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.pep.all.fa.gz.pickle
    INFO:pyensembl.sequence_data:Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.cdna.all.fa.gz.pickle
    INFO:pyensembl.sequence_data:Loaded sequence dictionary from /home/shihan/.cache/pyensembl/GRCh38/ensembl99/Homo_sapiens.GRCh38.ncrna.fa.gz.pickle
    Process Consumer-29:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000442171', 548)
    Process Consumer-22:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000409020', 1680)
    Process Consumer-16:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000333421', 2614)
    Process Consumer-26:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000523976', 1325)
    Process Consumer-21:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000361733', 3529)
    Process Consumer-30:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000392550', 3510)
    Process Consumer-28:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000527078', 1790)
    Process Consumer-17:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000279147', 2478)
    Process Consumer-18:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000368219', 2310)
    Process Consumer-24:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000256474', 3738)
    Process Consumer-20:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000322030', 2915)
    Process Consumer-19:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000420613', 2155)
    Process Consumer-27:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000540737', 1787)
    Process Consumer-25:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000395418', 825)
    Process Consumer-23:
    Traceback (most recent call last):
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/helper.py", line 110, in run
        result = self.task_function(*next_task_args,self.locks)
      File "/home/shihan/anaconda3/envs/nanopolish/lib/python3.6/site-packages/xpore-0.5.6-py3.6.egg/xpore/scripts/dataprep.py", line 317, in preprocess_gene
        genomic_coordinate = list(itemgetter(*zip(tx_ids,tx_positions))(t2g_mapping)) # genomic_coordinates -- np structured array of 'chr','gene_id','genomic_position','kmer'
    KeyError: ('ENST00000547026', 1824)

In addition, the output directory contains six files: data.index, data.json, data.log, data.readcount, eventalign.hdf5, eventalign.log

The tail of the eventalign.log file is:

    30f80d2d-d599-4d85-aa14-a19f3d50b929
    fe2e9cb9-d733-4754-bc25-6994b87b42a3
    e0dfb0bf-6329-45aa-a470-4739201bd487
    a39a48e9-a266-4b51-8dfa-c9ef0427dcd9
    bf7cee54-b558-4fd8-aad5-6c9c64c98f00
    69235923-8073-4a16-98be-7821be6754e7
    d5f7f1b4-b372-4bbd-b3c6-34adb4297ca4
    9578dbd7-6699-4960-a187-4dcdff60af23
    e8a029cb-e317-4897-ac88-896d77c9dcc7
    --- SUCCESSFULLY FINISHED ---

Can I proceed to the next step, xpore-diffmod, with this output? Thanks, hqin
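For readers trying to interpret the KeyError tuples in the log above: each one is a (transcript id, transcript position) pair that was not found in the lookup table. The sketch below reproduces that behaviour with `operator.itemgetter` on a plain dictionary. It assumes the failing call unpacks the zipped pairs, which matches the tuple-shaped keys in the error; the toy mapping, the transcript name `TX1`, and the helper `split_known_positions` are hypothetical and are not xpore code. It shows the mechanism only and says nothing about why those positions are absent from the mapping.

```python
from operator import itemgetter

# Hypothetical lookup from (transcript_id, transcript_position) to a genomic position.
t2g_mapping = {("TX1", 10): 1000, ("TX1", 11): 1001}

tx_ids = ["TX1", "TX1", "TX1"]
tx_positions = [10, 11, 99]  # position 99 is not in the mapping

# itemgetter looks up every pair and raises KeyError on the first missing one.
try:
    list(itemgetter(*zip(tx_ids, tx_positions))(t2g_mapping))
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]


def split_known_positions(ids, positions, mapping):
    """Return (mapped_values, missing_keys) without raising on unknown pairs."""
    keys = list(zip(ids, positions))
    mapped = [mapping[key] for key in keys if key in mapping]
    missing = [key for key in keys if key not in mapping]
    return mapped, missing


mapped, missing = split_known_positions(tx_ids, tx_positions, t2g_mapping)
print(missing_key, mapped, missing)
```

Listing the missing pairs this way, rather than stopping at the first one, makes it easier to see whether a few positions or whole transcripts fail to map.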

q1134269149 commented 3 years ago

Moreover, when xpore hits an error, it does not stop and exit automatically. I have to kill all the tasks manually; otherwise the program stays stuck. Could this be improved in a future version? Thanks, hqin
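One common reason a producer/consumer program hangs after a worker crashes is that the parent waits on a joinable queue whose failed task was never marked as done. Whether that is what happens inside xpore is not confirmed in this thread. The sketch below is not xpore's code; it only illustrates, under that assumption, a consumer loop that records failures and always calls `task_done()`. The names `consumer_loop` and `lookup` are hypothetical, and `queue.Queue` stands in for `multiprocessing.JoinableQueue`, which offers the same `get`, `task_done`, and `join` methods.

```python
import queue
import traceback


def consumer_loop(task_queue, error_queue, task_function):
    """Process tasks until a None sentinel arrives.

    Each task is marked done in a `finally` clause, so a failing task
    cannot leave `task_queue.join()` waiting forever.
    """
    while True:
        task_args = task_queue.get()
        try:
            if task_args is None:  # sentinel: stop this consumer
                break
            task_function(*task_args)
        except Exception as exc:
            # Report the failure instead of letting the worker die silently.
            error_queue.put((task_args, repr(exc), traceback.format_exc()))
        finally:
            task_queue.task_done()


def lookup(mapping, key):
    """Toy task that fails with KeyError when the key is unknown."""
    return mapping[key]


tasks = queue.Queue()
errors = queue.Queue()
mapping = {("TX1", 10): 1000}

tasks.put((mapping, ("TX1", 10)))  # succeeds
tasks.put((mapping, ("TX1", 99)))  # raises KeyError inside the consumer
tasks.put(None)                    # sentinel

consumer_loop(tasks, errors, lookup)
tasks.join()  # returns at once because every task was marked done

failed = []
while not errors.empty():
    failed.append(errors.get())

print(len(failed), "task(s) failed")
```

With the failures collected in `error_queue`, the parent process can decide after `join()` whether to exit with a non-zero status instead of waiting indefinitely.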