GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License

Dataprep. PerformanceWarning: indexing past lexsort depth may impact performance #39

Closed. Stakaitis closed this issue 2 years ago

Stakaitis commented 2 years ago

Question: How can I avoid or silence this warning?

Issue: The m6anet-dataprep process is slow.

Error 1, which occurred 6181 times:

/home/stakatis/miniconda3/envs/m6anet/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:101: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()

Error 2, which occurred once:

/home/stakatis/miniconda3/envs/m6anet/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:143: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Command: m6anet-dataprep --eventalign ${EVENTALIGN} --out_dir ${DATAPREP_DIR} --n_processes ${CPUS} --readcount_min 2 --min_segment_count 2

File size in ${EVENTALIGN}: 1) 565G; run6_U87.eventalign.txt

Additional info:
m6anet version: 1.1.0 from PyPI
nanopolish version: 0.13.2 from bioconda
OS: CentOS Linux 7 (Core)
Experiment: the whole flowcell was used for 1 sample (dRNA-seq run), which generated 216 .fast5 files

chrishendra93 commented 2 years ago

Thanks for raising this issue. I believe the DataFrame that is created is not sorted by its index, so pandas cannot use a fast sorted lookup on it. A fix for this would be to add

eventalign_result = eventalign_result.sort_index()
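To illustrate why sorting helps, here is a minimal sketch on a toy DataFrame. The index names, keys, and values are hypothetical stand-ins, not the actual contents of eventalign_result, and the helper count_performance_warnings exists only for this example. It assumes a two-level index with repeated keys, which is the kind of lookup where pandas can fall back to a linear search when the index is unsorted.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for eventalign_result: a two-level index with
# repeated keys, stored in unsorted order.
index = pd.MultiIndex.from_tuples(
    [("tx2", 5), ("tx1", 7), ("tx2", 5), ("tx1", 7)],
    names=["transcript_id", "read_index"],
)
unsorted_df = pd.DataFrame({"line_length": [10, 20, 30, 40]}, index=index)

# sort_index() returns a new DataFrame by default, so keep the result.
sorted_df = unsorted_df.sort_index()


def count_performance_warnings(frame, key):
    """Run one .loc lookup; return the summed line_length and the number
    of pandas PerformanceWarnings emitted during the lookup."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        total = frame.loc[key]["line_length"].sum()
    n_warnings = sum(
        issubclass(w.category, pd.errors.PerformanceWarning) for w in caught
    )
    return int(total), n_warnings


print("unsorted index is sorted:", unsorted_df.index.is_monotonic_increasing)
print("sorted index is sorted:  ", sorted_df.index.is_monotonic_increasing)
print("unsorted lookup:", count_performance_warnings(unsorted_df, ("tx2", 5)))
print("sorted lookup:  ", count_performance_warnings(sorted_df, ("tx2", 5)))
```

Calling warnings.filterwarnings("ignore", category=pd.errors.PerformanceWarning) would hide the message, but it would not make the lookups any faster, so sorting the index once before the loop is the more useful change.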

As for Error 2, it is just a warning from pandas that you can ignore. It will not affect performance, although I admit it gets a bit annoying. I'll try to push fixes for both issues in the next minor release.
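For reference, the SettingWithCopyWarning usually appears when a column is assigned on a filtered subset of a DataFrame. The sketch below shows two common ways to avoid it. The column names and values are made up for illustration and do not reflect what dataprep.py does at line 143.

```python
import pandas as pd

# Hypothetical data; these column names are not taken from m6anet.
df = pd.DataFrame(
    {"read_index": [1, 2, 3], "norm_mean": [0.5, 1.5, 2.5]}
)

# Option 1: take an explicit copy of the subset before adding columns,
# so the assignment clearly targets an independent DataFrame.
subset = df[df["norm_mean"] > 1.0].copy()
subset["flagged"] = True

# Option 2: assign on the original DataFrame in a single .loc call,
# as the warning text itself suggests.
df.loc[df["norm_mean"] > 1.0, "norm_mean"] = 0.0

print(subset)
print(df)
```

Which option fits depends on whether the change is meant to reach the original DataFrame (option 2) or only a working copy (option 1).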

Thanks

Regards

Christopher Hendra