ComparativeGenomicsToolkit / Comparative-Annotation-Toolkit

FilterTransMap fails if given empty transMap files #231

Open mhaukness-ucsc opened 3 years ago

mhaukness-ucsc commented 3 years ago

If transMap does not produce any projections, this error occurs in filter_transmap.py:

ERROR: 2021-01-10 21:04:02,374 - [pid 23388] Worker Worker(salt=987820952, workers=5, host=marina-cat-hprc-chry-all-6hxxq, username=root, pid=6624) failed Task: FilterTransMap for HG01123.pat
Traceback (most recent call last):
  File "/Comparative-Annotation-Toolkit/cat_env/lib/python3.7/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/Comparative-Annotation-Toolkit/cat_env/lib/python3.7/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/Comparative-Annotation-Toolkit/cat/__init__.py", line 1293, in run
    json_target)
  File "/Comparative-Annotation-Toolkit/cat/filter_transmap.py", line 207, in filter_transmap
    resolved_df = combined_tx_df.merge(merged_df, on='GeneId', how='left')
  File "/Comparative-Annotation-Toolkit/cat_env/lib/python3.7/site-packages/pandas/core/frame.py", line 7963, in merge
    validate=validate,
  File "/Comparative-Annotation-Toolkit/cat_env/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 87, in merge
    validate=validate,
  File "/Comparative-Annotation-Toolkit/cat_env/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 656, in __init__
    self._maybe_coerce_merge_keys()
  File "/Comparative-Annotation-Toolkit/cat_env/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 1165, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat

The transMap files in the work directory for the genome in question are empty.

2021-01-10 13:30:26          0 HG01123.pat.gp
2021-01-10 13:30:26          0 HG01123.pat.psl

This was observed when trying to annotate human chrM genomes, as well as one chrY genome. The pipeline exits without producing any annotations (output directory is empty except for databases/*.db).

CAT should check for empty transMap files. If the files are empty, it should report an error for that genome, write appropriate empty downstream files, and then finish annotating the other genomes. The user should be warned (in a noticeable way) that this happened.
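
A rough sketch of the kind of check I have in mind, in case it helps. This is not CAT's actual code: the helper names (`is_empty_file`, `split_genomes_by_transmap`) are hypothetical, and it assumes the transMap outputs are named `<genome>.gp` and `<genome>.psl` in a single work directory, as in the listing above. It only decides which genomes to skip and warns about them; writing the empty downstream files is left out.

```python
import logging
import os

logger = logging.getLogger("cat.sketch")  # hypothetical logger name


def is_empty_file(path):
    """Return True if the file is missing or has size zero."""
    return not os.path.exists(path) or os.path.getsize(path) == 0


def split_genomes_by_transmap(work_dir, genomes):
    """Partition genomes into (usable, skipped) by their transMap outputs.

    Assumes (not confirmed against CAT) that each genome has
    <genome>.gp and <genome>.psl files in work_dir.
    """
    usable, skipped = [], []
    for genome in genomes:
        paths = [os.path.join(work_dir, f"{genome}.{ext}") for ext in ("gp", "psl")]
        if any(is_empty_file(p) for p in paths):
            # Warn loudly, but do not abort the run for the other genomes.
            logger.warning(
                "transMap produced no projections for %s; skipping FilterTransMap",
                genome,
            )
            skipped.append(genome)
        else:
            usable.append(genome)
    return usable, skipped
```

The caller could then run the filtering step only for the `usable` genomes and list the `skipped` ones again in a summary at the end of the run, so the warning is hard to miss.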