sjanssen2 opened 1 year ago
The same seems to be the case for dates.
The same issue might also affect the Bokeh server:
(qiime2-2022.8) t490s x86_64 /media/jlu/vol/jlab/MicrobiomeAnalyses/Projects/Pandyra_LCMV>bokeh serve --show app
2023-01-09 17:21:02,339 Starting Bokeh server version 2.4.3 (running on Tornado 6.2)
2023-01-09 17:21:02,342 User authentication hooks NOT provided (default user enabled)
2023-01-09 17:21:02,349 Bokeh app running at: http://localhost:5006/app
2023-01-09 17:21:02,350 Starting Bokeh server with process id: 25929
/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/data_handler.py:72: UserWarning: Some categories have been dropped because they had either only one level or too many. Use the max_levels_per_category argument to modify this threshold.
Dropped columns: ['birth_timestamp', 'host_age', 'infection', 'mouse_number']
warn(
2023-01-09 17:21:04,054 Error running application handler <bokeh.application.handlers.directory.DirectoryHandler object at 0x7fdca1155dc0>: 'infection'
File 'base.py', line 3631, in get_loc:
raise KeyError(key) from err
Traceback (most recent call last):
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'infection'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/bokeh/application/handlers/code_runner.py", line 231, in run
exec(self._code, module.__dict__)
File "/media/jlu/vol/jlab/MicrobiomeAnalyses/Projects/Pandyra_LCMV/app/main.py", line 48, in <module>
effect_size_by_category(dh, binary_cols)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/effect_size.py", line 49, in effect_size_by_category
results = Parallel(n_jobs=n_jobs, **parallel_args)(
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/data_handler.py", line 114, in calculate_effect_size
if self.metadata[column].dtype != np.dtype("object"):
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
raise KeyError(key) from err
KeyError: 'infection'
2023-01-09 17:21:04,487 WebSocket connection opened
2023-01-09 17:21:04,487 ServerConnection created
^C
Interrupted, shutting down
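Reading the log, the data handler first warns that `infection` was dropped. The app's `main.py` then still passes it (via `binary_cols`) to `effect_size_by_category`, and the lookup `self.metadata[column]` in `calculate_effect_size` fails with a bare `KeyError: 'infection'`. Below is a minimal sketch of an up-front check that would turn this into a readable message. `check_requested_columns` is a hypothetical helper, not part of evident's API, and the message wording is only illustrative:

```python
import pandas as pd


def check_requested_columns(metadata: pd.DataFrame, requested: list) -> None:
    """Hypothetical guard: fail early with a readable message instead of a bare KeyError."""
    missing = [col for col in requested if col not in metadata.columns]
    if missing:
        raise ValueError(
            f"Column(s) {missing} are not in the metadata used for effect size "
            "calculation. They may have been dropped during loading (e.g. only one "
            "level or too many levels; see max_levels_per_category)."
        )


# Metadata as it might look after 'infection' was dropped (made-up example data).
kept = pd.DataFrame({"diet": ["chow", "hfd"]}, index=["S1", "S2"])

try:
    check_requested_columns(kept, ["diet", "infection"])
except ValueError as err:
    print(err)
```

Such a check could live either in the app (before calling `effect_size_by_category`) or in the library itself. The library is the more helpful place, since every caller would get the explicit message.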
Thanks for bringing this up. I'm not sure how to handle dates, but for booleans I think we can just allow `bool` dtype columns.
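The traceback shows the current check in `calculate_effect_size` compares against `np.dtype("object")` only. Here is a minimal sketch of what "also allow `bool` dtype columns" could look like. The function names are hypothetical, not the actual change. The string-dtype check is an extra assumption, added so the sketch also works on newer pandas versions that may infer a dedicated string dtype instead of `object`:

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_object_dtype, is_string_dtype


def is_categorical_column(series: pd.Series) -> bool:
    """Hypothetical check: treat object/string and bool columns as categorical."""
    return is_object_dtype(series) or is_string_dtype(series) or is_bool_dtype(series)


def categorical_columns(metadata: pd.DataFrame) -> list:
    """Return the names of columns that pass the categorical check, in order."""
    return [col for col in metadata.columns if is_categorical_column(metadata[col])]


# Made-up example metadata: one bool, one string, one numeric column.
metadata = pd.DataFrame({
    "infection": [True, False, True, False],
    "diet": ["chow", "hfd", "chow", "hfd"],
    "host_age": [8.0, 10.0, 12.0, 9.0],
})
print(categorical_columns(metadata))
```

With this check, `infection` is kept as a two-level category and the numeric `host_age` is still excluded.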
@sjanssen2 Can you try out this change and see if it resolves your boolean issue?
Assume I have a metadata category like `infection` with values `TRUE` or `FALSE`. If I load these data as in your example, `metadata = pd.read_table("data/metadata.tsv", sep="\t", index_col=0)`, they are of type `object` and proper boolean values, i.e. `True` and `False`. If I add `dtype=str`, the values are still of type `object` but are strings, namely `'TRUE'` and `'FALSE'`. Only the `dtype=str` way works for me. Otherwise evident throws the error:
You might want to return a more explicit error message in those cases.
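A small, self-contained way to see the two loading behaviours described above. The TSV contents here are made up. In this tiny example without missing values, pandas infers a `bool` dtype rather than `object`, so the dtype printed for the default read may differ from what the real `data/metadata.tsv` produces. The values themselves show the difference: parsed booleans versus the raw `'TRUE'`/`'FALSE'` strings.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/metadata.tsv.
TSV = "sample_id\tinfection\nS1\tTRUE\nS2\tFALSE\nS3\tTRUE\n"

# Default parsing: pandas recognizes TRUE/FALSE and converts them to booleans.
default = pd.read_table(io.StringIO(TSV), sep="\t", index_col=0)

# dtype=str: the raw text is kept, so the values stay 'TRUE'/'FALSE' strings.
as_str = pd.read_table(io.StringIO(TSV), sep="\t", index_col=0, dtype=str)

print(default["infection"].dtype, default["infection"].tolist())
print(as_str["infection"].dtype, as_str["infection"].tolist())
```

This matches the report: with `dtype=str` the column looks like an ordinary string category, while the default read yields boolean values that an object-only dtype check does not accept.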