GemmaTuron closed this issue 1 year ago.
Another edge case we can improve: when the input for prediction contains BOTH binary and regression values, ZairaChem fails with the following error:
Traceback (most recent call last):
  File "/home/gturon/anaconda3/envs/zairachem/bin/zairachem", line 33, in <module>
    sys.exit(load_entry_point('zairachem', 'console_scripts', 'zairachem')())
  File "/home/gturon/anaconda3/envs/zairachem/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/gturon/anaconda3/envs/zairachem/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/gturon/anaconda3/envs/zairachem/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/gturon/anaconda3/envs/zairachem/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/gturon/anaconda3/envs/zairachem/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/cli/commands/predict.py", line 44, in predict
    s.setup()
  File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/prediction.py", line 158, in setup
    self._normalize_input()
  File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/prediction.py", line 83, in _normalize_input
    f.process()
  File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/files.py", line 363, in process
    df = self.normalize_dataframe()
  File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/files.py", line 310, in normalize_dataframe
    resolved_columns = self.resolve_columns()
  File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/schema.py", line 200, in resolve_columns
    ), "More than one values column found! {0}".format(values_column)
AssertionError: More than one values column found! ['exp', 'bin']
We should state clearly in the docs which input formats are accepted.
For a classification model trained directly on binary data (no cut-off specified), at prediction time the input must be either:
For a classification model trained on regression data with a specified cut-off, at prediction time you can pass:
Of course, passing the true values enables evaluation of the predictions.
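One common way to evaluate a classifier once true binary labels are available is ROC AUC. The sketch below computes it with the standard library only, as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count as one half). It is an illustration, not ZairaChem's reporting code: the function name roc_auc is hypothetical, and ZairaChem's performance reports may use other or additional metrics.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) returns 0.75. The pairwise loop costs n_pos * n_neg comparisons, which is fine for small prediction sets; a rank-based formula scales better.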
Thanks @GemmaTuron, the issue is now solved. If a bin column is available, it is the preferred one.
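A minimal sketch of the resolution rule described above, in which a binary column wins over a regression column instead of triggering the "More than one values column found!" assertion. The column names bin and exp come from the traceback; the function resolve_values_column and its parameters are hypothetical and do not reproduce ZairaChem's actual resolve_columns in zairachem/setup/schema.py.

```python
from typing import List, Optional, Sequence


def resolve_values_column(
    columns: Sequence[str],
    candidates: Sequence[str] = ("bin", "exp"),
    preferred: str = "bin",
) -> Optional[str]:
    """Return the single values column to use, or None if there is none.

    When several candidate columns are present, the preferred (binary)
    column wins instead of raising an error.
    """
    found: List[str] = [c for c in columns if c in candidates]
    if not found:
        return None
    if preferred in found:
        return preferred
    if len(found) > 1:
        # Still ambiguous: several non-preferred candidates and no tie-breaker.
        raise ValueError("More than one values column found! {0}".format(found))
    return found[0]
```

With this rule, resolve_values_column(["smiles", "exp", "bin"]) returns "bin", and an error is raised only when several non-preferred candidate columns remain.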
Describe the bug
When using a ZairaChem classification model, if the prediction dataset contains regression values, ZairaChem tries to use a regressor model and crashes. I think this only happens when the model was trained on already-binarized data without specifying a cut-off, because ZairaChem then has no threshold it can use to convert the regression values into classes. I still need to confirm that part, though.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
ZairaChem ignores the column of true values and makes the predictions anyway. If it has a threshold, it can try to convert the regression values into binary labels and use those to produce the performance reports.
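The expected behavior can be sketched as a small decision function that always lets prediction proceed and only decides which labels, if any, the performance report can use. It assumes, hypothetically, that a column containing only 0 and 1 is already binary, and that values at or above the threshold are positives; the real direction of the cut-off depends on how the model was trained. The names looks_binary and labels_for_evaluation are illustrative, not part of ZairaChem.

```python
from typing import List, Optional, Sequence


def looks_binary(values: Sequence[float]) -> bool:
    """Heuristic (assumption): a column holding only 0/1 values is already binary."""
    return set(values) <= {0, 1}


def labels_for_evaluation(
    values: Sequence[float],
    threshold: Optional[float] = None,
) -> Optional[List[int]]:
    """Return binary labels to evaluate a classifier against, or None to skip evaluation.

    Predictions are made regardless; this only decides what the report can use.
    """
    if looks_binary(values):
        # Already binary: use the labels as they are.
        return [int(v) for v in values]
    if threshold is None:
        # Regression values but no known cut-off: ignore the true values column.
        return None
    # Assumed direction: values at or above the threshold are positives.
    return [1 if v >= threshold else 0 for v in values]
```

Returning None signals that predictions should still be produced while the performance report is skipped, rather than switching to a regressor and crashing.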