Improve logging when user makes input mistake in scmpx Fractal task

nrepina commented 1 year ago

Improve logging when the user inputs a non-existent Label or Wavelength ID in the "scMultiplex Measurements" Fractal task. Currently, the task run gives no error, but no feature extraction output is produced.

The logs from the job run where this issue occurred are attached.

29_logs.zip

jluethi commented 1 year ago

Thanks for posting this with the logs, @nrepina! I just quickly looked through them, and I guess the issue here is more that those logs are somewhat hidden if the user only looks at the web interface.

Specifically, the .err files of task 5 (the scMultipleX step) contain the following log:

2023-07-19 18:56:36,570; WARNING; Channel not found, exit from the task.
Original error: ChannelNotFoundError: No channel found in [OmeroChannel(wavelength_id='A04_C01', index=None, label='DAPI', window=Window(min=0, max=65535, start=120, end=7000), color='0000FF', active=True, coefficient=1, inverted=False), OmeroChannel(wavelength_id='A03_C02', index=None, label='GFP', window=Window(min=0, max=65535, start=120, end=4000), color='00FF00', active=True, coefficient=1, inverted=False), OmeroChannel(wavelength_id='A02_C03', index=None, label='AGR2', window=Window(min=0, max=65535, start=0, end=4000), color='FF0000', active=True, coefficient=1, inverted=False), OmeroChannel(wavelength_id='A01_C04', index=None, label='BCATPH', window=Window(min=0, max=65535, start=0, end=4000), color='FFFFFF', active=True, coefficient=1, inverted=False)] for label='A04_C01'
2023-07-19 18:56:36,570; INFO; END scmultiplex_feature_measurements task

Thus, what happened is that the task realizes there is no matching channel and exits, without crashing the whole pipeline.

For me, there are 2 questions:

Is this silent exit (with the not so visible log) when incorrect channels are specified the desired behavior for scMultipleX measurements?
If we want the task to just exit (without failing) in that case, can we make the warnings more present?

Why do we want the silent exit at all? It allows scenarios where, e.g., you run on a multiplexed experiment and want to make measurements for your channel A03_C02, but some cycles don't have that channel. If we allow a regular exit in that case, we just don't get measurements for those cycles, but we still get measurements for the cycles that contain A03_C02.
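A minimal sketch of this "warn and keep going" behavior, applied across multiplexing cycles. Everything in it is hypothetical: `ChannelNotFoundError` below is a local stand-in (not the class from fractal-tasks-core), and `find_channel`, `measure_cycles` and the dict-based channel metadata are illustrations, not the actual scMultipleX task code.

```python
import logging

logger = logging.getLogger("scmultiplex_sketch")


class ChannelNotFoundError(ValueError):
    """Hypothetical stand-in for the lookup error quoted in the logs above."""


def find_channel(channels, label=None, wavelength_id=None):
    """Return the first channel dict whose label or wavelength_id matches."""
    for channel in channels:
        if label is not None and channel["label"] == label:
            return channel
        if wavelength_id is not None and channel["wavelength_id"] == wavelength_id:
            return channel
    raise ChannelNotFoundError(
        f"No channel found for label={label!r}, wavelength_id={wavelength_id!r}"
    )


def measure_cycles(cycles, wavelength_id):
    """Return (cycle, label) for cycles that have the channel; warn about the rest."""
    measured = []
    for cycle_name, channels in cycles.items():
        try:
            channel = find_channel(channels, wavelength_id=wavelength_id)
        except ChannelNotFoundError as err:
            # Skip this cycle instead of failing the whole run, but log why.
            logger.warning("Cycle %s skipped: %s", cycle_name, err)
            continue
        # A real task would extract features here; the sketch only records the match.
        measured.append((cycle_name, channel["label"]))
    return measured


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    cycles = {
        "cycle0": [{"wavelength_id": "A03_C02", "label": "GFP"}],
        "cycle1": [{"wavelength_id": "A04_C01", "label": "DAPI"}],
    }
    # Logs one warning for cycle1, then prints [('cycle0', 'GFP')]
    print(measure_cycles(cycles, "A03_C02"))
```

The design choice is that a missing channel in one cycle is logged rather than raised, so the other cycles still produce measurements; the trade-off is exactly the one raised in this issue: the warning is easy to miss.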

But if we allow such silent exits, there is a category of errors (the warnings) that we may need to highlight more to the user. I'm not fully sure yet what the best way of exposing them would be. We could include warnings in the workflow log that we show the user directly in the web interface, but that could get very overwhelming (e.g. for 50 wells, you get 50 warnings, 1 per well). Maybe a separate log containing only the warnings would help?
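For the separate-warnings-log idea, Python's standard `logging` module can already route WARNING-and-above records to an extra file while the full log keeps going to its usual destination. This is a sketch of one possible approach, not how Fractal currently collects task logs; `add_warnings_file` and `warnings.log` are hypothetical names.

```python
import logging


def add_warnings_file(logger, path):
    """Attach a handler that copies WARNING-and-above records to a separate file."""
    handler = logging.FileHandler(path)
    handler.setLevel(logging.WARNING)
    # Same "time; LEVEL; message" layout as the .err lines quoted earlier.
    handler.setFormatter(logging.Formatter("%(asctime)s; %(levelname)s; %(message)s"))
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)  # the full log still goes to stderr
    task_logger = logging.getLogger("task_sketch")
    add_warnings_file(task_logger, "warnings.log")  # hypothetical file name
    task_logger.info("START task")  # stderr only
    task_logger.warning("Channel not found, exit from the task.")  # stderr and warnings.log
```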

We should refer to this issue when improving how we show logs to the users.

Until that is improved, it will be important to check the .err log files of some of your tasks (as in the logs linked above) if something did not turn out the way you expected.

nrepina commented 1 year ago

Thank you @jluethi for the thorough response!

Ok, got it. I also suspected they might be buried inside one of the files, but there are so many that, with my limited experience, it is cumbersome to go through them. Given that there are 484 log files for just one plate, I had no idea where to look.

If the channel does not exist, this is a significant warning that the user should be made aware of, so it makes sense to me to make the warning more visible. I think it can be a warning rather than an error that exits the task; this way the task can keep running for the channels that do exist (as in the multiplexing scenario you described). Would it be possible for warnings to appear at the plate level (i.e. "this channel does not exist in the plate") so that dozens of repeated warnings don't pile up in the web interface?
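A plate-level summary could be as simple as grouping identical warning messages and listing the wells they came from. This minimal sketch assumes the per-well warnings are already available as `(well_id, message)` pairs; how Fractal would gather them is left open, and `summarize_warnings` and the well IDs are hypothetical.

```python
from collections import defaultdict


def summarize_warnings(well_warnings):
    """Collapse per-well warnings into one plate-level line per distinct message.

    well_warnings: iterable of (well_id, message) pairs.
    Returns one summary string per distinct message, sorted by message.
    """
    wells_by_message = defaultdict(set)
    for well_id, message in well_warnings:
        wells_by_message[message].add(well_id)
    summary = []
    for message in sorted(wells_by_message):
        wells = sorted(wells_by_message[message])
        summary.append(f"{message} (in {len(wells)} well(s): {', '.join(wells)})")
    return summary


if __name__ == "__main__":
    per_well = [
        ("B03", "Channel not found: label='A04_C01'"),
        ("B04", "Channel not found: label='A04_C01'"),
        ("C05", "Channel not found: label='A04_C01'"),
    ]
    for line in summarize_warnings(per_well):
        print(line)
    # prints: Channel not found: label='A04_C01' (in 3 well(s): B03, B04, C05)
```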

On the topic of warnings: some of my wells are out of focus due to focus errors, as specified in the Yoko metadata. When I ran the same plates through Drogon, I was warned about this, which is also very useful to know and not obvious otherwise. Can I find similar information in the Fractal logs, and where would it be?

jluethi commented 1 year ago

they might be buried inside one of the files, but there are so many that, with my limited experience, it is cumbersome to go through them. Given that there are 484 log files for just one plate, I had no idea where to look.

Fully agreed! The goal should be to expose just what is relevant. It's sometimes hard to know what that will be, and Fractal doesn't expose all of it yet. Until it does, here's a brief overview of the log folder contents (they are not as complicated as they seem). Everything happens per task, i.e. for task 0, 1, 2, 3, etc. For task 0, there are:

- 0_slurm_9711316.err => SLURM errors, ignore in most cases
- 0_slurm_9711316.out => SLURM outputs, ignore in most cases
- 0_slurm_submit.sbatch => sbatch script, ignore in most cases
- 0.args.json => contains the parameters that were submitted; only check it if you need to verify the parameters
- 0.err => contains the relevant logs! <= Check this!
- 0.metadiff.json => update to the metadata, ignore in most cases
- 0.out => always empty, I think; ignore

=> Just check the .err files (and not the 0_slurm_*.err files). Checking one per task would be useful.

For parallel tasks, there is a separate set of those 7 files for each well, but they mostly contain the same content unless a well had an issue.
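Until the web interface surfaces these warnings, a small script can do the filtering described above: take the task .err files, skip the *_slurm_*.err ones, and print WARNING/ERROR records. This is a sketch under assumptions: it assumes all log files sit directly in one folder and that records follow the `time; LEVEL; message` layout of the log quoted earlier. It picks up every non-SLURM .err file, including the per-well ones, whatever their exact names. The script and function names are hypothetical, not part of Fractal.

```python
import re
import sys
from pathlib import Path

# Matches lines such as "2023-07-19 18:56:36,570; WARNING; Channel not found, ..."
LOG_LINE = re.compile(r"^(?P<time>[^;]+); (?P<level>[A-Z]+); (?P<message>.*)$")


def relevant_err_files(log_dir):
    """Return the task .err files in log_dir, skipping the *_slurm_*.err files."""
    return sorted(
        path
        for path in Path(log_dir).iterdir()
        if path.suffix == ".err" and "_slurm_" not in path.name
    )


def collect_warnings(log_dir, levels=("WARNING", "ERROR")):
    """Return (file name, level, message) for each log record at one of `levels`."""
    results = []
    for path in relevant_err_files(log_dir):
        current = None
        for line in path.read_text(errors="replace").splitlines():
            match = LOG_LINE.match(line)
            if match:
                current = None
                if match["level"] in levels:
                    current = [path.name, match["level"], match["message"]]
                    results.append(current)
            elif current is not None:
                # Lines without the "time; LEVEL;" prefix (e.g. "Original error: ...")
                # continue the previous record.
                current[2] += "\n" + line
    return [tuple(record) for record in results]


if __name__ == "__main__":
    # Usage: python find_warnings.py path/to/unzipped/log/folder
    for name, level, message in collect_warnings(sys.argv[1]):
        print(f"{name}: {level}: {message}")
```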

On the topic of warnings: some of my wells are out of focus due to focus errors, as specified in the Yoko metadata. When I ran the same plates through Drogon, I was warned about this, which is also very useful to know and not obvious otherwise. Can I find similar information in the Fractal logs, and where would it be?

The log of the Create OME-Zarr task should contain this information. We kept some logs about how many entries were skipped because of errors (and actually improved the counting compared to how Drogon counts, to avoid off-by-2 errors). I think they should show up as warnings in that log, but let me know if you run an actual plate with these issues and whether the log shows up as a warning.

nrepina commented 1 year ago

Amazing, thanks so much for the detailed explanation, @jluethi! Very helpful; I will check it out.

fmi-basel / gliberal-scMultipleX

Improve logging when user makes input mistake in scmpx Fractal task #84