yarikoptic closed this 1 year ago
@jwodder, are you going to do anything to troubleshoot those two running processes, or can I just kill them? (AFAIK they are just wasting CPU ATM.)
@yarikoptic You can kill them.
@yarikoptic How (if at all) do you want timeouts displayed in the README? (Cf. #4.)
@yarikoptic When I run MatNWB on the listed files directly (without going through FUSE), they both error out after about 20 seconds with:
Unable to resolve the name 'types.ndx_dandi_icephys.DandiIcephysMetadata'.
@yarikoptic Ping.
> @yarikoptic How (if at all) do you want timeouts displayed in the README? (Cf. #4.)
Let's add one more column with timeouts.
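A minimal sketch of what "one more column" could look like, assuming each Dandiset's results are available as a list of per-asset outcome strings. The column names, the `"passed"`/`"failed"`/`"timeout"` strings, and the helpers `readme_row` and `readme_table` are hypothetical, not the project's actual README generator.

```python
from collections import Counter

# Hypothetical README layout: the existing columns are assumed, and
# "Timed Out" is the extra column proposed in this thread.
HEADER = "| Dandiset | Passed | Failed | Timed Out |"
DIVIDER = "| --- | --- | --- | --- |"


def readme_row(dandiset_id: str, outcomes: list[str]) -> str:
    """Render one table row from per-asset outcome strings.

    Outcome strings ("passed", "failed", "timeout") are illustrative.
    Counter returns 0 for outcomes that never occurred.
    """
    counts = Counter(outcomes)
    return f"| {dandiset_id} | {counts['passed']} | {counts['failed']} | {counts['timeout']} |"


def readme_table(results: dict[str, list[str]]) -> str:
    """Render the header plus one row per Dandiset, sorted by ID."""
    lines = [HEADER, DIVIDER]
    lines.extend(readme_row(d, outs) for d, outs in sorted(results.items()))
    return "\n".join(lines)


print(readme_table({"000001": ["passed", "passed", "timeout"], "000002": ["failed"]}))
```

Counting with `Counter` keeps the new column cheap to add: a Dandiset with no timed-out assets simply shows `0`.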
> @yarikoptic When I run MatNWB on the listed files directly (without going through FUSE), they both error out after about 20 seconds with
>
> Unable to resolve the name 'types.ndx_dandi_icephys.DandiIcephysMetadata'.
And on a FUSE-mounted filesystem, does it time out or crash? The point is that if it crashes, it should have crashed in our healthcheck process too.
Filed https://github.com/NeurodataWithoutBorders/matnwb/issues/481. Please add any extra information you see missing.
@yarikoptic Should the timeout column in the summary at the top include the IDs and number of assets for affected Dandisets, like is done for failures?
Also, if some assets of a Dandiset failed their healthchecks and other assets of that Dandiset timed out, should the Dandiset be listed under both "failed" and "timed out" in the summary?
> and if on fuse'd filesystem -- does it timeout or crash?
It errors out as above, except it takes about a minute longer.
> @yarikoptic Should the timeout column in the summary at the top include the IDs and number of assets for affected Dandisets, like is done for failures?
I think a uniform presentation would be the easiest to code, so let's do exactly the same, i.e., include the number of assets.
> Also, if some assets of a Dandiset failed their healthchecks and other assets of that Dandiset timed out, should the Dandiset be listed under both "failed" and "timed out" in the summary?
sounds right.
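The two answers above (same presentation as failures, with asset counts, and a Dandiset listed under both categories when applicable) can be sketched as follows. The `(dandiset_id, asset_path, outcome)` record shape, the outcome strings, and the `summarize`/`format_summary` helpers are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical per-asset record: (dandiset_id, asset_path, outcome), where
# outcome is one of "passed", "failed", "timeout" (illustrative names).
Result = tuple[str, str, str]


def summarize(results: list[Result]) -> dict[str, dict[str, int]]:
    """Count non-passing assets per Dandiset, separately for each outcome.

    A Dandiset with both failed and timed-out assets appears under both keys.
    """
    counts = {"failed": defaultdict(int), "timeout": defaultdict(int)}
    for dandiset_id, _asset, outcome in results:
        if outcome in counts:  # "passed" is not summarized
            counts[outcome][dandiset_id] += 1
    return {outcome: dict(per_ds) for outcome, per_ds in counts.items()}


def format_summary(summary: dict[str, dict[str, int]]) -> str:
    """Render 'Label: ID (N), ...' lines with the same shape for both categories."""
    lines = []
    for label, key in (("Failed", "failed"), ("Timed out", "timeout")):
        entries = ", ".join(f"{ds} ({n})" for ds, n in sorted(summary[key].items()))
        lines.append(f"{label}: {entries or 'none'}")
    return "\n".join(lines)


results = [
    ("000001", "sub-1/a.nwb", "failed"),
    ("000001", "sub-1/b.nwb", "timeout"),
    ("000002", "sub-2/c.nwb", "passed"),
    ("000003", "sub-3/d.nwb", "failed"),
    ("000003", "sub-3/e.nwb", "failed"),
]
print(format_summary(summarize(results)))
```

Because both categories go through the same loop in `format_summary`, the timeout column automatically gets the same "ID (number of assets)" presentation as failures, and `000001` shows up in both lines.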
> and if on fuse'd filesystem -- does it timeout or crash?
>
> It errors out as above, except it takes about a minute longer.
Hm, so it remains unknown why it was hanging (rather than crashing) when running within our healthcheck, correct?
I see
So we have two MATLAB jobs that should not take that long. Here are the details of the invocation:
Let's add a 1-hour timeout for every healthcheck job; AFAIK none of them should take that long. We should have a separate report category (TIMEOUT) so that we can alert developers that something is really not kosher.
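A minimal sketch of the proposed 1-hour limit with a separate TIMEOUT outcome, using only the standard library's `subprocess.run(..., timeout=...)`. `HEALTHCHECK_TIMEOUT`, `Outcome`, and `run_healthcheck` are hypothetical names, and `cmd` stands in for the real MATLAB/MatNWB invocation, which is not shown in this thread.

```python
import subprocess
import sys
from enum import Enum

# Hypothetical name for the 1-hour per-job limit proposed above.
HEALTHCHECK_TIMEOUT = 60 * 60  # seconds


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    TIMEOUT = "timeout"  # the separate report category


def run_healthcheck(cmd: list[str], timeout: float = HEALTHCHECK_TIMEOUT) -> Outcome:
    """Run one healthcheck command and classify the result.

    `cmd` stands in for the real invocation (e.g. a MATLAB/MatNWB call),
    which is not reproduced here.
    """
    try:
        # On expiry, subprocess.run kills the child, waits for it, and then
        # raises TimeoutExpired, so a hung job stops consuming CPU.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return Outcome.TIMEOUT
    return Outcome.PASSED if proc.returncode == 0 else Outcome.FAILED


if __name__ == "__main__":
    # Stand-in commands using the current Python interpreter.
    print(run_healthcheck([sys.executable, "-c", "pass"]))                 # passes
    print(run_healthcheck([sys.executable, "-c", "raise SystemExit(1)"]))  # fails
    print(run_healthcheck([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5))  # times out
```

One caveat: `subprocess.run` only kills the direct child. If the MATLAB launcher spawns its own subprocesses, those could survive the timeout; killing the whole process group (e.g. starting the job with `start_new_session=True` and signalling the group) would be needed in that case.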
Please check whether these files indeed somehow cause MatNWB to never "finish", and report an issue against MatNWB.