OpenChemistry / distiller

BSD 3-Clause "New" or "Revised" License

HAADF metadata not extracted #1168

Open ercius opened 3 months ago

ercius commented 3 months ago

The metadata is not being extracted for some scans. The preview image is showing up though.

I tried restarting faust-scan-file and the problem came back. The logs show this error:

[2024-08-21 11:28:46,455] [7] [ERROR] [^----Agent*: scan_fi[.]watch_for_haadf_events]: Crashed reason=IndexError('index 0 is out of bounds for axis 0 with size 0') 
2024-08-21T18:28:46.457525499Z Traceback (most recent call last):
2024-08-21T18:28:46.457528509Z   File "/usr/local/lib/python3.9/site-packages/faust/agents/agent.py", line 676, in _execute_actor
2024-08-21T18:28:46.457530979Z     await coro
2024-08-21T18:28:46.457533569Z   File "/app/scan_file_worker.py", line 409, in watch_for_haadf_events
2024-08-21T18:28:46.457536789Z     image_path = await generate_image(tmp, path, f"{scan_id}.{format}")
2024-08-21T18:28:46.457539459Z   File "/app/scan_file_worker.py", line 97, in generate_image
2024-08-21T18:28:46.457544339Z     return await generate_image_from_data(tmp_dir, path, image_filename)
2024-08-21T18:28:46.457547459Z   File "/app/scan_file_worker.py", line 61, in generate_image_from_data
2024-08-21T18:28:46.457549769Z     file = dm.dmReader(data_path, on_memory=False)
2024-08-21T18:28:46.457552099Z   File "/usr/local/lib/python3.9/site-packages/ncempy/io/dm.py", line 1237, in dmReader
2024-08-21T18:28:46.457554460Z     with fileDM(filename, verbose, on_memory=on_memory) as f1:
2024-08-21T18:28:46.457556840Z   File "/usr/local/lib/python3.9/site-packages/ncempy/io/dm.py", line 163, in __init__
    if not self._validDM():
2024-08-21T18:28:46.457561320Z   File "/usr/local/lib/python3.9/site-packages/ncempy/io/dm.py", line 343, in _validDM
2024-08-21T18:28:46.457563590Z     self._dmType = self.fromfile(self.fid, dtype=np.dtype('>u4'), count=1)[0]
2024-08-21T18:28:46.457566430Z IndexError: index 0 is out of bounds for axis 0 with size 0
2024-08-21T18:28:46.458206626Z [2024-08-21 11:28:46,457] [7] [INFO] [^----OneForOneSupervisor: (1@0x7f051b0db1c0)]: Restarting dead <Agent*: scan_fi[.]watch_for_haadf_events>! Last crash reason: IndexError('index 0 is out of bounds for axis 0 with size 0') 
2024-08-21T18:28:46.458224586Z NoneType: None

The files look ok on vfdaq and on NERSC. I'm also able to read the file at ncemhub/distiller/dm4/2024.08.21/scan599.dm4 using ncempy. Maybe the file was not fully written before the metadata extraction was attempted?
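
For reference, the traceback fails where the reader takes element 0 of a one-item big-endian `>u4` read, which would come back empty if the file had no bytes yet. A minimal sketch of a pre-check for that case, using only the standard library. The function name `read_dm_version` is hypothetical and is not part of ncempy or distiller:

```python
import struct
from typing import Optional


def read_dm_version(path: str) -> Optional[int]:
    """Return the value stored in the first 4 bytes, or None if they are missing.

    Hypothetical helper for illustration; not part of ncempy or distiller.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        # Empty or truncated file: a one-item '>u4' read would return
        # an empty array here, and indexing it with [0] raises IndexError.
        return None
    # Big-endian unsigned 32-bit integer, matching dtype '>u4' in the traceback.
    return struct.unpack(">I", header)[0]
```

If this returned `None` for a scan at the moment the event fires, that would support the idea that the file was empty or incomplete when the worker opened it.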

cjh1 commented 3 months ago

Looking at the code, I can't see how the metadata extraction would start before the file is fully written. The metadata extraction is triggered by the file being uploaded. Maybe an incomplete dm4 gets uploaded; that looks like it could be possible. Is scan599.dm4 the file associated with the stack trace?

ercius commented 3 months ago

Yes, scan599 was the one that triggered the error. It does seem like the file was not fully written at that point. The file was readable when I checked it, though.
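
One possible mitigation, if an incomplete upload does turn out to be the cause, would be to wait until the file is non-empty and its size has stopped changing before opening it. This is only a sketch under that assumption: `wait_until_stable` and its parameters are hypothetical names, not existing distiller code, and a stable size is a heuristic that does not prove the writer has finished.

```python
import asyncio
import os
import time


async def wait_until_stable(path, interval=0.5, checks=2, timeout=30.0):
    """Return True once the file is non-empty and its size is unchanged
    for `checks` consecutive polls; return False if `timeout` expires.

    Hypothetical helper for illustration; not part of distiller.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file is not visible yet
        if size > 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed or file is empty: start counting again
        last_size = size
        await asyncio.sleep(interval)
    return False
```

In an async worker this could be awaited before the reader is called, with the scan skipped or retried when it returns `False`, so that a single incomplete file does not crash the agent.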