DiamondLightSource / durin

BSD 3-Clause "New" or "Revised" License
2 stars 5 forks source link

Problems with Durin plugin from XDS #30

Open JunAishima opened 10 months ago

JunAishima commented 10 months ago

We are having some issues with XDS using Durin where the plugin appears to have problems with memory allocation. This doesn't happen with all datasets. If it will be helpful, I can get permission to share the dataset offline.

malloc(): unaligned tcache chunk detected free(): invalid pointer Aborted (core dumped)

JunAishima commented 10 months ago

Graeme suggested offline that I could try to check the last release - I downloaded the old plugin and tried, and got different issues. Hopefully this helps though.

Old plugin: forrtl: severe (174): SIGSEGV, segmentation fault occurred Image PC Routine Line Source
xds_par 0000000000561823 Unknown Unknown Unknown libpthread-2.28.s 00007F806560ECF0 Unknown Unknown Unknown libc-2.28.so 00007F80650C6E85 _IO_default_xsput Unknown Unknown libc-2.28.so 00007F8065099AEB _IO_vfprintf Unknown Unknown libc-2.28.so 00007F80650BBB15 vsprintf Unknown Unknown libc-2.28.so 00007F80650A2078 sprintf Unknown Unknown durin-plugin.so 00007F80646F8922 get_dectris_eiger Unknown Unknown durin-plugin.so 00007F80646F7BCA create_dataset_de Unknown Unknown durin-plugin.so 00007F80646F99F6 get_detector_info Unknown Unknown durin-plugin.so 00007F80646F7321 plugin_open Unknown Unknown xds_par 0000000000417307 generic_data_plug 301 generic_data_plugin.f90 xds_par 0000000000430F2F genericgetfrm 25757 MAIN_XDS.f90 xdspar 000000000042F0B3 getfrm 25859 MAIN_XDS.f90 xdspar 000000000042BC06 parser 13513 MAIN_XDS.f90 xdspar 000000000041AC1B xds 21547 MAIN_XDS.f90 xds_par 0000000000418661 MAIN 1 MAIN_XDS.f90 xds_par 0000000000415822 Unknown Unknown Unknown libc-2.28.so 00007F806506DD85 libc_start_main Unknown Unknown xds_par 0000000000415729 Unknown Unknown Unknown

New plugin: free(): invalid pointer Aborted (core dumped)

graeme-winter commented 10 months ago

Thanks for that @JunAishima - I presume any data set should trigger the problem even blanks? A test example would really help for debugging. The stack trace you sent is from XDS not Durin

c-mita commented 10 months ago

I would wager the error is coming from inside durin:

libc-2.28.so 00007F80650A2078 sprintf Unknown Unknown
durin-plugin.so 00007F80646F8922 get_dectris_eiger Unknown Unknown
durin-plugin.so 00007F80646F7BCA create_dataset_de Unknown Unknown
durin-plugin.so 00007F80646F99F6 get_detector_info Unknown Unknown
durin-plugin.so 00007F80646F7321 plugin_open Unknown Unknown

It's hard to tell which function exactly; the names appeared to be shortened and there are several starting with get_dectris_eiger (and they all use sprintf).

Can you output the symbol table for durin-plugin.so (something like objdump -TC durin-plugin.so) and identify which function exactly is causing the problem? (may need to work out the offset the plugin was loaded at?)

I would guess this is just a case of sprintf just running past the end of an input buffer. I thought I was more careful than that, but...

I think the code should just be updated to use snprintf. I think when I wrote this I was unnecessarily targeting a C89...

graeme-winter commented 10 months ago

Thank you @c-mita - given example data this would be much easier to debug

That said - @JunAishima - if you could attach output of h5ls -rv that would be maybe helpful, so can look for a long string / likely overrun

JunAishima commented 10 months ago

h5ls_out_5111.txt

What is the best way to send a dataset to you? I got the ok for that

graeme-winter commented 10 months ago

@JunAishima thank you I see the data arriving as we discussed out of band. The h5ls gave me no clues: I will now see if I can reproduce your problems.

graeme-winter commented 10 months ago

OK, thank you for sharing the data

I note that

/entry/data/data_000001  External Link {NYX_B006_5111_data_000001.h5//entry/data/data}
/entry/data/data_000002  External Link {NYX_B006_5111_data_000002.h5//entry/data/data}
/entry/data/data_000003  External Link {NYX_B006_5111_data_000003.h5//entry/data/data}
/entry/data/data_000004  External Link {NYX_B006_5111_data_000004.h5//entry/data/data}

There are 4 entries here but you only sent 3 data files. The 3rd data set contains only a few images but is properly formatted

Grey-Area Downloads :) $ h5ls -r NYX_B006_5111_data_000003.h5 
/                        Group
/entry                   Group
/entry/data              Group
/entry/data/data         Dataset {8/Inf, 3262, 3108}

I will test in a second (downloading the big data files now) but I suspect that this inconsistency is what is causing durin some problems. We may need to figure out a workaround.

If you only process the first 2000 images in XDS does everything work?

graeme-winter commented 10 months ago

I get a segmentation fault under macOS as well, so we have a real problem here. Now investigating to work out precisely what the problem is. I suspect just processing 2,000 frames won't help

graeme-winter commented 10 months ago

Error for me (surprisingly) at

err = H5Dread(ds_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buffer);

reading the pixel_mask

graeme-winter commented 10 months ago
(gdb) p data_desc->dims[1] 
$2 = 0
(gdb) p data_desc->dims[2] 
$3 = 0

OK, so we have malloc'd no bytes? That won't end well

But this would trigger an error when one comes to actually write some data in

JunAishima commented 10 months ago

This is an Eiger2 - I see what appear to be the dimensions of the image in the HDF file at /entry/instrument/detector/module/data_size. Another set of fields that should contain the dimensions are /entry/instrument/detector/detectorSpecific/x_pixels_in_detector (also y_pixels...)

c-mita commented 10 months ago

I am surprised that malloc(0) does not give an error

malloc(0) returns a pointer that should not dereferenced (usually NULL).

The missing external file might be causing the problem.

When calculating the dataset dimensions for a Dectris dataset, it walks over all numbered datasets in /entry/data in order, counting the length of the overall dataset (the bounds set by XDS don't matter at this point). During this walk, it overwrites the x-y dimensions from previous datasets with whatever the last one it read was.

I would expect H5DOpen2 to fail when trying to open a dataset, but maybe not? Maybe it just gets an empty descriptor? And hence the last thing written to the x and y dimensions is 0?

JunAishima commented 10 months ago

aah, I see that now. it looks like we have four data files defined (probably defined during the collection), and only 3 data files written out. Not sure how they precisely define the experiment, but it would help if it was being defined correctly. This could still be derailed by aborting a collection (not sure if they did that here).

The plugin should be able to handle this case properly as well, so sounds like there could be improvements on both sides.

graeme-winter commented 10 months ago

Working on this properly now, I think it is a little more subtle than perhaps it at first looks. Short version - this is an unhandled error from

  if (retval < 0) {
    free(frame_counts);
  } else {
    memcpy(desc->dims, dims, 3 * sizeof(*dims));
    desc->data_width = data_width;
    eiger_desc->n_data_blocks = n_datas;
    eiger_desc->block_sizes = frame_counts;
  }
  return retval;

where the return status is not checked. I think if I patch it to not do the free but to reshape this we can make it work as is, which is inelegant but more useful (probably) than actually error'ing

graeme-winter commented 10 months ago

@JunAishima after some hacking I have something which appears to (at least partially) work - the indexing fails but that is probably a geometry error - would you like to have a go?

durin-plugin.zip

Currently working on the fix-30 branch, will need to do some additional tidying as there is an unhandled route which I think we probably want to handle (but maybe no?)

graeme-winter commented 6 months ago

@JunAishima sorry for the slow pick-up - but did you get a chance to test? I was tidying up unmerged branches