Virtual environment for several runs in one go

Calpas commented 5 years ago

Dear experts, I'm using the make_cxi.py script to make a virtual environment for one run. I did not see an option to do it for all runs at once, i.e. a command like (*). I could write a shell script, but this option would be useful. Please let me know if I missed it. Regards

(*) make_cxi.py /path/ID-83/ -o id83_allrun.cxi

takluyver commented 5 years ago

You haven't missed anything - it processes one run at a time. Feel free to write a script that does it for multiple runs.
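A minimal sketch of such a script, assuming make_cxi.py accepts a run directory plus an `-o` output path as in the command above, and that runs live under `<proposal>/proc/rNNNN` as in the paths later in this thread. The function names, the script name `make_cxi_all.py` and the `rNNNN.cxi` output naming are hypothetical.

```python
"""Hypothetical batch wrapper: run make_cxi.py once per run directory."""
import subprocess
import sys
from pathlib import Path


def find_run_dirs(proposal_dir, data_kind="proc"):
    """Return run directories such as <proposal>/proc/r0066, sorted by name.

    Assumes the <proposal>/proc/rNNNN layout seen in this thread.
    """
    base = Path(proposal_dir) / data_kind
    return sorted(p for p in base.glob("r[0-9]*") if p.is_dir())


def build_commands(proposal_dir, out_dir, script="make_cxi.py"):
    """Build one make_cxi.py command line per run (nothing is executed here)."""
    commands = []
    for run_dir in find_run_dirs(proposal_dir):
        out_file = Path(out_dir) / f"{run_dir.name}.cxi"  # e.g. r0066.cxi
        commands.append([script, str(run_dir), "-o", str(out_file)])
    return commands


def run_all(proposal_dir, out_dir):
    """Run each command in turn, carrying on if one run fails."""
    failed = []
    for cmd in build_commands(proposal_dir, out_dir):
        print("Running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])  # remember the run directory that failed
    return failed


# Only act when called as: python make_cxi_all.py <proposal_dir> <out_dir>
if __name__ == "__main__" and len(sys.argv) == 3:
    failures = run_all(sys.argv[1], sys.argv[2])
    if failures:
        print("Failed runs:", *failures, sep="\n  ")
        sys.exit(1)
```

Producing one CXI file per run, rather than one combined file, keeps a bad run (like the ones below) from blocking all the others.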

Calpas commented 5 years ago

I did that. When I run it on sample ID-80 (from CXIDB), I get the error message (1) for several runs, while it runs fine for ID-83. Regards

(1)
/beegfs/desy/group/it/ReferenceData/cxidb/ID-80
INFO:hdf5_virtualise.euxfel_vds:Reading run /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066 ...
INFO:hdf5_virtualise.euxfel_vds:Making virtual dataset for 1192 trains (1486155619:1486156811)
INFO:hdf5_virtualise.euxfel_vds:60 frames per train
INFO:hdf5_virtualise.euxfel_vds:Pixels in one detector module: (512, 128)
INFO:hdf5_virtualise.euxfel_vds:VDS shape: (71520, 16, 512, 128)
Traceback (most recent call last):
  File "/gpfs/exfel/sw/software/hdf5-virtualise/make_cxi.py", line 11, in <module>
    sys.exit(main())
  File "/gpfs/exfel/sw/software/hdf5-virtualise/hdf5_virtualise/make_cxi.py", line 86, in main
    write_combined_file(run_dir, out_file)
  File "/gpfs/exfel/sw/software/hdf5-virtualise/hdf5_virtualise/make_cxi.py", line 21, in write_combined_file
    combined = combine_detector_data(run_dir, n_modules=16)
  File "/gpfs/exfel/sw/software/hdf5-virtualise/hdf5_virtualise/euxfel_vds.py", line 217, in combine_detector_data
    pulse_ids[pulses_slice, chunk.module_no] = chunk_pids
ValueError: could not broadcast input array from shape (7500) into shape (15000)
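The final ValueError is a NumPy broadcasting failure: the right-hand side holds 7500 pulse IDs, but the slice it is assigned to has room for 15000. A minimal reproduction of that line, taking only the shapes from the traceback; the array contents and the choice of module 7 (from the validator output further down) are illustrative assumptions, not what euxfel_vds.py actually does internally.

```python
import numpy as np

n_modules = 16
frames_in_index = 15000  # size of the slice the code expects to fill
frames_on_disk = 7500    # pulse IDs actually present in the module's file

# One column of pulse IDs per detector module (names mirror the traceback).
pulse_ids = np.zeros((frames_in_index, n_modules), dtype=np.uint64)
pulses_slice = slice(0, frames_in_index)
chunk_pids = np.arange(frames_on_disk, dtype=np.uint64)

try:
    # Target has shape (15000,), value has shape (7500,): they cannot broadcast.
    pulse_ids[pulses_slice, 7] = chunk_pids
except ValueError as err:
    print("ValueError:", err)
```

So the error is not in the virtualisation logic itself: it is a symptom of the file claiming more entries than it contains.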

takluyver commented 5 years ago

Try this:

karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80

If that reports errors, then there's a problem with the way the data has been written.

Calpas commented 5 years ago

I got this output:

-bash-4.2$ karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80
Checking run directory: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80
Validation failed!

No usable files found directory: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80

takluyver commented 5 years ago

Sorry, my mistake. You need to run it on the run directory you're using, like this:

karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066

Calpas commented 5 years ago

I got:

-bash-4.2$ karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066
Checking run directory: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066
Validation failed!

Index referring to data (15000) outside dataset (7500) dataset: INSTRUMENT/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/trainId file: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5

Index referring to data (15000) outside dataset (7500) dataset: INSTRUMENT/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/mask file: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5

Index referring to data (15000) outside dataset (7500) dataset: INSTRUMENT/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/pulseId file: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5

...

takluyver commented 5 years ago

Yup, that's the problem that's preventing it from generating a virtual dataset. We don't want to try to work around this kind of file corruption, because it destroys the link between the data and which train it belongs to.
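To make that link concrete: as far as I understand the EuXFEL HDF5 layout, each train's entries are located through `first` and `count` arrays in the file's `INDEX` section, and those are the only connection between a frame and its train ID. Below is a rough sketch of the kind of consistency check behind messages like "Index referring to data (15000) outside dataset (7500)". The function names and the exact dataset paths are assumptions; this is not karabo-data-validate's actual code.

```python
def index_problems(first, count, dataset_length):
    """Return (train_position, end) for every train whose entries run past the data.

    first[i] and count[i] give where train i's entries start and how many there
    are, as stored in the INDEX section (assumed EuXFEL layout).
    """
    problems = []
    for i, (start, n) in enumerate(zip(first, count)):
        end = start + n
        if n > 0 and end > dataset_length:
            problems.append((i, end))
    return problems


def check_file(path, source, group="image", key="trainId"):
    """Apply index_problems to one HDF5 file (dataset paths are assumptions)."""
    import h5py  # imported here so index_problems works without h5py

    with h5py.File(path, "r") as f:
        first = f[f"INDEX/{source}/{group}/first"][:].tolist()
        count = f[f"INDEX/{source}/{group}/count"][:].tolist()
        length = f[f"INSTRUMENT/{source}/{group}/{key}"].shape[0]
    return index_problems(first, count, length)


# Toy index claiming 15000 entries when only 7500 exist on disk.
print(index_problems([0, 7500], [7500, 7500], 7500))  # → [(1, 15000)]
```

For the file above this would be called as `check_file("/beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5", "SPB_DET_AGIPD1M-1/DET/7CH0:xtdf")`. Once `first + count` points past the end of the data, there is no reliable way to say which frames belong to which train, which is why patching around it is unsafe.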

Calpas commented 5 years ago

So only one sample ID is usable in the cxidb?!

Calpas commented 5 years ago

When I run CrystFEL to find peaks, far less than 5% of the images turn out to be indexable (for LCLS data, I had about 25%). Is such a low efficiency expected? Is there any literature on what to expect? Regards

takluyver commented 5 years ago

For analysing the processed data, that seems to be the case, unfortunately. The CFEL team that did the original analysis worked from the raw data, which probably doesn't have this issue. The problem was introduced by an old version of the calibration process and has since been fixed.

The detector group at XFEL can reprocess the raw data to generate new processed files, but they don't go in the normal proposal folder, and as far as I know we can't update the data on CXIDB.

You may want to check the CrystFEL results with Tom White - I'm not an expert on crystallography. But I think low rates are expected for this data.

Calpas commented 5 years ago

I guess (hope) that this low efficiency is wrong. If we have much, much more data but in the end the efficiency does not at least stay the same, what is the point of having more data? So something is unclear to me. Regards

takluyver commented 5 years ago

The data with a higher indexing rate might have been preprocessed (e.g. by Cheetah) to select only frames with a certain number of peaks, so the indexing rate may not be comparable (this is just guesswork - I don't know if that's the case).
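A toy illustration of why the rates may not be comparable: the same set of indexed frames gives a very different percentage depending on whether the denominator is every frame or only pre-selected hits. The hit criterion here (a minimum number of peaks, `min_peaks=20`) is a hypothetical stand-in for whatever hit finding was actually applied, and the data are made up.

```python
def indexing_rates(peak_counts, indexed, min_peaks=20):
    """Return (indexed / all frames, indexed hits / hits).

    peak_counts[i] is the number of peaks found in frame i; indexed[i] is True
    if frame i was indexed. A "hit" is any frame with at least min_peaks peaks,
    a hypothetical stand-in for real hit finding.
    """
    hits = [i for i, n in enumerate(peak_counts) if n >= min_peaks]
    n_indexed = sum(1 for ok in indexed if ok)
    n_indexed_hits = sum(1 for i in hits if indexed[i])
    per_frame = n_indexed / len(peak_counts) if peak_counts else 0.0
    per_hit = n_indexed_hits / len(hits) if hits else 0.0
    return per_frame, per_hit


# Toy data: 8 frames, 4 of them hits, 2 indexed.
peaks = [0, 5, 25, 30, 40, 2, 22, 0]
was_indexed = [False, False, True, False, True, False, False, False]
print(indexing_rates(peaks, was_indexed))  # → (0.25, 0.5)
```

The number of indexed patterns is the same in both cases; only the denominator changes. So more data can still mean more indexed patterns even if the quoted percentage is lower.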

This was also a fairly early run at EuXFEL. The experimental side of things is ahead of the data analysis side, but it's likely there were some limitations making it hard to get top quality data.

Calpas commented 5 years ago

Dear Takluyver, ok, thank you for your answer. Regards

European-XFEL / karabo_data, issue #145