Calpas opened this issue 5 years ago
You haven't missed anything - it processes one run at a time. Feel free to write a script that does it for multiple runs.
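A minimal sketch of such a wrapper, in Python. It assumes runs live in `proc/rNNNN` subdirectories of the sample folder (as in the paths quoted in this thread) and that `make_cxi.py` takes a run directory plus an `-o` output path, as in the command from the original question. The per-run output naming scheme and the `dry_run` flag are hypothetical choices for illustration, not options of `make_cxi.py` itself.

```python
import re
import subprocess
import sys
from pathlib import Path

# Run folders are assumed to be named like r0066 (see the paths in this thread).
RUN_DIR_PATTERN = re.compile(r"^r\d{4}$")


def find_run_dirs(sample_dir, subdir="proc"):
    """Return the run directories under <sample_dir>/<subdir>, sorted by name."""
    base = Path(sample_dir) / subdir
    return sorted(
        p for p in base.iterdir() if p.is_dir() and RUN_DIR_PATTERN.match(p.name)
    )


def build_commands(run_dirs, out_prefix):
    """Build one make_cxi.py command per run (hypothetical output naming)."""
    return [
        ["make_cxi.py", str(run), "-o", f"{out_prefix}_{run.name}.cxi"]
        for run in run_dirs
    ]


def main(sample_dir, out_prefix, dry_run=False):
    """Run make_cxi.py for every run; return the run directories that failed."""
    failed = []
    for cmd in build_commands(find_run_dirs(sample_dir), out_prefix):
        print(" ".join(cmd))
        if dry_run:
            continue
        # Keep going if one run fails, but remember which ones did.
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])
    return failed


if __name__ == "__main__":
    bad = main(sys.argv[1], sys.argv[2])
    if bad:
        print("Failed runs:", *bad, sep="\n  ")
        sys.exit(1)
```

Saved as a hypothetical `run_all.py`, `python run_all.py /path/ID-83 id83` would write one `.cxi` file per run. Merging the runs into a single file, as the original question asks, is not something this sketch attempts.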
I did that. When I run it over sample ID-80 (from CXIDB), several runs fail with error message (1) below, while it runs fine for ID-83. Regards
(1)
/beegfs/desy/group/it/ReferenceData/cxidb/ID-80
INFO:hdf5_virtualise.euxfel_vds:Reading run /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066 ...
INFO:hdf5_virtualise.euxfel_vds:Making virtual dataset for 1192 trains (1486155619:1486156811)
INFO:hdf5_virtualise.euxfel_vds:60 frames per train
INFO:hdf5_virtualise.euxfel_vds:Pixels in one detector module: (512, 128)
INFO:hdf5_virtualise.euxfel_vds:VDS shape: (71520, 16, 512, 128)
Traceback (most recent call last):
File "/gpfs/exfel/sw/software/hdf5-virtualise/make_cxi.py", line 11, in
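The numbers in this log fit together as follows: the train ID range gives the train count, multiplying by the frames per train gives the first VDS dimension, and the remaining dimensions are the 16 detector modules of 512 × 128 pixels. A small Python check of that arithmetic, treating the train range as half-open (which matches the stated count of 1192); the function name is hypothetical and not part of hdf5_virtualise:

```python
def vds_shape(first_train, end_train, frames_per_train, n_modules=16,
              module_shape=(512, 128)):
    """Shape of a per-frame virtual dataset, treating the train range as half-open."""
    n_trains = end_train - first_train
    return (n_trains * frames_per_train, n_modules) + tuple(module_shape)


# Values copied from the log above.
shape = vds_shape(1486155619, 1486156811, frames_per_train=60)
print(shape)  # (71520, 16, 512, 128)
```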
Try this:
karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80
If that reports errors, then there's a problem with the way the data has been written.
I got this output:
-bash-4.2$ karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80
Checking run directory: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80
Validation failed!
No usable files found directory: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80
Sorry, my mistake. You need to run it on the run directory you're using, like this:
karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066
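Since validation has to be done per run directory, it can be folded into the same kind of loop used for making the CXI files. A sketch in Python, assuming `karabo-data-validate` exits with a non-zero status when validation fails (worth checking against the installed version); the `runner` parameter is a hypothetical hook so the function can be exercised without the real tool.

```python
import subprocess


def validate_runs(run_dirs, runner=subprocess.run):
    """Run karabo-data-validate on each run directory.

    Returns a dict mapping each failing run directory to the tool's output.
    Assumes a non-zero exit status means validation failed.
    """
    failures = {}
    for run in run_dirs:
        result = runner(
            ["karabo-data-validate", str(run)],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so all messages are kept
            universal_newlines=True,
        )
        if result.returncode != 0:
            failures[str(run)] = result.stdout
    return failures
```

Only the run directories missing from the returned dict would then be passed on to make_cxi.py.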
I got:
-bash-4.2$ karabo-data-validate /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066
Checking run directory: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066
Validation failed!
Index referring to data (15000) outside dataset (7500) dataset: INSTRUMENT/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/trainId file: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5
Index referring to data (15000) outside dataset (7500) dataset: INSTRUMENT/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/mask file: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5
Index referring to data (15000) outside dataset (7500) dataset: INSTRUMENT/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/pulseId file: /beegfs/desy/group/it/ReferenceData/cxidb/ID-80/proc/r0066/CORR-R0066-AGIPD07-S00001.h5
...
Yup, that's the problem that's preventing it from generating a virtual dataset. We don't want to try to work around this kind of file corruption, because it destroys the link between the data and which train it belongs to.
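What the validator is flagging: in the EuXFEL file layout, each source's `INDEX` group stores, per train, a `first` offset and a `count` of rows in the data arrays, and every `first + count` has to stay within the length of the datasets it points at. Here the index reaches row 15000 while the arrays only have 7500 rows. A pure-Python sketch of that check (a simplified stand-in, not karabo_data's actual code); with real files you would read `first`, `count` and the dataset length with h5py from paths such as `INDEX/SPB_DET_AGIPD1M-1/DET/7CH0:xtdf/image/first`, which is an assumption about the layout of these particular files:

```python
def index_overruns(first, count, data_len):
    """Return (train_position, end) pairs where first + count exceeds data_len.

    first[i] and count[i] give the slice of data rows belonging to train i;
    a train with count 0 references no data, so it is never reported.
    """
    bad = []
    for i, (start, n) in enumerate(zip(first, count)):
        end = start + n
        if n > 0 and end > data_len:
            bad.append((i, end))
    return bad


# Toy example: 4 trains of 60 frames each, but only 180 rows were written.
first = [0, 60, 120, 180]
count = [60, 60, 60, 60]
print(index_overruns(first, count, data_len=180))  # [(3, 240)]
```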
So only one sample ID in CXIDB is usable?!
When I run CrystFEL to find peaks, far less than 5% of the images are "indexable" (for LCLS data, I had about 25%). Is such a low efficiency expected? Is there any literature on what to expect? Regards
For analysis from the processed data, that seems to be the case, unfortunately. The CFEL team that did the original analysis worked from the raw data, which probably doesn't have this issue. The issue was introduced by an old version of the calibration process and has since been fixed.
The detector group at XFEL can reprocess the raw data to generate new processed files, but they don't go in the normal proposal folder, and as far as I know we can't update the data on CXIDB.
You may want to check the CrystFEL results with Tom White - I'm not an expert on crystallography. But I think low rates are expected for this data.
I guess (hope) that this low efficiency is wrong. If we have much, much more data but in the end the efficiency does not at least stay the same, what is the point of having much more data?! So something is unclear to me. Regards
The data with a higher indexing rate might have been preprocessed (e.g. by Cheetah) to select only frames with a certain number of peaks, so the indexing rate may not be comparable (this is just guesswork - I don't know if that's the case).
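To make the comparability point concrete: an indexing rate depends on what it is divided by. A small Python illustration with made-up numbers (not taken from either dataset) shows how the same result can be a few percent of all frames but a much larger share of frames pre-selected by a hit finder:

```python
def rate(indexed, total):
    """Percentage of `total` frames that were indexed."""
    return 100.0 * indexed / total


# Hypothetical counts, chosen only to illustrate the effect of pre-selection.
all_frames = 100_000
hits = 20_000  # frames kept by a hit finder that requires enough peaks
indexed = 4_000

print(f"{rate(indexed, all_frames):.1f}% of all frames")  # 4.0% of all frames
print(f"{rate(indexed, hits):.1f}% of hits")  # 20.0% of hits
```

The same arithmetic bears on the question about collecting more data: what feeds into merging is the absolute number of indexed patterns, so a lower percentage over many more frames can still give more patterns overall.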
This was also a fairly early run at EuXFEL. The experimental side of things is ahead of the data analysis side, but it's likely there were some limitations making it hard to get top-quality data.
Dear Takluyver, ok, thank you for your answer. Regards
Dear experts, I'm using the make_cxi.py script to make a virtual dataset for one run. I did not see an option to do it for all runs at once, with a command like (*). I could write a shell script, but this option would be useful. Please let me know if I missed it. Regards
(*) make_cxi.py /path/ID-83/ -o id83_allrun.cxi