WarwickMicroscopy / Felix

Felix: Bloch wave method diffraction pattern simulation software

Test in GaAs_short fails #186

Closed RudoRoemer closed 1 year ago

RudoRoemer commented 5 years ago

Hi Richard,

the test in GaAs_short, with DM3 images present, still fails (and takes a long time). Can you please have a look? Perhaps this is simply because the chosen RefinementModeFLAG is wrong in felix.inp? The error is

 Iteration    68
   Current Unit Cell Parameters   5.7683  5.7683  5.7683
 Bloch wave calculation...
 Figure of merit   20.6158%
 Specimen thickness  930 Angstroms
 Thickness range    0 Angstroms
 Current best figure of merit   20.62%
 Refining, point 3 of 3
 Iteration    69
   Current Unit Cell Parameters   5.7683  5.7683  5.7683
 Bloch wave calculation...
 Figure of merit   20.6162%
 Specimen thickness  930 Angstroms
 Thickness range    0 Angstroms
 Current best figure of merit   20.62%
 Concave set, predict minimum at     NaN with fit index     NaN

0 = rank, error in Absorption(CALL AbsorptiveScatteringFactor)
0 = rank, error in SimulateAndFit(Absorption)
0 = rank, error in MaxGradientRefinement(SimulateAndFit)
0 = rank, error in felixrefine(MaxGradientRefinement)
0 = rank, error in felixrefine(ABORTING)
1 = rank, error in Absorption(CALL AbsorptiveScatteringFactor)
1 = rank, error in SimulateAndFit(Absorption)
1 = rank, error in MaxGradientRefinement(SimulateAndFit)
1 = rank, error in felixrefine(MaxGradientRefinement)
1 = rank, error in felixrefine(ABORTING)

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

[aquarius.theory.warwick.ac.uk:11702] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[aquarius.theory.warwick.ac.uk:11702] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

RudoRoemer commented 5 years ago

The call was

aquarius GaAs_short>mpirun -N 2 ../../src/felix.INT64NGNU.d

rbeanland commented 5 years ago

This is probably the bug in the numerical evaluation of the absorption integral. This falls over very occasionally although I have no idea why.

dnjohnstone commented 5 years ago

I'm also having an issue with GaAs short on a fresh install of the master version of felix using gfortran-8.3.0 compilers on linux (debian). I think that the error is only in saving the outputs? Any ideas?

 -----------------------------------------------------------------
 felixrefine: 'Version: master / 1.03 / r.beanland               '
              'Date: 14-Mar-2019                                 '
              'Refinements B,C,D,F,H and S working, no HOLZ      '
 -----------------------------------------------------------------
 total number of MPI ranks  001, screen messages via rank 000
 -----------------------------------------------------------------
 Reflection pool of  200 with a minimum of   50 strong beams
 Refining Isotropic Debye Waller Factors, D
 Refining by maximum gradient
 Material: GaAs, F -4 3 m
 77 experimental images successfully loaded
 Mean inner potential  15.32 Volts
 203 unique structure factors
 Starting absorption calculation...
 Absorption    0 hrs   0 mins   8 sec
 Number of independent variables =     2
 Bloch wave calculation...
 Simulation    0 hrs   0 mins  40 sec
 Figure of merit   20.3578%
 Specimen thickness  930 Angstroms
 Thickness range    0 Angstroms
 Writing output; baseline simulation
At line 370 of file write_output_mod.f90 (unit = 49, file = 'GaAs_I0000_093nm_080x080/GaAs_-4-4+10.bin')
Fortran runtime error: Write exceeds length of DIRECT access record

Error termination. Backtrace:
#0  0x7f39d87008da
#1  0x7f39d8701395
#2  0x7f39d8701b1a
#3  0x7f39d88f90dd
#4  0x7f39d88f9140
#5  0x7f39d88f9a98
#6  0x5654397dd18e
#7  0x5654397a20a1
#8  0x56543978a69e
#9  0x7f39d8337b16
#10  0x56543978a6c9
#11  0xffffffffffffffff

dnjohnstone commented 5 years ago

If it helps to narrow it down: GaAs_long is writing out lots of .bin files for me, which I think are OK.

Loading those .bin files is a bit problematic for me, though. I'm trying to use ImageJ with 64-bit raw import and little-endian byte order, which I think should be correct... It's not clear to me how many pixels there should be, though: IPixelCount is 160, but setting it to that doesn't seem correct.

EDIT - ok, loading with 70x70 pixels as indicated by the filename got that correct and it does seem that the GaAs long simulation has gone ok.
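For anyone who prefers a script to ImageJ, here is a minimal sketch of the same import. It assumes the .bin files are headerless little-endian 64-bit reals and that the image size is the NNNxNNN token at the end of the output folder name, as the EDIT above suggests. The function names are made up for illustration and are not part of felix; the row/column orientation is also an assumption, so the image may need transposing.

```python
import os
import re
import struct


def size_from_name(path):
    """Parse (rows, cols) from a folder name ending in NNNxNNN, e.g. '..._080x080'."""
    folder = os.path.basename(os.path.dirname(os.path.abspath(path)))
    match = re.search(r"(\d+)x(\d+)$", folder)
    if match is None:
        raise ValueError(f"no NNNxNNN size token at the end of {folder!r}")
    return int(match.group(1)), int(match.group(2))


def load_raw_float64(path, shape=None):
    """Read a headerless little-endian float64 image as a list of rows.

    Assumption: values are stored row after row; transpose if the
    pattern comes out rotated.
    """
    rows, cols = shape if shape is not None else size_from_name(path)
    with open(path, "rb") as handle:
        raw = handle.read()
    expected = rows * cols * 8  # 8 bytes per 64-bit real
    if len(raw) != expected:
        raise ValueError(
            f"{path}: file has {len(raw)} bytes, but {rows}x{cols} needs {expected}"
        )
    values = struct.unpack(f"<{rows * cols}d", raw)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
```

The size check is the useful part: a wrong pixel count (such as 160 for a 70x70 file) raises an error instead of silently producing a scrambled image. With NumPy installed, `numpy.fromfile(path, dtype="<f8")` followed by a reshape does the same job.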

RudoRoemer commented 1 year ago

I reworked some of the samples directory and also made two scripts which (i) produce the sample images and (ii) compare them to any new version. The error noted above is still occurring, but only in the GNU-compiled version of the code. I will work on this over the next few days, time permitting. It seems a relatively minor bug due to the use of (:) in line 370 as stated above (could be 376 now), so probably just a declaration mistake.
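As a rough illustration of what such a comparison involves (this is not the script in the repository, and the function names are hypothetical), the sketch below compares a reference .bin file with a newly generated one, element by element, within a tolerance. It assumes both files are headerless little-endian 64-bit reals.

```python
import math
import struct


def read_float64(path):
    """Read a headerless file of little-endian 64-bit reals as a tuple."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if len(raw) % 8:
        raise ValueError(f"{path}: size {len(raw)} is not a multiple of 8 bytes")
    return struct.unpack(f"<{len(raw) // 8}d", raw)


def images_match(reference_path, candidate_path, rel_tol=1e-6, abs_tol=1e-12):
    """Return True if both files hold the same number of values and every
    pair agrees within tolerance. NaN never compares as close, so a NaN
    in either file counts as a mismatch."""
    reference = read_float64(reference_path)
    candidate = read_float64(candidate_path)
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

A tolerance rather than a byte-for-byte comparison is used because different compilers can legitimately differ in the last few digits.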

RudoRoemer commented 1 year ago

This is likely to be an error that can be fixed by choosing an IByteSize in the input file that is compatible with the specific compiler/architecture. We have found that IByteSize=2 works on 64-bit machines with Intel ifort, while IByteSize=8 works with the GNU compilers, also on 64-bit machines.
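One plausible reading of these two values: gfortran measures RECL for unformatted direct-access files in bytes, whereas ifort by default measures it in 4-byte units (unless compiled with -assume byterecl). The sketch below only does that arithmetic. It assumes, hypothetically, that the record length passed to OPEN is the number of values per record times IByteSize, and that one record holds an 80-value row of 8-byte reals. Neither assumption has been checked against write_output_mod.f90.

```python
# Size in bytes of one RECL unit for unformatted direct-access files.
RECL_UNIT_BYTES = {
    "gfortran": 1,  # RECL counted in bytes
    "ifort": 4,     # RECL counted in 4-byte words by default
}

REAL_BYTES = 8  # one 64-bit real


def record_bytes(values_per_record, ibytesize, compiler):
    """Bytes available in one record if RECL = values_per_record * ibytesize
    (hypothetical formula, see the assumptions above)."""
    return values_per_record * ibytesize * RECL_UNIT_BYTES[compiler]


def write_fits(values_per_record, ibytesize, compiler):
    """True if a row of 64-bit reals fits in the record."""
    needed = values_per_record * REAL_BYTES
    return record_bytes(values_per_record, ibytesize, compiler) >= needed


if __name__ == "__main__":
    for compiler, ibytesize in [("ifort", 2), ("gfortran", 8), ("gfortran", 2)]:
        print(compiler, ibytesize,
              record_bytes(80, ibytesize, compiler),
              write_fits(80, ibytesize, compiler))
```

Under these assumptions IByteSize=2 with ifort and IByteSize=8 with gfortran both give a 640-byte record, while IByteSize=2 with gfortran gives only 160 bytes, which would be consistent with "Write exceeds length of DIRECT access record".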

The sample runs still crash at the moment for ifort with an error

At line 504 of file write_output_mod.f90
Fortran runtime error: Index '1' of dimension 1 of array 'iindependentvariabletype' above upper bound of 0

This seems to be a genuine coding bug; Richard is on the case (some changes in the code only work for "refinement" mode, while the samples just run a "simulation").

I shall close this bug now.