cesmix-mit / LAMMPS.jl

MIT License

Lammps error for division by 0 terminates Julia process #48

Open Joroks opened 3 months ago

Joroks commented 3 months ago

I've noticed that there are many scenarios where LAMMPS crashes (killing Julia in the process) instead of throwing an error.

The easiest reproducible example I've found is:

using LAMMPS

lmp = LMP()
command(lmp, """
    variable a equal 1/1
    variable b equal 0/0
""")

LAMMPS.extract_variable(lmp, "a") # 1.0
LAMMPS.extract_variable(lmp, "b") # crash

The crash results in a generic error message being printed (directly to the output stream, not through the check function). The message doesn't stay around long enough to actually read, as Julia terminates soon after, which wipes it.

From what I've gathered, this isn't a problem specific to the extract_variable method either, as the same thing has happened to me with gather as well.

# taken from `_get_count()`

# Here I explicitly verify that the requested compute exists, which it does, although it might not have been evaluated yet.
API.lammps_has_id(lmp, "compute", name[3:end]) != 1 && error("Unknown per atom compute $name")
# lammps_extract_compute now triggers the compute to evaluate. Any error during the compute crashes LAMMPS.
count_ptr = API.lammps_extract_compute(lmp::LMP, name[3:end], API.LMP_STYLE_ATOM, API.LMP_SIZE_COLS)

My assumption is that if anything goes wrong during a calculation (for example, division by zero), LAMMPS just blows up. Crashing LAMMPS is fine, but also crashing Julia, especially during a REPL session, should be avoided if possible.

Here's an excerpt from the lammps_extract_compute() documentation:

If the compute’s data is not computed for the current step, the compute will be invoked. LAMMPS cannot easily check at that time, if it is valid to invoke a compute, so it may fail with an error. The caller has to check to avoid such an error.

I don't think it's possible to predict or even catch this error through the API, so there is probably not much we can do about it. It might be best to just document this "behaviour" and maybe provide a list of known problems that would cause a crash.

Maybe we could also open an issue for this on the LAMMPS GitHub page? I'd assume they already know about this, though.

vchuravy commented 3 months ago

Yeah this is something that would need to be discussed with upstream.

I think we do enable C++ exceptions, so at the API boundary the exception would need to be caught and then turned into a proper LAMMPS error.

vchuravy commented 3 months ago

Might be worthwhile checking what happens with the LAMMPS Python bindings.