MineralsCloud / Express.jl

Express: a high-level, extensible workflow framework for accelerating ab initio calculations for the materials science community
https://mineralscloud.github.io/Express.jl/
GNU General Public License v3.0
25 stars 1 forks source link

Can't run Express #232

Closed paperref closed 2 years ago

paperref commented 2 years ago

Hi, I'm one of the referees for your paper. I've received the scripts you sent me through the editor and ran them (I'm using julia1.6 to run your code though). I'm then trying to run an eos calculation shown here: image

If I run it using xps I get this error: image

When trying to run it with a julia script I run into a different error: image image

Any ideas on how to fix this?

singularitti commented 2 years ago

Dear referee, hi! It is nice to talk to you in this way. Yes, I am well aware of this issue. It is related to how we save the tracked status of the Workflow. Is it possible that you were rerunning a workflow with an already-saved file in the same directory (It probably has the name status.jls or status.jld2, etc.)? The first screenshot is probably because you have a status.jls file but running with the latest version of code (where we shifted to the JLD2 file format recently.)? The second screenshot is caused by a similar reason, and I am aware of that. It is because we cannot save some Julia data structures to files and reload them. See issue 133 and issue 134. Unfortunately, I am not able to solve issue 134 immediately since it is a Julia language issue, and it is beyond my capability. I have tried several different solutions, but each of them has some drawbacks. But it might be simpler than I thought in the end (Some of my types seem to be storable). Anyway, I am trying my best to solve this issue. I will inform you immediately once it is fixed. To solve your problem temporarily, could you try deleting any .jls or .jld2 files and retry it?

paperref commented 2 years ago

Hi, Yea JLD2 is sometimes not the most reliable I have found. Maybe just using JSON with JSON3 or BSON would be an option? In any case, it seems that the code is somewhat running, but now I have a load of other errors that appear from within Unitful I think: image

singularitti commented 2 years ago

Dear referee, hi! Good news to hear it is finally running! Yes, JLD2 was not so smooth to use. I am not sure I can use a JSON format (with JSON.jl or JSON3.jl) to store the workflow status file since I am storing binary data here (functions, etc.). So it might not be able to be stored in human-readable formats. BSON is a possible format, but as I said, it is still not perfect. I will keep trying anyway.

For your screenshot, could you tell me what is 57.84 GPa and 8.0 here? I guess 57.84 GPa is the bulk modulus at 0 GPa and 8.0 is its derivative? Could you please show me your eos.yaml?

paperref commented 2 years ago

Hi,

Oh no I don't think you can ever store functions in a JLD2 since they are very tied to the generating environment, i.e. they heavily depend on the loaded modules, which other functions were already defined, world age etc. I've tried this before and it simply doesn't work afaik.

Back to the numbers, I honestly don't know where they come from, I'm just running the eos.yaml configuration from the documentation, I think these are generated somewhere inside your code.

paperref commented 2 years ago

Could you please show me your eos.yaml?

It is shown in the screenshot of the first post here.

singularitti commented 2 years ago

Dear referee, hi. I have never met such an error before. Could you show a little bit more of your screenshot (you could have it redacted first)? Is p=5.0 or p=-5.0 has this error? I do not have any 57.84 GPa or 8.0 in my code. So I guess it could be when the code is finding zeros of an equation of state, it came across some hidden bug (probably not in my code). I am updating the code to let it show more direct and precise error messages, stay tuned!

paperref commented 2 years ago

I’ve just tried with a different template file for the pw calculation and same result. I’m sure it’s not that there’s any number like that somewhere in your code, but it’s probably from parsing or something similar? I don’t see any output files being produced so I’m not even sure if it runs anything to begin with. I tried running the template itself and that all seems to work. Not clear what’s going on here

volkerblum commented 2 years ago

This sounds to me like a bug in one version of Unitful that is not present in another. Can each of you please post the versions of Unitful that you are using so that we can compare notes? That would be very helpful. Thank you!

paperref commented 2 years ago

My Manifest.toml looks like the attached: Manifest.txt

volkerblum commented 2 years ago

So Unitful version 1.11.0, right? That looks current.

https://github.com/PainterQubits/Unitful.jl/tags

@singularitti - what version are you using?

singularitti commented 2 years ago

So Unitful version 1.11.0, right? That looks current.

https://github.com/PainterQubits/Unitful.jl/tags

@singularitti - what version are you using?

Dear Dr. Blum @volkerblum and referee @paperref, Hi. I am using version 1.11.0, too. Here is my Manifest.toml. I have compared mine and the referee's, the only noticeable differences are the following packages:

 "FillArrays"
 "ArnoldiMethod"

They correspond to FillArrays.jl and ArnoldiMethod.jl, which are pretty standard Julia libraries. So I do not think there should be a big difference between our dependencies. However, I do notice that my Manifest.toml only has 165 entries, and the referee's Manifest.toml has 270 entries. There still exists a slim chance that their packages have more tricky dependencies that could cause this error.

I am pasting my Manifest.toml here, if the referees want to compare, they can use the following code:

using Pkg.TOML
mine = TOML.parsefile("/path/to/referees/Manifest.toml")  # All the packages that I installed
theirs = TOML.parsefile("/path/to/authors/Manifest.toml")  # All the packages that they installed
setdiff(mine, theirs)  # Construct the set of elements in `mine` but not in `theirs`
setdiff(theirs, mine)  # Construct the set of elements in `theirs` but not in `mine`

After all, I am considering this is not a problem of Unitful.jl. This is probably a problem with the "finding-zero algorithm" in Roots.jl. Since in my code, I was solving $P = P(V)$ backwards: $P(V=V_i) = P_i$ to find $V_i$ from a given equation of state. I guess it is when $P = -5$ GPa or $P = 5$ GPa have some problem with that. I have never seen such an error before in my tests.

singularitti commented 2 years ago

Dear Dr. Blum @volkerblum and referee @paperref, I think I knew where does the bug come from. It should be in this line. In the example, the equation of state's $B'_0 = 4.82$, when timed by $12$, it is $57.84$, and that is where $57.84 - 8.0$ come from. However, $4.82$ is unitless, I am not sure why it suddenly has a unit of GPa. Is it possible that the referee changes eos.yaml or something happens so that the file is wrongly parsed and the GPa was added to 4.82? Similar things could happen, please see one of my troubleshooting suggestions.

volkerblum commented 2 years ago

eos.yaml is shown as a screenshot in the first comment, I think. Should it look that way? Note that it says "4.82 GPa". Can you (@singularitti ) reproduce that bug if you have that line? Also, I believe @paperref mentioned that this is the eos.yaml file from the documentation. Is the "GPa" unit shown in the first eos.yaml screenshot in the documentation?

singularitti commented 2 years ago

Dear Dr. Blum @volkerblum and referee @paperref, I believe the error is from the "4.82 GPa" line. I am sorry I did not notice this line when the error was reported, so it took days for me to understand this error. The eos.yaml from the doc looks like this:

recipe: eos
cli:
  mpi:
    np: 16
template: template.in
save:
  status: status.jld2
fixed:
  pressures:
    unit: GPa
    values:
      - -5
      - -2
      - 0
      - 5
      - 10
      - 15
      - 17
      - 20
trial_eos:
  type: bm3
  values:
    - 300.44 bohr^3
    - 74.88 GPa
    - 4.82

There is no "GPa" after "4.82". I suppose the referee accidentally added it?

From the referee's configuration file I guess they were testing on NiO? While the parameters in our equation of state ( $V_0 = 300.44$ bohr³, $B_0 = 74.88$ GPa, $B'_0 = 4.82$) from our documentation are for Germanium. I am not sure these parameters are a good starting guess since NiO and Ge are of different structures. It still might be optimizable, but a bad guess would slow down the calculations and sometimes even fail to find the volume for the corresponding pressure. We suggest referencing the related publication results of NiO as the starting point of these parameters.

paperref commented 2 years ago

Oh I see, I will try without the GPa 🤞

paperref commented 2 years ago

Cool, it seems to be running now. Fantastic. So I’d say all is fine, but I think it would be highly useful if you implement a bit more checks and error handling so that it’s more clear what’s going on. And also something to add export statements such as export OMP_NUM_THREADS=1 etc (although I guess the idea is that you’d do that before running Express in the shell?). And the saved.jld2 issue. But aside from that all clear for me!

singularitti commented 2 years ago

Dear referee, Great to hear it works!

it would be highly useful if you implement a bit more checks and error handling so that it’s more clear what’s going on

Yes, I am working on adding more checks & error tracking things (e.g., instead of only saving a bare Exception type, now we save the entire stack trace also to know where exactly the error was thrown). More checks like type checking and value checking will be added, too. The entire codebase is iterating very fast.

And also something to add export statements such as export OMP_NUM_THREADS=1 etc (although I guess the idea is that you’d do that before running Express in the shell?)

Indeed, I do have the environment variable exported in my shell. I do not like dealing with environment variables in Julia because that might break the users' environment. But I can add a documentation item on the troubleshooting page.

the saved.jld2 issue

To solve this issue, I just worked out a new version. I found that instead of saving it as JLD2 files, I need to save it as .jls files, which can store anonymous functions with keywords. I will try to test it a few times later today, and hopefully, it will be solved completely.

paperref commented 2 years ago

I need to save it as .jls files, which can store anonymous functions with keywords.

Oh really, that’s interesting, I thought it was pretty much never possible to store functions. I used to generate their sourcecode using code_string from CodeTracking.jl.

Anyhow if it works it works and that’s great. I’ll submit a positive final decision for your paper. It was a long road but we got there in the end!