MaterialsGalaxy / larch-tools

Galaxy tool wrappers for Larch analysis tools for X-ray spectroscopy
MIT License

Paper 5: pymatgen issue in Larch FEFF #37

Closed alex-belozerov closed 5 months ago

alex-belozerov commented 7 months ago

When specifying Mn as the absorbing atom in Larch FEFF for LaMnO3, I got an error.

The crystal structure file can be found here: https://www.ccdc.cam.ac.uk/structures/Search?Ccdcid=1667441&DatabaseToSearch=ICSD

Since Abraham's ipynb, which uses Feff 6L.02 under Linux, worked fine for the same crystal structure, the problem is presumably with pymatgen, the Python library we use to convert from CIF to the FEFF .inp format.

A more detailed discussion of this can be found in the Paper 5 reproduction ticket.

Regarding my packages, I have pymatgen 2023.12.18 installed. The complete list of packages is shown below.

aiobotocore 2.7.0 aiohttp 3.9.0 aiohttp-retry 2.8.3 aioitertools 0.11.0 aiosignal 1.3.1 amqp 5.2.0 annotated-types 0.6.0 antlr4-python3-runtime 4.9.3 anyio 4.0.0 appdirs 1.4.4 apt-clone 0.2.1 apturl 0.5.2 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 asteval 0.9.31 asttokens 2.4.0 async-lru 2.0.4 async-timeout 4.0.3 asyncssh 2.14.1 atpublic 4.0 attrs 23.1.0 awscli 1.22.34 Babel 2.13.0 backcall 0.2.0 bcrypt 4.1.2 beautifulsoup4 4.10.0 billiard 4.2.0 bleach 6.1.0 blinker 1.7.0 boto3 1.28.64 botocore 1.31.64 Brlapi 0.8.3 cachetools 5.3.2 celery 5.3.5 certifi 2020.6.20 cffi 1.16.0 chardet 4.0.0 charset-normalizer 3.3.0 click 8.1.7 click-didyoumean 0.3.0 click-plugins 1.1.1 click-repl 0.3.0 colorama 0.4.4 comm 0.1.4 command-not-found 0.3 configobj 5.0.6 contourpy 1.2.0 cryptography 41.0.5 cupshelpers 1.0 cycler 0.12.1 dbus-python 1.2.18 debugpy 1.8.0 decorator 5.1.1 defer 1.0.6 defusedxml 0.7.1 dictdiffer 0.9.0 dill 0.3.7 diskcache 5.6.3 distro 1.7.0 dnspython 2.4.2 docutils 0.17.1 dpath 2.1.6 dulwich 0.21.6 dvc 3.30.1 dvc-data 2.22.0 dvc-gdrive 2.20.0 dvc-http 2.30.2 dvc-objects 1.2.0 dvc-render 0.6.0 dvc-s3 2.23.0 dvc-studio-client 0.15.0 dvc-task 0.3.0 emmet-core 0.75.0 entrypoints 0.4 exceptiongroup 1.1.3 executing 2.0.0 eyeD3 0.8.10 fabio 2023.10.0 fastapi 0.108.0 fastjsonschema 2.18.1 filelock 3.6.0 Flask 3.0.0 flatten-dict 0.4.2 flufl.lock 7.1.1 fonttools 4.47.0 fqdn 1.5.1 frozenlist 1.4.0 fsspec 2023.10.0 funcy 2.0 future 0.18.3 gitdb 4.0.11 GitPython 3.1.40 google-api-core 2.14.0 google-api-python-client 2.108.0 google-auth 2.23.4 google-auth-httplib2 0.1.1 googleapis-common-protos 1.61.0 gpg 1.16.0 grandalf 0.8 greenlet 3.0.3 grpcio 1.30.2 gto 1.5.0 h11 0.14.0 h5py 3.10.0 hdf5plugin 4.3.0 httplib2 0.20.2 hydra-core 1.3.2 idna 3.3 ifaddr 0.1.7 imageio 2.33.1 IMDbPY 2021.4.18 importlib-metadata 4.6.4 ipykernel 6.25.2 ipysheet 0.7.0 ipython 8.16.1 ipywidgets 8.1.1 isoduration 20.11.0 iterative-telemetry 0.0.8 itsdangerous 2.1.2 jedi 0.19.1 jeepney 
0.7.1 Jinja2 3.1.2 jmespath 1.0.1 joblib 1.3.2 json5 0.9.14 jsonpointer 2.4 jsonschema 4.19.1 jsonschema-specifications 2023.7.1 jupyter_client 8.4.0 jupyter_core 5.4.0 jupyter-events 0.8.0 jupyter-lsp 2.2.0 jupyter_server 2.8.0 jupyter_server_terminals 0.4.4 jupyterlab 4.0.7 jupyterlab-pygments 0.2.2 jupyterlab_server 2.25.0 jupyterlab-widgets 3.0.9 keyring 23.5.0 kiwisolver 1.4.5 kombu 5.3.4 larch 4.0 latexcodec 2.0.1 launchpadlib 1.10.16 lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.3 lmfit 1.2.2 louis 3.20.0 lxml 4.9.4 macaroonbakery 1.3.1 maggma 0.60.0 Mako 1.1.3 markdown-it-py 3.0.0 MarkupSafe 2.1.3 matplotlib 3.8.2 matplotlib-inline 0.1.6 mdurl 0.1.2 mistune 3.0.2 mongogrant 0.3.3 mongomock 4.1.2 monty 2023.11.3 more-itertools 8.10.0 mp-api 0.39.4 mpmath 1.3.0 msgpack 1.0.7 multidict 6.0.4 nbclient 0.8.0 nbconvert 7.9.2 nbformat 5.9.2 nemo-emblems 5.8.0 nest-asyncio 1.5.8 netaddr 0.8.0 netifaces 0.11.0 networkx 3.2.1 notebook 7.0.6 notebook_shim 0.2.3 numdifftools 0.9.41 numexpr 2.8.8 numpy 1.26.1 oauth2client 4.1.3 oauthlib 3.2.0 omegaconf 2.3.0 onboard 1.4.1 orjson 3.9.10 overrides 7.4.0 packaging 21.3 palettable 3.3.3 PAM 0.4.2 pandas 2.1.4 pandocfilters 1.5.0 paramiko 3.4.0 parso 0.8.3 pathspec 0.11.2 PeakUtils 1.3.4 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.0.1 pip 22.0.2 platformdirs 3.11.0 plotly 5.18.0 ply 3.11 prometheus-client 0.17.1 prompt-toolkit 3.0.39 protobuf 4.25.1 psutil 5.9.0 ptyprocess 0.7.0 pure-eval 0.2.2 pyasn1 0.4.8 pyasn1-modules 0.3.0 pybtex 0.24.0 pycairo 1.20.1 PyCifRW 4.4.6 pycparser 2.21 pycups 2.0.1 pycurl 7.44.1 pydantic 2.5.1 pydantic_core 2.14.3 pydantic-settings 2.1.0 pydash 7.0.6 pydot 1.4.2 PyDrive2 1.17.0 pyelftools 0.27 pyfai 2023.9.0 pygit2 1.13.2 Pygments 2.16.1 PyGObject 3.42.1 pygtrie 2.5.0 PyICU 2.8.1 pyinotify 0.9.6 PyJWT 2.3.0 pymacaroons 0.13.0 pymatgen 2023.12.18 pymongo 4.6.1 PyNaCl 1.5.0 pyOpenSSL 23.3.0 pyparsing 2.4.7 pyparted 3.11.7 PyQt5 5.15.6 PyQt5-sip 12.9.1 pyRFC3339 1.1 pyshortcuts 1.9.0 
PySocks 1.7.1 python-apt 2.4.0+ubuntu2 python-dateutil 2.8.2 python-debian 0.1.43+ubuntu1.1 python-dotenv 1.0.0 python-gnupg 0.4.8 python-json-logger 2.0.7 python-magic 0.4.24 python-xlib 0.29 pytz 2022.1 pyxdg 0.27 PyYAML 5.4.1 pyzmq 25.1.1 referencing 0.30.2 reportlab 3.6.8 requests 2.31.0 requests-file 1.5.1 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.7.0 roman 3.3 rpds-py 0.10.6 rsa 4.8 ruamel.yaml 0.17.40 ruamel.yaml.clib 0.2.8 s3fs 2023.10.0 s3transfer 0.7.0 scikit-image 0.22.0 scikit-learn 1.3.2 scipy 1.11.3 scmrepo 1.4.1 SecretStorage 3.3.1 semver 3.0.2 Send2Trash 1.8.2 sentinels 1.0.0 setproctitle 1.2.2 setuptools 59.6.0 shortuuid 1.0.11 shtab 1.6.4 silx 1.1.2 six 1.16.0 smmap 5.0.1 sniffio 1.3.0 soupsieve 2.3.1 spglib 2.2.0 SQLAlchemy 2.0.24 SQLAlchemy-Utils 0.41.1 sqltrie 0.8.0 sshtunnel 0.4.0 stack-data 0.6.3 starlette 0.32.0.post1 sympy 1.12 systemd-python 234 tabulate 0.9.0 tenacity 8.2.3 termcolor 2.4.0 terminado 0.17.1 threadpoolctl 3.2.0 tifffile 2023.12.9 tinycss2 1.1.1 tldextract 3.1.2 toml 0.10.2 tomli 2.0.1 tomlkit 0.12.3 torbrowser-launcher 0.3.3 tornado 6.3.3 tqdm 4.66.1 traitlets 5.11.2 typer 0.9.0 types-python-dateutil 2.8.19.14 typing_extensions 4.8.0 tzdata 2023.3 ubuntu-drivers-common 0.0.0 ufw 0.36.1 uncertainties 3.1.7 Unidecode 1.3.3 uri-template 1.3.0 uritemplate 4.1.1 urllib3 1.26.5 uvicorn 0.25.0 vine 5.1.0 voluptuous 0.14.1 wadllib 1.3.6 wcwidth 0.2.8 webcolors 1.13 webencodings 0.5.1 websocket-client 1.6.4 Werkzeug 3.0.1 wheel 0.37.1 widgetsnbextension 4.0.9 wrapt 1.16.0 xdg 5 xkit 0.0.0 xlrd 1.2.0 xraydb 4.5.4 xraylarch 0.9.74 yarl 1.9.2 youtube-dl 2021.12.17 zc.lockfile 3.0.post1 zipp 1.0.0

tomlunderwood commented 7 months ago

@subindev-d and I have discovered a similar issue. Certain cif files just do not work with the Larch FEFF tool in Galaxy. For instance the cif file 1627088_TU.cif.txt, which is for Pd metal, results in the following error when used with 'Absorbing atom 0' and 'Radius 3.0' in the tool:

Traceback (most recent call last):
  File "/srv/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/muon-spectroscopy-computational-project/larch_feff/edf7f8ccf4af/larch_feff/larch_feff.py", line 112, in main
    paths_info = get_path_labels(Path(feff_dir, "paths.dat"))
  File "/srv/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/muon-spectroscopy-computational-project/larch_feff/edf7f8ccf4af/larch_feff/larch_feff.py", line 18, in get_path_labels
    with open(paths_file) as datfile:
FileNotFoundError: [Errno 2] No such file or directory: 'feff/paths.dat'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/muon-spectroscopy-computational-project/larch_feff/edf7f8ccf4af/larch_feff/larch_feff.py", line 165, in <module>
    main(structure_file, input_values["format"])
  File "/srv/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/muon-spectroscopy-computational-project/larch_feff/edf7f8ccf4af/larch_feff/larch_feff.py", line 114, in main
    raise FileNotFoundError(
FileNotFoundError: paths.dat does not exist, which implies FEFF failed to run

I obtained this cif file from the ICSD at https://www.ccdc.cam.ac.uk/structures/Search?Ccdcid=1627088&DatabaseToSearch=Published, but had to remove all mention of the oxidation state to get it to run with Larch FEFF. This is a known issue: the documentation in the tool says 'It is also possible that the cif file itself is not suitable, for example chemical symbols denoting ions (e.g. Fe2+) are not supported.' Specifically, I had to rename 'Pd0+' to 'Pd' and remove the reference to the '_atom_type_oxidation_number' data. However, even with these modifications, the above error was thrown. Note that my modifications yield a valid cif file according to the tool https://www.cryst.ehu.eus/bincstrdb/validate/, so it's not that I've broken the cif file.

This is not the only cif file where I've encountered this error. The cif file AMS_DATA.cif.txt for copper, obtained from https://rruff.geo.arizona.edu/AMS/minerals/Copper, yields the same error. This time though I made no modifications to the cif file to remove oxidation-state information: the error occurred with the cif file obtained straight from the American Mineralogist Crystal Structure Database.

Looking for the cause, I found the following Ifeffit threads from 2021 and 2023: https://millenia.cars.aps.anl.gov/pipermail/ifeffit/2021-August/010265.html, https://www.mail-archive.com/ifeffit%40millenia.cars.aps.anl.gov/msg07452.html. These suggest that some cif files which work fine in Artemis do not work in larch (or pymatgen), but I've yet to look further into this.

tomlunderwood commented 7 months ago

I have identified the cause of this issue and come up with a potential solution. The problem is with the FEFF input files created by larch_feff.py for the cif files mentioned above which throw an error. Specifically, the 'POTENTIALS' section in the FEFF input file is incorrect. For the cif file https://github.com/MaterialsGalaxy/larch-tools/files/14168667/1627088_TU.cif.txt larch_feff.py generates a 'POTENTIALS' section with one potential with index '0'. However the 'ATOMS' section refers to two potentials: potentials with index '0' and '1'. Hence FEFF cannot run because potential '1' is not defined in the FEFF input file.
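The mismatch described above can be checked mechanically. Below is a minimal sketch, assuming the usual FEFF input layout in which 'POTENTIALS' rows start with the potential index and 'ATOMS' rows carry it in the fourth column, and '*' starts a comment. The function name and the sample input are hypothetical and only illustrate the failure mode; they are not taken from larch_feff.py or from the actual generated file.

```python
def parse_feff_ipots(feff_inp_text):
    """Return (defined, referenced) sets of potential indices.

    defined:    indices listed in the POTENTIALS section
    referenced: indices used in the fourth column of the ATOMS section
    """
    defined, referenced = set(), set()
    section = None
    for raw in feff_inp_text.splitlines():
        # Everything after '*' is treated as a comment.
        line = raw.split("*", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        keyword = fields[0].upper()
        if keyword in ("POTENTIALS", "ATOMS"):
            section = keyword
            continue
        if keyword == "END":
            break
        if keyword[0].isalpha():
            # Any other card (TITLE, EDGE, ...) ends the current section.
            section = None
            continue
        if section == "POTENTIALS":
            defined.add(int(fields[0]))
        elif section == "ATOMS" and len(fields) >= 4:
            referenced.add(int(fields[3]))
    return defined, referenced


# Illustrative input only: one potential defined, two referenced.
SAMPLE = """\
TITLE Pd metal (illustrative)
POTENTIALS
* ipot  Z  tag
  0  46  Pd
ATOMS
* x  y  z  ipot  tag  distance
  0.000  0.000  0.000  0  Pd  0.000
  1.945  1.945  0.000  1  Pd  2.751
END
"""

defined, referenced = parse_feff_ipots(SAMPLE)
missing = sorted(referenced - defined)
print("potentials used in ATOMS but not defined:", missing)
```

Running a check like this on the generated input before calling FEFF would let the tool report the undefined potential directly, rather than failing later on the missing paths.dat.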

The cause

Broadly speaking the problem might be something to do with the fact that for a pure metal, e.g. Pd, there is only one chemical species present. I suspect that FEFF wants for XAS calculations a different potential for the absorbing atom (i.e. a Pd atom) and scattering atoms (i.e. the other Pd atoms in the system), and that larch_feff.py does not deliver this for the case of one species in the system.
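To make the suspicion concrete, here is a minimal sketch of the potential assignment I would expect, assuming the FEFF convention that index 0 is reserved for the absorbing atom and every other atom, including atoms of the absorber's own element, needs an index of 1 or higher. The function name is hypothetical, and this is not how pymatgen or larch implement it internally.

```python
def assign_ipots(symbols, absorber_index):
    """Assign a FEFF potential index to each atom.

    Index 0 goes to the absorbing atom only. Each element found among the
    remaining atoms gets its own index >= 1, so a pure metal ends up with
    two potentials: 0 (absorber) and 1 (scatterers of the same element).
    Returns (ipots, potentials), where potentials is a list of
    (index, element symbol) pairs for the POTENTIALS section.
    """
    ipot_of_element = {}
    ipots = []
    for i, symbol in enumerate(symbols):
        if i == absorber_index:
            ipots.append(0)
            continue
        if symbol not in ipot_of_element:
            ipot_of_element[symbol] = len(ipot_of_element) + 1
        ipots.append(ipot_of_element[symbol])
    potentials = [(0, symbols[absorber_index])]
    potentials += [(ipot, symbol) for symbol, ipot in ipot_of_element.items()]
    return ipots, potentials


# Pure metal: the absorber and the scatterers are the same element.
metal_ipots, metal_potentials = assign_ipots(["Pd", "Pd", "Pd"], absorber_index=0)
print(metal_ipots, metal_potentials)

# Oxide: Pd absorber, O scatterers, and further Pd scatterers.
oxide_ipots, oxide_potentials = assign_ipots(["Pd", "O", "Pd"], absorber_index=0)
print(oxide_ipots, oxide_potentials)
```

Under this scheme the single-species case is not special: the Pd cluster gets potentials 0 and 1, which is what the 'ATOMS' section in the failing input already refers to.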

As for the source code, I have narrowed down where the issue arises. Interestingly, Abraham's notebooks generate FEFF input files which do not have this problem: FEFF runs fine using Abraham's scripts. See https://github.com/UK-Catalysis-Hub/XAS-Workflow-Demo/blob/main/psdi_phase_1/larch/Paper%2008%20Reproduce%20XAS.ipynb, noting that the code in https://github.com/UK-Catalysis-Hub/XAS-Workflow-Demo/tree/main/psdi_phase_1/larch/lib does the heavy lifting of generating the FEFF input file from the cif file (and from, e.g., the absorbing atom type and maximum radius to consider). On closer inspection of Abraham's code, and comparing it to larch_feff.py, I found that the two generate FEFF input files in different ways. larch_feff.py uses pymatgen.io.cif.CifParser to read the crystal structure, followed by pymatgen.io.feff.Atoms, pymatgen.io.feff.Header and pymatgen.io.feff.Potential to construct the FEFF input file 'from scratch' using parameters such as the maximum radius and absorbing atom passed to larch_feff.py. By contrast, Abraham's code uses larch.xrd.cif2feff to create a FEFF input file from a cif file 'in one go'. Given that Abraham's approach does not throw any errors, I'm inclined to assume that this is the 'correct' approach, and that the bug is in the approach taken in larch_feff.py.

I did try to pinpoint exactly what was wrong with the approach taken in larch_feff.py. At first glance I could find nothing wrong with the source code there. I think the bug is deeper, i.e. it is in pymatgen.io.feff.Potential. I've attached an archive demonstrating this: potentials_section_bug_demo.zip. The script minimal_example.py generates a 'POTENTIALS' section for the cif file given as its command-line argument. Running the script with the cif file Pd.cif yields a 'POTENTIALS' section (which is output to a file 'POTENTIALS') that has only potential '0', but not '1'. However, running the script with the cif file PdO.cif does not yield this issue. I do not know why this is, even after looking at the source code for pymatgen.io.feff.Potential: I think the problem is in the __str__ function. Then again, there may not be a problem at all; perhaps a single potential in 'POTENTIALS' is sensible for some FEFF calculations for Pd, since XAS is not the only application of FEFF, I believe.

The solution

An obvious solution, given what I said above, is to use Abraham's approach to generate the FEFF input file, i.e. use larch.xrd.cif2feff. I have made a branch of larch-tools which does this, and it seems to solve the problem. I just need to ask @patrick-austin about reviewing the branch, and get his general thoughts about this. There may be a good reason that Patrick did not employ Abraham's approach!

patrick-austin commented 7 months ago

a 'POTENTIALS' section with one potential with index '0'. However the 'ATOMS' section refers to two potentials: potentials with index '0' and '1'. Hence FEFF cannot run because potential '1' is not defined in the FEFF input file.

Broadly speaking the problem might be something to do with the fact that for a pure metal, e.g. Pd, there is only one chemical species present. I suspect that FEFF wants for XAS calculations a different potential for the absorbing atom (i.e. a Pd atom) and scattering atoms (i.e. the other Pd atoms in the system), and that larch_feff.py does not deliver this for the case of one species in the system.

This fits with what I've noticed in the past. From memory, there's a kind of broken symmetry where ATOM 0 (the one that absorbs the x-ray) is treated differently from other atoms of that species.

There may be a good reason that Patrick did not employ Abraham's approach!

I did employ Abraham's approach, but his approach before this change: https://github.com/UK-Catalysis-Hub/XAS-Workflow-Demo/commit/8cc283b9e04f9d4b8ead82a443b5eb4e1a567022#diff-728e1e408016693f62158c7a8d125dea8a747d8d99fe491488f7fa686943af69 Before, his notebooks used pymatgen, and I was following what he did in those when writing the Python for the tools (suffice to say I had no experience with Larch or pymatgen before that point). I had no idea that he had changed the library used in his notebooks - but it is an understandable choice given the apparent bugs with pymatgen...