Chemellia / AtomicGraphNets.jl

Atomic graph models for molecules and crystals in Julia
MIT License
61 stars 10 forks source link

Ideas for more test cases #12

Open rkurchin opened 4 years ago

rkurchin commented 4 years ago

Saving these here because a DM Slack thread with @venkvis is not the ideal archival solution, also in case others want to comment/discuss...

jonathanmfung commented 2 years ago

Re: Ionic Transport

I haven't been able to work with this paper's data, here's a summary of what I've run into:

cif data is formatted as non-ASE compliant JSON, and Xtals.jl processing on a manual conversion is unable to convert to P1 space group.

Full Details & Reproduction - Goal: Extract cif files and corresponding Ionic Transport features (e.g. Largest free sphere or BVSE Activation Energy) - Basing off example1, which I have been able to run - Paper is based on the SPSE dataset - SPSE Dataset: https://figshare.com/articles/dataset/High-Throughput_Screening_Platform_for_Solid_Electrolytes_Combining_Hierarchical_Ion-Transport_Prediction_Algorithms_/12011412?file=22067844 - Paper also provides Supplementary excel file with Ionic transport data of materials with IDs coming from icsd, "custom", "ionTransport", Camille, and "cst" - Ionic Transport Excel (in Supporting Info): https://onlinelibrary.wiley.com/doi/abs/10.1002/adfm.202003087 - SPSE also provides their full data set, with cifs encoded in json (cifs.json) - I've tried to convert the cifs.json to proper cifs: - cifs.json is a weird list of json objects (not formatted in an array), so take line 200 as n example - `sed -n 200p cifs.json > Cs2LiGaF6.json` - Corresponds to Line 62 in Excel ("Li" Sheet) - Unable to use `ase convert Cs2LiGaF6.json Cs2LiGaF6.cif`: "Does not resemble ASE JSON database" - Manually convert to cif format (remove all non-crystal data, basically format as a hashmap) - [LiNdP4O12_manual.txt](https://github.com/Chemellia/AtomicGraphNets.jl/files/8510439/LiNdP4O12_manual.txt) - Converting with Xtals.Crystal() - Not in P1 symmetry, so tries to convert to it - errors with "Overlapping atoms..." - If I force Xtals.jl to not convert it to P1, AtomGraph() errors that the "Adjacency / distance matrices must be symmetric" (this seems expected since crystal is not symmetric) - Alternatively, get cif data from external sources - Most IDs are from ICSD, which is a closed dataset (atleast the API) - Only free way is through scraping http://aflowlib.org/material/?id=9004 - Note: cif downloaded from here is already in P1 - but the dataset has ~29k items... - SPSE also provides a django webapp, but I am unable to run it (incorrect package versioning)

I am not comfortable with space groups, so I have no idea where conversion is going wrong. Also, I do not believe ase specifies their exact JSON format.

I've looked at the HTGW paper, and I am more confident in getting something working, I just need to wrap my head around the wavefunctions and formulas. I may just work off its parent database (C2DB).

rkurchin commented 2 years ago

Wow, thanks for taking a crack at this! I visualized your manually converted CIF in VESTA, and I can't see any major red flags so I think the structure conversion is working fine. I'd have to spend a bit more time than I have right now to troubleshoot the conversion issues, but two questions:

  1. Does ASE read the structure correctly after you convert it? (i.e. could you get it to work in Xtals by "going through" ASE?)
  2. How automatable is the manual JSON conversion you did? I suspect the overlapping atoms issue is to do with the way symmetry is represented in non-P1 .cif files, which Xtals basically bypasses by storing everything as P1. Basically, if you know some symmetry relations, you don't have to explicitly store the coordinates of every atom since you can reconstruct them from the symmetry. But Xtals doesn't use that, so it has to explicitly store every coordinate, thus converting to P1. When you don't convert, you're probably getting an asymmetric matrix (this is different from an asymmetric crystal, it just means that the distance matrix is not symmetric across its diagonal, so it thinks the distance from atom A to B is different than from B to A, which makes no sense) because of some of those "missing" atoms that the .cif format knows about from symmetry, but Xtals doesn't. Hopefully this makes some sense. Anyway, I suspect if you just wrote to like .xyz or something as an intermediary, it might solve the issue. Hacky, I know, but 🤷‍♀️
jonathanmfung commented 2 years ago
  1. Oh, good idea on the conversion. ase is able to read it, and write back as P1 LiNdP4O12_ase.txt:
    manual = ase.io.read("LiNdP4O12_manual.cif") # I 1 2/c 1
    type(manual) == ase.atoms.Atoms
    ase.io.write("LiNdP4O12_ase.cif", manual) # P 1

This is a non-issue: ase uses the "updated" cif spec that uses the 'space_group' category instead of 'symmetry', _space_group_symop_operation_xyz. So for Xtals to read properly, need to sub these fields in, ex: sed -i "s/_space_group_symop_operation_xyz/_symmetry_equiv_pos_as_xyz/" LiNdP4O12_ase.cif.

However, ase produces a 4x unit cell version of this compound for some reason, Nd4 P16 O48 Li4 (due to P1 conversion?). I will need to test with more compounds.

Both Xtals and AtomGraphs are able to read this in: LiNdP4O12 Concerning the unit cell duplication, would GNNs care about excess nodes/edges? I assume not really, as its kind of a crude analog of periodicity.

Julia Output ```julia julia> c = Crystal("LiNdP4O12_ase.cif") ┌ Info: Crystal LiNdP4O12_ase.cif has space group. I am converting it to P1 symmetry. └ To prevent this, pass `convert_to_p1=false` to the `Crystal` constructor. Name: LiNdP4O12_ase.cif Bravais unit cell of a crystal. Unit cell angles α = 90.000000 deg. β = 90.100000 deg. γ = 90.000000 deg. Unit cell dimensions a = 9.844000 Å. b = 7.008000 Å, c = 13.250000 Å Volume of unit cell: 914.073072 ų # atoms = 72 # charges = 0 chemical formula: Dict(:P => 4, :Li => 1, :Nd => 1, :O => 12) space Group: P1 symmetry Operations: 'x, y, z' bonding graph: # vertices = 72 # edges = 0 julia> n = AtomGraph(c) ┌ Warning: Your cutoff radius is quite large relative to the size of your unit cell. This may cause issues with neighbor list generation, and will definitely cause a very dense graph. To avoid issues, I'm setting it to be approximately equal to the smallest unit cell dimension. └ @ AtomGraphs ~/.julia/packages/AtomGraphs/TnX1D/src/graph_building.jl:164 AtomGraph{Crystal} with 72 nodes, 480 edges atoms: ["Nd", "Nd", "Nd", "Nd", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "Li", "Li", "Li", "Li"] ```
  1. The JSON conversion is essentially stripping and reformatting, and I actually started writing a function before I thought to try it manually. So, if everything is good here I can try working on making an example case.

I was also wondering why space groups weren't being utilized here, as I feel they could be nice when dealing with crystal prototypes or anisotropic properties. But I guess dealing with P1 only is easier, and space group info should be embedded in the graph already.

Also, I tried to change the manual cif to xyz, but Xtals can only read in cif and cssr. And oops, I forgot that crystals have to be symmetric...

rkurchin commented 2 years ago

Concerning the unit cell duplication, would GNNs care about excess nodes/edges? I assume not really, as its kind of a crude analog of periodicity.

This multiplication of the unit cell is not really ideal, although in principle you're right that it shouldn't matter. At the moment, in practice, it does, because the pooling operations aren't smart enough, so you'd still get different results from a primitive cell graph vs. one of these supercelll-type graphs. All the more motivation for structurally aware pooling, cf #10...

I was also wondering why space groups weren't being utilized here, as I feel they could be nice when dealing with crystal prototypes or anisotropic properties.

They would definitely be nice, and I would love it if there were a Crystal-like object that supported that rather than having to explicitly store all the coordinates! Unfortunately, to my knowledge, there isn't at present...would be a fun project to build one...🤔

But I guess dealing with P1 only is easier, and space group info should be embedded in the graph already.

The first part is true, but I disagree with the second part...you do strictly lose information going from 3D coordinates to an adjacency matrix representation, e.g. you don't know anything about angles, which are crucial to symmetries...

Also, I tried to change the manual cif to xyz, but Xtals can only read in cif and cssr.

Xtals can read xyz, you just need a different function because it reads to an Atoms object rather than a Crystal (then you can build a Crystal object from it).