Global-Chem / global-chem

A Knowledge Graph of Common Chemical Names to their Molecular Definition
https://globalchemistry.org/
Mozilla Public License 2.0
156 stars 21 forks source link

SS-183: Find a Crystal structure of Diamond and Pass through Psi4 Quantum #314

Open Sulstice opened 5 months ago

Sulstice commented 5 months ago

We are going to test first cases of writing universal coordinates of gemstones

1.) Find a crystal PDB file of diamond with one unit or repeating units. 2.) Process the coordinates through a quantum software: https://github.com/psi4/psi4

Code:

import psi4

psi4.set_memory('1000mb')
psi4.core.set_num_threads(1)

zmatrix = '''\
Coordinates Here
'''

universe = psi4.geometry(zmatrix)
universe.update_geometry()
universe.print_out()
ANUGAMAGE commented 4 months ago

Hi Sul. I am on to this. I got the PDB file of diamond. I will run it through this code. Here are the coordinates

Diamond Coordinates :

HEADER
REMARK Spartan exported Fri Jan 30 16:05:42 1998
HETATM    1  C   UNK  0001       1.339   5.378  -4.149
HETATM    2  C   UNK  0001      -1.079   1.746  -3.946
HETATM    3  C   UNK  0001       1.441   1.816  -3.988
.
.
END
Sulstice commented 4 months ago

Start thinking like a scientist. You have coordinates of diamond. Diamond comes as a solid that is packed of repeating units. Does psi4 take repeating units?

Put an image of your coordinates here by dragging and dropping. Lets visualize it first.

ANUGAMAGE commented 4 months ago

I dont think that repeating units are working with psi4. Here is the image

image

Sulstice commented 4 months ago

So you might have to choose a quantum software that can: https://pyscf.org/

1.) Develop some code with pyscf that takes the input of diamond. 2.) Explain to me what is tdscf excitations? What is the expected output according to the documentation?

ANUGAMAGE commented 4 months ago

1.) I am developing the PySCF code in here : LInk : https://colab.research.google.com/drive/1xJssHrv26Fs1Lo0kgPznO4aac3H1aTXB?usp=sharing

The mol.build() method ultimately converts the given geometry into an internal format stored as Mole._atom, which can be accessed and manipulated for further computational chemistry tasks.

2.) Time-Dependent Self-Consistent Field (TD-SCF) excitations often referred to as Time-Dependent Density Functional Theory (TD-DFT) focuses on calculating the energies required to excite a molecule from its ground state to various excited states, based on its optimized geometry.

Currently going through the documentation to get the final output.

Sulstice commented 4 months ago

Next tasks:

1.) See if you can get the monomer or one cube from the lattice structure in xyz format. Once done, if you can achieve it. Pass it through https://github.com/wutobias/r2z to convert the cartersian into internal coordinates. See if that works if not, lets do it manually.

2.) Figure out which basis set and level of theory to use for gemstones.

ANUGAMAGE commented 4 months ago

1.) Adamantane is the monomer in Diamond. Adamantane lattice structure in xyz format :

26

C         -0.93903        1.24489        0.84444
C         -1.49588        0.30745       -0.24168
C         -0.70271        0.51632       -1.54343
C          0.78833        0.21819       -1.30853
C          1.33013        1.15003       -0.21056
C          0.54728        0.93699        1.09725
C          0.69711       -0.52806        1.54376
C          0.15348       -1.46986        0.45477
C         -1.33387       -1.15245        0.21717
H         -1.51061        1.12550        1.77324
H         -1.05530        2.28963        0.53054
H         -2.55532        0.52706       -0.41317
H         -0.82794        1.54722       -1.89778
H         -1.09577       -0.14046       -2.32969
H          1.34886        0.37417       -2.23627
H          1.24522        2.19560       -0.53055
H          2.39649        0.95361       -0.04383
H          0.93672        1.60215        1.87607
H          0.15551       -0.69139        2.48361
H          1.75214       -0.75477        1.74017
H          0.26285       -2.51156        0.77670
H         -1.74585       -1.83160       -0.53942
H         -1.90385       -1.31988        1.13953
C          0.93977       -1.24257       -0.84799
H          0.57515       -1.92122       -1.62949
H          1.99984       -1.47749       -0.69154

Conversion of Adamantane from cartersian into internal coordinates in Colab : Link : https://colab.research.google.com/drive/1lvII7NR0HY9TmeiaCXMvpV2eedSGfFCx?authuser=1

Sulstice commented 4 months ago

That's not right. How is Adamantane the monomer of diamond? It's a functional group. There's no hydrogens. Think for a second on how you are going to do this. Come up with a plan and don't make wild guesses.

ANUGAMAGE commented 4 months ago

Sorry @Sulstice now I understood the mistake. I think I need to draw a monomer of diamond from scratch. I will think more

Sulstice commented 4 months ago

Yep, it's not always that easy.

ANUGAMAGE commented 4 months ago

New structure - VESTA 7_17_2024 12_07_19 AM Hi Sul, I generated this structure (FCC unit cell) using few documents(Because Diamond Unit Cell was based on it) . I`ve used VESTA for the generation. But when I was adding bonds to it structure is converting in to a mess. However the thing that we need is the x y z coordinates for the carbon atoms right? So I can get a 'xyz' file. But the problem is a monomer of Dimond need to consist just with 8 atoms. (when space filling)

figure1ab Diamond vesta - VESTA 7_18_2024 12_25_21 AM Diamond vesta - VESTA 7_18_2024 12_25_11 AM Got a coordinate file. But there are things that need to clarify.

Sulstice commented 4 months ago

@rylandf If you're still alive, might need some help.

rylandf commented 4 months ago

Yeah I'm alive. I guess there's a few things to talk about but it looks like you're on the right track.

  1. When dealing with crystalline materials such as minerals, the smallest unit is called a unit cell. A monomer is used for the smallest unit of a polymer. The difference is in how the are connected; a unit cell repeats in every direction, whereas a monomer typically repeats through a couple specific linkages.
  2. Because of this, the atoms in a unit cell which are only partially inside the unit cell are only partially counted, since they are shared with multiple unit cells. Corners count as 1/8, edges as 1/4, and sides as 1/2. This should be apparent once you understand how the unit cell repeats in space.
  3. I believe in VESTA there is a setting to define minimum and maximum bond lengths. You can look up the bond lengths for a material (should be just one for diamond) and set them appropriately with a tolerance so that only the real bonds are shown. This shouldn't matter if you only need an xyz file, but for other structure files that include bonds it will. It also makes nicer pictures.
  4. This is really just for educational purposes, but you can calculate the bond lengths given the atomic radii and some geometry. This was useful during my crystallography class for getting a better spatial intuition for how atoms are actually packed in 3D space.
ANUGAMAGE commented 4 months ago

@rylandf thank you very much for your help and commitment on this. I really appreciate your help.

@Sulstice As Raylad said below structure(unit cell not the monomer) will be partially counted when we do space filling. A conventional unit cell of diamond has eight (8) atoms: Corners: 8 × ⅛ = 1 Faces: 6 × ½ = 3 Interior: 4

58437-21-21 5e-question-digital_image001 Screenshot from 2024-07-22 09-13-20

Furthermore, as we are planing to excite the unit cell of diamond the file that we only consider just for now is xyz coordinates file. Hence, according to the details that provided by @rylandf we can prepare a xyz file of the unit cell of diamond without considering the real bonds between atoms. And I think if we change the bond lengths of this structure it may effect the xyz coordinates of each atom as well? So if it is what is the minimum , maximum or how much of a bond length that we need to select?(The carbon–carbon (C–C) bond length in diamond usually is 1.54 Å so I think this value may be appropriate). Am I right Sul?

Sulstice commented 4 months ago

@ANUGAMAGE Sure

Looking at the pictures of diamond @rylandf with the repeating units. The input file is then what into PySCF:


from pyscf.pbc import gto as pbcgto
cell_diamond = pbcgto.M(atom = '''
     C     0.0000      0.0000      0.0000
     C    0 .8917      0.8917       0.8917
     C     1.7834       1.7834      0.0000
     C      2.6751      2.6751       0.8917
     C     1.7834      0.0000      1.7834
     C     2.6751      0.8917       2.6751
     C     0.0000     1.7834        1.7834
     C    0.8917      2.6751         2.6751
''',
basis = 'gth-szv',
pseudo = 'gth-pade',
 a = np.eye(3) * 3.5668)

There we go, that's the monomer of diamond for the input.

@rylandf What level of theory and basis set do people use for something like diamond. I guess ab-initio with coupled cluster for the electron correlation is important for these systems?

rylandf commented 4 months ago

Whatever software you're using should have a setting for crystals, it might be called something like "periodic boundary conditions", "infinite lattice", etc. If you have the setting correct then the software will expect a structure file which repeats, and handle everything appropriately.

Keep in mind I've only done DFT with pseudopotentials and planewaves basis. Either way, it depends dramatically on what you're trying to accomplish, what information you want from these calculations. I have a couple sources to share to give you some direction, but you're going to have to do some comparing between them to figure out the best for your needs. General best practice is to repeat the calculations with a couple basis sets and compare the results with each other and experimental data if available to test for convergence and see what best captures all the relevant forces.

Here's a couple databases of force fields for different elemental systems: https://www.ctcms.nist.gov/potentials/ https://openkim.org/ https://www.sciencedirect.com/science/article/abs/pii/S000888461730409X

And here's a couple papers that may be relevant, the first one specificially used diamond as an example while analyzing basis set selection: https://pubs.acs.org/doi/10.1021/acs.jctc.9b01004 https://arxiv.org/abs/2112.05824 https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=7200d8a0e7df7e39879cc92893ae6e5a26cdd5c5 https://pubs.acs.org/doi/10.1021/acs.jctc.2c00380 https://www.nature.com/articles/s41524-024-01249-y https://arxiv.org/abs/2308.08098

Sulstice commented 4 months ago

@rylandf Thanks Ryland. PBC is our main issue. Psi4 doesn't process repeating units so we opted into for PySCF. How it handles PBC I don't exactly know.

The first paper that you suggested is pretty good and I see some post-HF ab-initio methods (MP2) in the second paper seem pretty good.

@ANUGAMAGE

1.) Suggest to me a level of theory and basis set that you think is appropriate for diamond based off the papers. 2.) Explain to me, what is the difference between Density Functional Theory (DFT) and ab-initio 3.) Explain to me, in general terms, the difference between double-, triple-, and quadruple-zeta correlation-consistent Gaussian basis sets

ANUGAMAGE commented 3 months ago

going through few papers mentioned in above, 1) Diamond is a crystal structure so it is comparatively a large system. Therefore applying Ab-initio methods based on Hartree-Fock (HF) and post-Hartree-Fock methods will be complexed and computationally intensive(suitable for small systems). Besides, there are optimized, less complexed and less computationally intensive methods like DFT are available, we can use that methods for large crystal structures such as diamond.

Hence I suggest below methods : crystal-cc-pVDZ (correlation-consistent polarized valence double zeta) crystal-cc-pVTZ (correlation-consistent polarized valence triple zeta) : For more accurate property calculations, higher flexibility and ability to better capture electron correlation effects.

2) DFT mainly focused on, which specific are that an electron can be located, Furthermore, it is considering electron density rather than considering the coordinates of a each electron and how each electron make interactions with each other. Therefore, it was used to calculate properties like total energy in large systems like crystal structures. These methods are less complicated when comparing to ab-initio and less computationally intensive.

On the other hand, ab-initio is based on Many-Body Wavefunction Methods like Hartree-Fock (HF) and post-Hartree-Fock. Furthermore, this calculations used to identify the coordinates of a each electron and how each electron make interactions with each other and how they are correlated to each other. These methods are more complicated when comparing to DFT and highly computationally intensive.

3) Double-Zeta (cc-pVDZ) : Each orbital is represented by two sets of functions, description of electronic structure by considering more flexibility in the wavefunction and capturing some correlation effects but not all.

Triple-Zeta (cc-pVTZ) : This basis sets add a third set of functions for each orbital, further increasing the flexibility of the wavefunction and specially used for calculations requiring greater precision, such as geometry optimizations and vibrational frequency analyses.

Quadruple-Zeta (cc-pVQZ) : include four sets of functions per orbital, to enhance the accuracy and precision significantly. Furthermore, They are capable of capturing even subtle correlation effects[Capture post-Hartree-Fock methods. they involve both dynamic correlation (fluctuations in electron distribution) and static correlation (near-degeneracy effects)] but come with increased computational costs.

ANUGAMAGE commented 3 months ago

Hi @Sulstice , I ran TDDFT(Time dependent DFT) calculations for the unit cell of diamond and here are the results. I have used Matplotlib to get the uv excitation graph.

Code :

import numpy as np
import matplotlib.pyplot as plt
from pyscf import gto, scf, dft, tddft

# Set up the molecule
mol = gto.Mole()
mol.build(
    atom = 'C     0.0000      0.0000      0.0000; C     0 .8917     0.8917      0.8917; C     1.7834      1.7834      0.0000; C     2.6751      2.6751      0.8917; C     1.7834      0.0000      1.7834; C     2.6751      0.8917      2.6751; C     0.0000      1.7834      1.7834; C     0.8917      2.6751      2.6751 ',  # in Angstrom
    basis = 'crystal-cc-pVDZ',
    symmetry = True,
)

# Perform DFT calculation
mf = dft.RKS(mol)
mf.xc = 'b3lyp'
mf.kernel()

# Perform TDDFT calculation
mytd = tddft.TDDFT(mf)
mytd.kernel()

# Extract excited state energies (in eV)
excited_state_energies = np.array(mytd.e) * 27.2114  # Convert from Hartree to eV

# Plot the excited state energies
plt.figure(figsize=(8, 6))
plt.plot(range(1, len(excited_state_energies) + 1), excited_state_energies, 'bo-', markerfacecolor='red')
plt.xlabel('Excited State Number')
plt.ylabel('Energy (eV)')
plt.title('TDDFT Excited State Energies')
plt.grid(True)
plt.show()

Results :

converged SCF energy = -303.762002309501
TD-SCF states [0, 2] not converged.
Excited State energies (eV)
[0.0518773  0.27256567 0.36583442]

Screenshot from 2024-08-09 12-48-34

Results indicate that "TD-SCF states [0, 2] not converged". I think we need more computational power to run these calculations.

Sulstice commented 2 months ago

Plot things in kcal/mol. I don't understand eV If it couldn't converge it's usually a coordinate problem. Can you tell me how iterations it took before it gave that error message?

Convergence happens in steps so you can try to minimize over 100 iteractions and the convergence failure can come out at maybe lets say step 30. Depending on where it failed on the iteration cycle can be a clue.