QChASM / SEQCROW

Plug-in for ChimeraX providing features for building and manipulating organic and organometallic molecules as well as displaying output from quantum chemistry computations.
GNU General Public License v3.0
22 stars 6 forks source link

NMR Spectrum / ORCA 6.x .out file peak arithmetic looks way off #20

Open mkedwards opened 5 days ago

mkedwards commented 5 days ago

Describe the bug File Info sees correct chemical shifts from ORCA 6.0.0 output file. Reference-adjusted data displayed in NMR Spectrum plot appears to have been scaled nonsensically.

To Reproduce Steps to reproduce the behavior:

  1. Open .out file from ORCA NMR run containing chemical shifts
  2. Select menu item "Tools/SEQCROW/File Info"
  3. Inspect NMR shifts data, confirm that .out file has been read in properly
  4. Select menu item "Tools/Quantum Chemistry/NMR Spectrum"
  5. Go to "shift scaling" tab, check that alpha is -1, set delta_ref appropriately (for me, 31.613 based on TMS modeling)
  6. Go to "plot" tab, press home button
  7. Observe chemical shifts wildly out of range

Expected behavior Display accurate 1-D plot of (delta_reference - computed chemical shift).

Screenshots

image image

Structure: The .out file is available upon request, but I don't think it's anything special.

Additional context ChimeraX 1.8, SEQCROW 1.8.14

mkedwards commented 5 days ago

I don't know why this has six labels; I don't think I added any of them. It's a bug, unless I'm doing something wrong.

ajs99778 commented 5 days ago

could you send the .out file? I'd like to take a closer look at this case

mkedwards commented 5 days ago

theanine-nmr.tgz

That tarball has the .inp, .xyz, and .out files in it (not in a directory; sorry).

ajs99778 commented 5 days ago

Thanks for sending the .out file. I've taken a quick look, and this is an issue where our NMR data parser doesn't know how to handle post-HF methods. The NMR shielding data is getting printed for the SCF, and MP2 twice (unrelaxed and relaxed density). The parser is reading all of these,, and they all get combined, but then only divided by the number of equivalent nuclei when averaging the shifts. The result is all shifts are about 3x more than they should be. I'll get working on a fix for that.

mkedwards commented 5 days ago

Can I help?

ajs99778 commented 5 days ago

I should be able to fix this pretty quickly. There are a few other things I want to work on this evening, but once those are done I'll get started. Would you want to be able to see the NMR data for each of the three different methods, or just the last one in the file (MP2 with relaxed density)? It would be pretty easy to parse that data, but adding UI options to choose which one might take a bit more time.

mkedwards commented 5 days ago

Probably just need to insert a bit of code here: https://github.com/QChASM/AaronTools.py/blob/master/spectra.py#L2185 that drops the array contents every time you hit a CHEMICAL SHIELDINGS line. It's the last dataset (Relaxed density) that matters.

mkedwards commented 5 days ago

Incidentally, I love SEQCROW; it's the only tool out there that does what I need done. I'm excited to see the splittings it generates given the J-coupling data I'm computing now.

mkedwards commented 5 days ago

This appears to work:

--- spectra.py.orig 2024-10-29 20:33:24
+++ spectra.py  2024-11-07 20:56:02
@@ -2170,6 +2170,7 @@
         super().__init__(*args, **kwargs)

     def parse_orca_lines(self, lines, *args, **kwargs):
+        shifts = []
         nuc = []
         element = None
         ndx = None
@@ -2177,11 +2178,14 @@
         ndx_b = None
         ele_a = None
         ele_a = None
+        reset_regex = re.compile("CHEMICAL SHIELDINGS")
         nuc_regex = re.compile("Nucleus\s*(\d+)([A-Za-z]+)")
         coupl_regex = re.compile(
             "NUCLEUS A =\s*([A-Za-z]+)\s*(\d+)\s*NUCLEUS B =\s*([A-Za-z]+)\s*(\d+)"
         )
         for line in lines:
+            if reset_regex.search(line):
+                shifts = []
             if nuc_regex.search(line):
                 # could use walrus
                 shift_info = nuc_regex.search(line)
@@ -2190,7 +2194,7 @@
             elif line.startswith(" Total") and "iso=" in line:
                 if ndx_a is None:
                     shift = float(line.split()[-1])
-                    self.data.append(Shift(shift, ndx=ndx, element=element, intensity=1))
+                    shifts.append(Shift(shift, ndx=ndx, element=element, intensity=1))
                 else:
                     # orca 5 coupling
                     coupling = float(line.split()[-1])
@@ -2212,6 +2216,7 @@
                 coupling = float(line.split()[-1])
                 self.coupling[ndx_a][ndx_b] = coupling
                 self.coupling[ndx_b][ndx_a] = coupling
+        self.data.extend(shifts)

     def parse_gaussian_lines(self, lines, *args, **kwargs):
         nuc = []
ajs99778 commented 5 days ago

I appreciate the kind words. I fixed this in fileIO before I noticed this, but thanks for the suggestion. The new version on the ChimeraX toolshed should work. Let me know if you have any issues with the new version.

mkedwards commented 3 days ago

Looks great. Thank you!