SimonEnsemble / PoreMatMod.jl

A find-and-replace tool for crystal structure models. Implements (i) subgraph matching and (ii) point set alignment to search a parent crystal for a query fragment, then align and install a replacement fragment in its place.
MIT License

Replacement performed at first instance of functional group #149

Open jaharvey8 opened 1 year ago

jaharvey8 commented 1 year ago

Replacement is always performed at the first instance of the matching functional group found in the replacement moiety. As an example, if I want to replace a formate cap on a metal cluster with a new functional group, I can use the formate cap query moiety and flag the H atom. Now say the replacement moiety has two COO groups, one of which should be protonated to maintain the correct charge upon replacement (e.g., BDC). If the COOH group comes first in the xyz file, it becomes the location at which the replacement is done, and you end up with a Zr-H-O(C)O-Zr group.

Is there a way to also indicate the functional group at which replacement is performed within the replacement moiety itself?

eahenle commented 1 year ago

Hi! Thanks for reporting (both here and via email to @SimonEnsemble).

Adding this cell to your notebook hacks around the problem:

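let
    replacement = moiety("bdc-so3_deprotonated.xyz")
    # temporarily relabel the protonated carboxyl carbon so it cannot match the query
    replacement.atoms.species[8] = :C_
    # `search` is assumed to be defined earlier in the notebook
    child = substructure_replace(search, replacement, loc=[9, 18])
    # restore the original label in the resulting structure
    replace!(child.atoms.species, :C_ => :C)
    view_structure.([replacement, child])
end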

Changing the atom label of the protonated carboxyl carbon from :C to :C_ means that only the other carboxyl can be identified as a match for formate; changing it back after the replacement gets you the structure you're after.
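
The label mechanics behind this trick can be seen on a plain vector of symbols. This is only a minimal sketch: the `species` vector and the index `protonated_carbon` are made-up stand-ins for `replacement.atoms.species` and the real atom index, and a simple equality test stands in for the subgraph matching that the package actually performs.

```julia
# Stand-in for `replacement.atoms.species`; labels and indices are illustrative only.
species = [:C, :O, :O, :H, :C, :O, :O]
protonated_carbon = 1  # hypothetical index of the carbon that must not match

# Tag the carbon: it no longer compares equal to :C.
species[protonated_carbon] = :C_
matchable_carbons = findall(==(:C), species)

# Un-tag once the replacement is done, restoring the original labels.
replace!(species, :C_ => :C)
```

While the tag is in place, only the untagged carbon is left as a candidate, and `replace!` with a `:C_ => :C` pair puts the original label back in place.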

Unfortunately, you cannot just edit the atom label in the XYZ file, because the software doesn't have bonding rules for :C_. The bonding rules can be extended so that :C_ is accepted as either atom of a bond, like so:

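rc[:bonding_rules] = vcat(
    # copies of every rule whose first species is carbon, with :C_ as the first species
    [
        BondingRule(:C_, rule.species_j, rule.max_dist)
        for rule in rc[:bonding_rules]
        if rule.species_i ∈ [:C, :C!]
    ],
    # copies of every rule whose second species is carbon, with :C_ as the second species
    [
        BondingRule(rule.species_i, :C_, rule.max_dist)
        for rule in rc[:bonding_rules]
        if rule.species_j ∈ [:C, :C!]
    ],
    # the original rules, unchanged
    rc[:bonding_rules]
)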
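
To see what that comprehension produces, here is a self-contained sketch on toy data. `ToyRule`, `with_alias`, and the distances are hypothetical and for illustration only; `ToyRule` merely mirrors the three fields used above and is not the package's `BondingRule`.

```julia
# Hypothetical stand-in with the same three fields the snippet above relies on.
struct ToyRule
    species_i::Symbol
    species_j::Symbol
    max_dist::Float64
end

# Made-up rules; the distances are not real bonding distances.
rules = [ToyRule(:C, :O, 1.6), ToyRule(:H, :C, 1.2), ToyRule(:O, :H, 1.1)]

# Copy every rule that involves one of `originals`, swapping in `alias`,
# and keep the original rules at the end.
function with_alias(rules, alias::Symbol, originals)
    as_i = [ToyRule(alias, r.species_j, r.max_dist) for r in rules if r.species_i in originals]
    as_j = [ToyRule(r.species_i, alias, r.max_dist) for r in rules if r.species_j in originals]
    return vcat(as_i, as_j, rules)
end

extended = with_alias(rules, :C_, [:C, :C!])
```

Here `extended` holds the two new `:C_` rules followed by the three originals. One thing this sketch makes visible: a rule with carbon on both sides yields copies with `:C_` on one side only, never a `:C_` to `:C_` rule, which should not matter as long as only a single atom is tagged.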

With the extended bonding rules in place, this works correctly with the 2nd carboxylate carbon tagged with _ directly in the XYZ file:

let
    child = substructure_replace(search, replacement, loc=[9, 18])
    replace!(child.atoms.species, :C_ => :C) # still need to un-tag the result
    view_structure.([replacement, child])
end

I think this workflow should be officially supported in the API, so I will leave the issue open until I can patch it. It would probably be best to implement a way to select which fragment is matched without requiring any manual labeling, although this would likely be a breaking change. A PR for this would be more than welcome (no pressure!). Hopefully these manual approaches will suffice for your purposes in the meantime. Please let me know if this helps, and feel free to open further issues if you encounter any more bugs or missing features!