ClapeyronThermo / GCIdentifier.jl

tools to perform group contribution (GC) identification, given the SMILES of a compound
MIT License

About ERROR: Could not find all groups #12

Closed: felipe-mansoldo closed this issue 3 months ago

felipe-mansoldo commented 3 months ago

Dear all,

I'm trying out your package and decided to start with a well-known molecule, Penicillin G. I got its SMILES from PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/Penicillin-G#section=Canonical-SMILES

Canonical SMILES = CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C

But I got an error:

julia> (component,groups) = get_groups_from_smiles("CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C", UNIFACGroups)
ERROR: Could not find all groups for CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:35
 [2] get_groups_from_smiles(smiles::String, groups::Vector{GCPair}; connectivity::Bool, check::Bool)
   @ GCIdentifier C:\Users\admin\.julia\packages\GCIdentifier\V6P7e\src\group_search.jl:126
 [3] get_groups_from_smiles(smiles::String, groups::Vector{GCPair})
   @ GCIdentifier C:\Users\admin\.julia\packages\GCIdentifier\V6P7e\src\group_search.jl:111
 [4] top-level scope
   @ REPL[3]:1

Is this a bug or a limitation?

Thanks,

pw0908 commented 3 months ago

Hi Felipe,

This isn't really an error. It just means that UNIFAC unfortunately doesn't have all the groups necessary to model Penicillin G. You can see which groups UNIFAC does have by adding check=false:

(component,groups) = get_groups_from_smiles("CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C", UNIFACGroups; check=false)
("CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C", ["COOH" => 1, "CH3" => 2, "ACH" => 5, "CY-CH" => 2, "CY-C" => 2, "CH2CO" => 1, "CHNH" => 1, "AC" => 1])

To see which groups are missing, you can use our find_missing_groups_from_smiles function:

groups = find_missing_groups_from_smiles("CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C", UNIFACGroups)
3-element Vector{GCPair}:
 GCPair("[NX3;H0;R]", "cN")
 GCPair("[SX2;H0;R]", "cS")
 GCPair("[OX1;H0;!R]", "O=")

Indeed, it looks like the cyclic sulfur and cyclic nitrogen groups haven't been fitted. The ketone oxygen shows up as missing mainly because the carbon it is double-bonded to is part of a ring.
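For anyone who wants both pieces of information in one call, the two steps above can be combined into a small wrapper. This is only a sketch: `describe_coverage` is a hypothetical helper name, not part of GCIdentifier, and it assumes the failure is raised as an `ErrorException`, as the stack trace in this thread suggests. It uses only `get_groups_from_smiles`, its `check` keyword, `find_missing_groups_from_smiles`, and `UNIFACGroups` as shown above.

```julia
using GCIdentifier  # assumes GCIdentifier.jl is installed in the active environment

# Hypothetical helper (not part of GCIdentifier): try the strict lookup first,
# then fall back to the partial assignment plus the list of unmatched atom types.
function describe_coverage(smiles::String, groups=UNIFACGroups)
    try
        assignment = get_groups_from_smiles(smiles, groups)
        return (complete=true, assignment=assignment, unmatched=[])
    catch err
        # Assumption: the "Could not find all groups" failure is an ErrorException
        # (it is raised via `error(...)` in the stack trace above).
        # Any other exception type is rethrown untouched.
        err isa ErrorException || rethrow()
        assignment = get_groups_from_smiles(smiles, groups; check=false)
        unmatched = find_missing_groups_from_smiles(smiles, groups)
        return (complete=false, assignment=assignment, unmatched=unmatched)
    end
end

result = describe_coverage("CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C")
println(result.complete ? "All groups found" : "Partial assignment only")
println(result.assignment)
foreach(println, result.unmatched)
```

The named tuple keeps the partial assignment and the unmatched entries together, so a script looping over many SMILES can skip or flag incomplete molecules instead of stopping at the first error.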

Hope this helps!

Best regards,

Pierre

felipe-mansoldo commented 3 months ago

Hi Pierre, thank you for your attention and response. Do you plan to expand support to other groups? Best regards,

pw0908 commented 3 months ago

Hi Felipe,

We aren't the developers of UNIFAC, so we don't really have much motivation or interest in expanding the approach to other groups. However, this is something you could do yourself using Clapeyron.

Feel free to raise an issue there if you'd like some guidance.

Best regards,

Pierre

felipe-mansoldo commented 3 months ago

Hi, OK, thank you very much! All the best,