IUPAC / WFChemCookbook

The IUPAC WorldFAIR Cookbook for FAIR chemical data
https://iupac.github.io/WFChemCookbook
Creative Commons Attribution 4.0 International

Issue on page /recipes/opsin.html #13

Open jcuadros opened 1 year ago

jcuadros commented 1 year ago

Step 4, "Extract the formula of the substance", may be incorrect when the InChI includes a /p sublayer or when the InChI includes disconnected species (see examples below). It might be worth adding a comment stating this works for neutral covalent chemical species.

Citrate, https://pubchem.ncbi.nlm.nih.gov/compound/31348 InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10/h13H,1-2H2,(H,7,8)(H,9,10)(H,11,12)/p-3

Ethylammonium nitrate, https://pubchem.ncbi.nlm.nih.gov/compound/6432248 InChI=1S/C2H7N.NO3/c1-2-3;2-1(3)4/h2-3H2,1H3;/q;-1/p+1
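To make the two examples above easier to inspect, here is a minimal Python sketch that splits an InChI string into its formula and its prefixed layers. The function name `split_inchi` is hypothetical and not part of the cookbook recipe. The sketch assumes that layers are separated by `/`, that each layer after the formula starts with a lowercase prefix letter, and it keeps only the first occurrence of each prefix, so it is not a complete InChI parser.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI into version, formula and prefixed layers (rough sketch)."""
    body = inchi.strip()
    if body.startswith("InChI="):
        body = body[len("InChI="):]
    parts = body.split("/")
    result = {"version": parts[0], "formula": None, "layers": {}}
    for part in parts[1:]:
        if not part:
            continue
        if part[0].islower():
            # Prefixed layer such as c, h, q or p; keep the first occurrence only.
            result["layers"].setdefault(part[0], part[1:])
        elif result["formula"] is None:
            # The formula layer has no prefix letter.
            result["formula"] = part
    return result


citrate = split_inchi(
    "InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10"
    "/h13H,1-2H2,(H,7,8)(H,9,10)(H,11,12)/p-3"
)
salt = split_inchi("InChI=1S/C2H7N.NO3/c1-2-3;2-1(3)4/h2-3H2,1H3;/q;-1/p+1")
print(citrate["formula"], citrate["layers"].get("p"))
print(salt["formula"], salt["layers"].get("q"), salt["layers"].get("p"))
```

In both cases the formula layer alone does not describe the species: citrate carries a `/p` layer, and the salt has a dot-separated formula plus `/q` and `/p` layers.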

Thanks!

stuchalk commented 1 year ago

Jordi, yes I agree that is a great idea. Thanks.

stuchalk commented 1 year ago

I just updated the page with a note about this (see commit 27a757f). Let me know whether this is appropriate and, if not, how I should amend it.

stuchalk commented 1 year ago

Can I close this issue?

jcuadros commented 1 year ago

I think (but it may be my English) that the second sentence is missing something. In any case, I would add a note (or something in the note) stating that the formula will be that of the neutral species when the represented compound has protons added or removed in the charge layer.

stuchalk commented 1 year ago

I just updated the note to improve the English (commits 342cc72 and f762c4f). Thanks for spotting that. As to your other point, I believe I have addressed it in the second revision (f762c4f). Let me know what you think.

jcuadros commented 1 year ago

The first issue (salts) sounds great now. Thanks!

The second issue (charged species) still needs some work (IMO). The charge layer has two sublayers, /q and /p, which are used to specify different types of charged compounds.

When the /q sublayer is present, the formula does not show the charge but can be understood as correct. For example, InChI=1S/I3/c1-3-2/q-1 is the InChI for triiodide ion and InChI=1S/C2H5O/c1-2-3/h2H2,1H3/q-1 is ethanolate.

The problem comes when the /p sublayer is present. In this case, the formula does not correspond to the charged species. For example, the sulfate ion is InChI=1S/H2O4S/c1-5(2,3)4/h(H2,1,2,3,4)/p-2, but the formula in the InChI corresponds to sulfuric acid. Likewise, the citrate ion is InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10/h13H,1-2H2,(H,7,8)(H,9,10)(H,11,12)/p-3, whose formula layer gives C6H8O7 (citric acid), whereas the molecular formula of the ion is C6H5O7.

Some are wilder still, such as the dimercury(I) ion, InChI=1S/2Hg/q2*+1, but I would skip those.
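As an illustration of the /p adjustment described in this comment, the following Python sketch adds the /p proton count to the hydrogen count of the formula layer and reports the net charge as the sum of /q and /p. The function name `formula_with_protons` is hypothetical and not part of the cookbook. The sketch assumes a single-component InChI with a plain element-count formula, and it deliberately rejects dot-separated formulas and multiplier forms such as `2Hg`.

```python
import re


def formula_with_protons(inchi: str) -> tuple[str, int]:
    """Return (formula adjusted for /p, net charge) for a single-component InChI."""
    formula = inchi.split("/")[1]
    # Assumption: only plain single-component formulas are handled here.
    if "." in formula or not formula[0].isupper():
        raise ValueError("formula layer not handled in this sketch: " + formula)

    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)

    p_match = re.search(r"/p([+-]\d+)", inchi)
    q_match = re.search(r"/q([+-]\d+)", inchi)
    protons = int(p_match.group(1)) if p_match else 0
    charge = int(q_match.group(1)) if q_match else 0

    # Protons added (+) or removed (-) change the hydrogen count of the formula.
    counts["H"] = counts.get("H", 0) + protons
    if counts["H"] < 0:
        raise ValueError("more protons removed than hydrogens in the formula")

    # Hill order: C, then H, then the rest alphabetically; all alphabetical without C.
    if "C" in counts:
        order = ["C", "H"] + sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    text = "".join(
        f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order if counts[s] > 0
    )
    return text, charge + protons


sulfate = formula_with_protons("InChI=1S/H2O4S/c1-5(2,3)4/h(H2,1,2,3,4)/p-2")
citrate_ion = formula_with_protons(
    "InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10"
    "/h13H,1-2H2,(H,7,8)(H,9,10)(H,11,12)/p-3"
)
triiodide = formula_with_protons("InChI=1S/I3/c1-3-2/q-1")
print(sulfate, citrate_ion, triiodide)
```

For /q-only species such as triiodide the formula is returned unchanged and only the charge is reported, which matches the observation that the formula can be read as correct in that case.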

stuchalk commented 1 year ago

Ah, I get your point: I did not address the fact that some species with an InChI are charged, and the text does not currently cover that. Let me add that as well.

stuchalk commented 4 months ago

@jcuadros Are there still things that need to be fixed here, or can I close this issue?