WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

Current "related compounds" is ambiguous #2484

Open Adafede opened 2 months ago

Adafede commented 2 months ago

Is your feature request related to a problem? Please describe. Currently, the compounds listed as having the same connectivity cover a broad range of different things, including isotopomers.
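For illustration only (not code from Scholia): as far as I understand it, the current listing amounts to matching the first 14 characters of the InChIKey, which hash the connectivity but carry no stereo or isotopic information. A minimal Python sketch of that matching, where the helper name `same_connectivity` is hypothetical and the alanine and ethanol keys are just example data:

```python
# Example InChIKeys (illustrative data, not taken from the issue).
L_ALANINE = "QNAYBMKLOCPYGJ-REOHZGJASA-N"
D_ALANINE = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"
ALANINE_NO_STEREO = "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"


def same_connectivity(query_key, candidate_keys):
    """Return candidates sharing the 14-character first block of query_key.

    The query key itself is excluded, as in the "related compounds" listing.
    """
    first_block = query_key[:14]
    return [
        key
        for key in candidate_keys
        if key.startswith(first_block) and key != query_key
    ]


related = same_connectivity(
    L_ALANINE, [L_ALANINE, D_ALANINE, ALANINE_NO_STEREO, ETHANOL]
)
print(related)
```

Both D-alanine and the entry without stereo information match L-alanine here, and nothing in the result says which kind of relation each one is.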

Describe the solution you'd like Either clarify it or split the compounds into subcategories. If we split them, I am happy to rewrite the respective queries to use InChI instead of InChIKey, so that the individual layers can be stripped. We could also make use of P3364 and P6185.
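To make "stripping layers" concrete, here is a rough Python sketch, assuming standard InChI strings where each layer after the formula starts with a lowercase prefix letter. The function name `strip_layers` is made up for this example, and the sketch does not distinguish sublayers that repeat after the isotopic layer (/i):

```python
# Example InChI strings (illustrative data, not taken from the issue).
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
CHLOROFORM = "InChI=1S/CHCl3/c2-1(3)4/h1H"
CHLOROFORM_D = "InChI=1S/CHCl3/c2-1(3)4/h1H/i1D"


def strip_layers(inchi, drop="btms"):
    """Return the InChI without the layers whose prefix letter is in `drop`.

    The default drops the stereo layers /b, /t, /m and /s.
    The version segment and the formula never start with a lowercase
    letter, so they are always kept.
    """
    head, *layers = inchi.split("/")
    kept = [layer for layer in layers if not (layer and layer[0] in drop)]
    return "/".join([head, *kept])


# Two stereoisomers collapse to the same string once stereo is stripped.
print(strip_layers(L_ALANINE) == strip_layers(D_ALANINE))
# Dropping only the isotopic layer maps chloroform-d back to chloroform.
print(strip_layers(CHLOROFORM_D, drop="i") == CHLOROFORM)
```

Comparing the strings before and after stripping a given set of layers is one possible way to decide which subcategory a hit belongs to.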

Describe alternatives you've considered Leaving things as they are right now (but removing the "including the compound itself" wording, see https://github.com/WDscholia/scholia/commit/a52e3e7763d9922680255904b0a5dade7c724eed).

Additional context Trying to improve the chemical aspect

@egonw

egonw commented 2 months ago

I need to think about this a bit more. I like the idea, but need to think through the implications.

Adafede commented 2 months ago

I also thought about it again, and here is what came to my mind (WIP):

So we would keep the same table but add a column saying "stereoisomer, isotopomer, etc.", based on the matching layers:

PREFIX target: <http://www.wikidata.org/entity/Q41576>

# title: related chemical structures
SELECT ?mol ?molLabel ?InChI ?InChIKey ?CAS ?ChemSpider ?PubChem_CID ?layer_b ?layer_t ?layer_m ?layer_s WITH {
  SELECT ?queryKey ?srsearch ?filter WHERE {
    target: wdt:P235 ?queryKey .
    BIND(CONCAT(SUBSTR(?queryKey,1,14), " haswbstatement:P235") AS ?srsearch)
    BIND(CONCAT("^", SUBSTR(?queryKey,1,14)) AS ?filter)
  }
} AS %MOLS WITH {
  SELECT ?mol ?InChIKey WHERE {
    INCLUDE %MOLS
    SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:endpoint "www.wikidata.org";
      wikibase:api "Search";
      mwapi:srsearch ?srsearch;
      mwapi:srlimit "max".
      ?mol wikibase:apiOutputItem mwapi:title.
    }
    ?mol wdt:P235 ?InChIKey .
    FILTER (REGEX(STR(?InChIKey), ?filter))
    FILTER (?InChIKey != ?queryKey)
  }
} AS %MOLS2 {
  INCLUDE %MOLS2
  ?mol wdt:P234 ?InChI .
  # WIP: rough attempt at isolating the /b, /t, /m and /s stereo layers of the InChI
  BIND(REPLACE(?InChI, "/{0}.*?/b", "/") AS ?layer_b)
  BIND(REPLACE(?InChI, "/{0}.*?/t", "/") AS ?layer_t)
  BIND(REPLACE(?InChI, "/{0}.*?/m", "/") AS ?layer_m)
  BIND(REPLACE(?InChI, "/{0}.*?/s", "/") AS ?layer_s)
  OPTIONAL { ?mol wdt:P231 ?CAS }
  OPTIONAL { ?mol wdt:P661 ?ChemSpider }
  OPTIONAL { ?mol wdt:P662 ?PubChem_CID }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
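
The extra column could in principle be derived by checking which layers differ between the query InChI and each hit. Below is a rough Python sketch of that comparison, outside SPARQL and only as an assumption about how the labels might be assigned. The names `parse_layers` and `relation` and the label strings are hypothetical, standard InChI is assumed, and sublayers that repeat after /i are ignored:

```python
STEREO_LAYERS = {"b", "t", "m", "s"}
CONNECTIVITY_LAYERS = {"formula", "c", "h"}


def parse_layers(inchi):
    """Map layer prefix -> layer content; the formula gets the key 'formula'."""
    layers = {}
    for part in inchi.split("/")[1:]:  # skip the "InChI=1S" version segment
        if not part:
            continue
        if part[0].islower():
            # Keep the first occurrence of each prefix only.
            layers.setdefault(part[0], part[1:])
        else:
            layers.setdefault("formula", part)
    return layers


def relation(query_inchi, other_inchi):
    """Label how other_inchi relates to query_inchi from the differing layers."""
    a, b = parse_layers(query_inchi), parse_layers(other_inchi)
    differing = {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}
    if not differing:
        return "identical"
    if differing & CONNECTIVITY_LAYERS:
        return "different connectivity"
    labels = []
    if differing & STEREO_LAYERS:
        labels.append("stereoisomer")
    if "i" in differing:
        labels.append("isotopic variant")
    if differing & {"p", "q"}:
        labels.append("charge/protonation variant")
    return ", ".join(labels) or "other"


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
CHLOROFORM = "InChI=1S/CHCl3/c2-1(3)4/h1H"
CHLOROFORM_D = "InChI=1S/CHCl3/c2-1(3)4/h1H/i1D"

print(relation(L_ALANINE, D_ALANINE))
print(relation(CHLOROFORM, CHLOROFORM_D))
```

Note that a hit with no stereo layers at all would also be labelled "stereoisomer" by this sketch, so a real implementation would probably need a separate label for undefined stereochemistry.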