OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

Primary Google indexing of OSM structures via InChIKey #289

Closed: cdsouthan closed this issue 8 years ago

cdsouthan commented 9 years ago

I note that while the page below converts well in chemicalize.org, there are no InChIKeys on it:

http://openwetware.org/wiki/OpenSourceMalaria:Triazolopyrazine_%28TP%29_Series#Strings_for_Google

As you know, the Keys are the most effective way for your structures to become Google-findable, so it's a good idea to add them. Otherwise you have the odd situation where leads like MMV670437 (PMIWBIXSAYKRGF-SFHVURJKSA-N) can be found, but not on the OSM site.
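For readers less familiar with the Key format: a standard InChIKey is a fixed 27-character string in three hyphen-separated blocks (a 14-character hash of the connectivity layer, a 10-character block holding a hash of the remaining layers plus a standard flag and version character, and a single protonation character), which makes it easy to paste into a page and search for verbatim. Below is a minimal Python sketch that splits a Key into those blocks, using the MMV670437 Key quoted above. The regex and the name `split_inchikey` are illustrative only and not part of any InChI library; the regex checks shape, not whether the Key decodes to a real structure.

```python
import re

# Shape of a standard InChIKey: 14 letters, hyphen, 8 letters + 'S'/'N' flag
# + version letter, hyphen, 1 protonation letter. (Illustrative, hypothetical helper.)
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its named blocks; raise ValueError if malformed."""
    m = INCHIKEY_RE.match(key.strip())
    if m is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, stereo_hash, flag, version, protonation = m.groups()
    return {
        "skeleton": skeleton,        # hash of the connectivity (inner) layer
        "stereo_hash": stereo_hash,  # hash of the remaining layers (stereo, isotopes, ...)
        "standard": flag == "S",     # 'S' marks a standard InChIKey
        "version": version,          # 'A' for InChI version 1
        "protonation": protonation,  # 'N' means no (de)protonation
    }


parts = split_inchikey("PMIWBIXSAYKRGF-SFHVURJKSA-N")
print(parts["skeleton"], parts["standard"])  # PMIWBIXSAYKRGF True
```

A compound ID such as `MMV670437` passed to the same function raises `ValueError`, which is one cheap way to spot table rows where an ID was entered but no Key was.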

mattodd commented 9 years ago

There are Keys on the page, but not yet for every compound. Shouldn't some of them be captured? (Sent by email in reply to cdsouthan's comment of 28 Mar 2015, 09:48.)

drc007 commented 9 years ago

If you look at the source code, it is an HTML page, which is great for viewing in a web browser, but I wonder if the HTML tags are interfering with the interpretation?
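One way to check whether markup hides the Keys is to strip the tags and look for Key-shaped strings in the remaining text. A minimal sketch using only the Python standard library (`html.parser`, `re`); `TextCollector`, `inchikeys_in_html` and the regex are hypothetical helpers, not an OSM or chemicalize.org tool, and text chunks are joined with spaces, so a Key split across inline tags (e.g. half in `<b>`) would not be recovered.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text content of an HTML page, dropping tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


# InChIKey shape: 14 + 10 + 1 uppercase letters separated by hyphens.
KEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def inchikeys_in_html(html: str) -> list[str]:
    """Return Key-shaped strings found in the page text, in order, without duplicates."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)  # separate text nodes so adjacent cells don't merge
    return list(dict.fromkeys(KEY_RE.findall(text)))


page = "<table><tr><td>MMV670437</td><td><code>PMIWBIXSAYKRGF-SFHVURJKSA-N</code></td></tr></table>"
print(inchikeys_in_html(page))  # ['PMIWBIXSAYKRGF-SFHVURJKSA-N']
```

If the Keys come back intact from the page source, the tags are not hiding them from a text-based crawler; whether a particular tool such as chemicalize.org then does anything with them is a separate question.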

cdsouthan commented 9 years ago

Chris/Matt, there are two related issues here.

First, as you can test, chemicalize.org works fine for IUPAC names, SMILES or InChI strings on (any) HTML page. But AFAIK it won't "convert" InChIKeys (it could look up the ones it already has, but that isn't configured). As you can see (image attached), Google indexes the InChIKey just fine wherever it appears on an OSM page. This solves the "findability" problem, but only for exact (or inner-layer) matches.

Second, following my encouragement, ChemAxon deposited their chemicalize.org structure-conversion cache of about 0.3 million structures in PubChem in 2012: https://www.ncbi.nlm.nih.gov/pccompound?term=%22chemicalize.org%20by%20ChemAxon%22[SourceName]&cmd=DetailsSearch. The potential big advantage for OSM would be a quicker route for getting all its structures into PubChem (simply by having the web pages chemicalized), where they would become globally findable by similarity as well as by exact match. However, ChemAxon has not prioritised updating that deposition, so newer structures (i.e. from 2013 onwards) can only be "found" by running a similarity search on chemicalize.org itself (i.e. against their updated local cache of structures).

[Attached screenshot: capture]
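The distinction between "exact (or inner-layer)" matching and similarity search can be made concrete. An exact match compares the whole Key; an inner-layer match compares only the first 14-character block, which hashes the connectivity layer, so it also catches, for example, stereoisomers of the same skeleton (barring hash collisions). Neither finds merely similar structures, which is what a PubChem similarity search adds. A minimal Python sketch; the records and IDs below are hypothetical placeholders, and only the first Key is taken from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    compound_id: str  # hypothetical identifier for illustration
    inchikey: str


def skeleton(key: str) -> str:
    """First InChIKey block: hash of the connectivity (inner) layer."""
    return key.split("-", 1)[0]


def lookup(query: str, records: list[Record]) -> tuple[list[str], list[str]]:
    """Return (exact matches, inner-layer-only matches) as lists of compound IDs."""
    exact = [r.compound_id for r in records if r.inchikey == query]
    inner = [
        r.compound_id
        for r in records
        if r.inchikey != query and skeleton(r.inchikey) == skeleton(query)
    ]
    return exact, inner


records = [
    Record("A", "PMIWBIXSAYKRGF-SFHVURJKSA-N"),  # Key quoted in this thread
    Record("B", "PMIWBIXSAYKRGF-UHFFFAOYSA-N"),  # placeholder: same skeleton, different second block
    Record("C", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),  # placeholder: unrelated skeleton
]
print(lookup("PMIWBIXSAYKRGF-SFHVURJKSA-N", records))  # (['A'], ['B'])
```

A web search for the full Key behaves like the `exact` list; searching for just the first block behaves like the `inner` list.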

mattodd commented 9 years ago

Just wanting to either progress or close this. We continue to use Keys, which is correct, but is there some problem with including them as text on a wiki? That is, is this a problem on our end or specifically with chemicalize.org? Is this issue critical (actively introducing confusion) or more of a "nice to have"?

mattodd commented 8 years ago

Since data are now being added to the Google master sheet, I'm closing this; we can re-engage with the newer ways in which we are making the strings public.