MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server
[SOLVED] Add number of compounds to Record Index page #175

Open schymane opened 5 years ago

schymane commented 5 years ago

The number of compounds in MassBank is not available anywhere ... we should have basic stats on how many compounds by unique InChIKeys and a number of records without InChIKeys (for instance). A total number of spectra would also be good (and the answer is not >186,000, see #174) :-) but can be calculated relatively easily by adding pos and neg numbers - this is not the case for compounds (e.g. adding by name - due to naming inconsistencies, and the number of letters/numbers there in the range...

egonw commented 5 years ago

Indeed, useful as "Reference URL" here:

image

ChemConnector commented 5 years ago

Good idea. And you and I have exchanged on the need for some of the chemicals to be collapsed together too so the curation effort would affect those numbers. If you want me to do anything re looking for duplicates with mapping exercise let me know. I will dedicate a little time every day.

schymane commented 5 years ago

On that note we could add the number of compounds by unique InChIKeys and also the numbers by unique first block to collapse down the (stereo)isomers ... would be an interesting statistic to have.

meier-rene commented 5 years ago

Implemented with 50fb7caf073147b05ba297e90bac11429c3acd53 and rolled out on the dev server. I added 3 numbers: Unique Spectra corresponds to the total number of accessions, Unique Compounds is the count of unique InChIKeys, and Unique Isomers is the count of unique first blocks of InChIKeys. I have not added a section for records without InChIKeys, which is around 3000 at the moment. With some work it will come down to less than 900. This can be closed with the next rollout of the official MassBank server.
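
As a rough illustration of the three numbers described above (this is not the code from the commit), the counts could be derived from a list of per-record InChIKeys as sketched below. The function name `record_index_stats`, the dictionary keys, and the placeholder InChIKeys are assumptions made for this example; a record without an InChIKey is represented as `None` or an empty string.

```python
from typing import Dict, Iterable, Optional


def record_index_stats(inchikeys: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count records, unique full InChIKeys, and unique InChIKey first blocks.

    Hypothetical helper: one entry per record (accession); None or an
    empty string means the record has no InChIKey.
    """
    total = 0
    missing = 0
    full_keys = set()
    first_blocks = set()

    for raw in inchikeys:
        total += 1
        key = (raw or "").strip().upper()
        if not key:
            missing += 1
            continue
        full_keys.add(key)
        # The first block is everything before the first hyphen;
        # stereoisomers share it but differ in the second block.
        first_blocks.add(key.split("-", 1)[0])

    return {
        "spectra": total,
        "unique_inchikeys": len(full_keys),
        "unique_first_blocks": len(first_blocks),
        "without_inchikey": missing,
    }


# Placeholder keys with the InChIKey shape (14-10-1 characters), not real compounds.
records = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # stereoisomer 1
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # stereoisomer 2, same first block
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # second spectrum of stereoisomer 1
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",  # unrelated compound
    None,                           # record without an InChIKey
    "",                             # record without an InChIKey
]
stats = record_index_stats(records)
```

Because every full InChIKey has exactly one first block, the first-block count can never exceed the full-key count under this scheme.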

egonw commented 5 years ago

I updated the entry in Wikidata: https://www.wikidata.org/wiki/Property:P6689

schymane commented 5 years ago

Can you revert that? The numbers I added in wikidata were the numbers with accession IDs AND InChIKeys (the data we provided), unique spectra contains several without InChIKeys ...


PI: EnvCheminf @ LCSB FNR ATTRACT Fellow emma.schymanski@uni.lu

schymane commented 5 years ago

Maybe we need to add the number of spectra with InChIKeys to the record index as well?


schymane commented 4 years ago

So ... this appears only on the msbi.ipb-halle record index still it seems, but is there an issue with the numbers? More isomers than compounds? What do we mean with "isomer" vs "compound"? Can we name them more accurately? Unique Compounds (with stereoisomers), Unique Compounds (without stereoisomers), or Unique Compounds (same skeleton)? (Better ideas welcome; I realise there is a space limitation.)

https://msbi.ipb-halle.de/MassBank/RecordIndex

image

tsufz commented 4 years ago

Yep, should be solved next weekend. I am reluctant to change the server during the week because of service availability. And we still have some issues with the deployment...

meier-rene commented 4 years ago

So ... this appears only on the msbi.ipb-halle record index still it seems

Yes, that's true, but @tsufz is working on that issue. :+1:

, but is there an issue with the numbers? More isomers than compounds? What do we mean with "isomer" vs "compound"? Can we name them more accurately?

I implemented my understanding of the topic: Let's say we have two spectra, one from L-alanine and one from D-alanine. Then we have one unique compound (alanine) and two unique isomers. Do you find this logic irritating? Should we name it differently? Should we count different things?

schymane commented 4 years ago

Well the problem is that isomers are defined on many different levels, and most would count a unique stereoisomer as a unique compound - hence the confusion. I would propose something like Compounds (with stereoisomers) and Compounds (without stereoisomers) to clarify more exactly what you mean. Most users will not really know the InChIKey first block assumption (although many do in the meantime).

egonw commented 4 years ago

Interesting ontological discussion :)

So, the IUPAC Gold Book does not have a definition of chemical compound or compound, but Wikipedia defines a compound as follows: "Chemical compounds have a unique and defined chemical structure held together in a defined spatial arrangement by chemical bonds."