Open schymane opened 5 years ago
Good idea. And you and I have exchanged on the need for some of the chemicals to be collapsed together too so the curation effort would affect those numbers. If you want me to do anything re looking for duplicates with mapping exercise let me know. I will dedicate a little time every day.
On that note we could add the number of compounds by unique InChIKeys and also the numbers by unique first block to collapse down the (stereo)isomers ... would be an interesting statistic to have.
Implemented with 50fb7caf073147b05ba297e90bac11429c3acd53 and rolled out on the dev server. I added 3 numbers: Unique Spectra corresponds to the total number of accessions, Unique Compounds is the count of unique InChIKeys and Unique Isomers is the count of unique first blocks of InChIKeys. I have not added a count of records without InChIKeys, which is around 3000 atm. With some work it will come down to less than 900. This can be closed with the next rollout of the official MassBank server.
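For readers following along, here is a minimal sketch of how these three numbers (plus the count of records without an InChIKey) could be computed. It is not the code from the commit above: the function name `record_index_stats`, the `(accession, inchikey)` tuple layout, the `DEMO...` accessions and the InChIKeys are all made-up placeholders for illustration. The only fact it relies on is that the first block of an InChIKey is the part before the first hyphen.

```python
from typing import Dict, Iterable, Optional, Tuple


def record_index_stats(
    records: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, int]:
    """Count accessions, unique InChIKeys and unique InChIKey first blocks.

    `records` is a hypothetical iterable of (accession, inchikey) pairs;
    inchikey may be None or empty when a record has no InChIKey.
    """
    accessions = set()
    full_keys = set()
    first_blocks = set()
    without_key = 0

    for accession, inchikey in records:
        accessions.add(accession)
        key = (inchikey or "").strip().upper()
        if not key:
            # Record has no InChIKey: it counts as a spectrum only.
            without_key += 1
            continue
        full_keys.add(key)
        # The first block is everything before the first hyphen.
        first_blocks.add(key.split("-", 1)[0])

    return {
        "spectra": len(accessions),
        "unique_inchikeys": len(full_keys),
        "unique_first_blocks": len(first_blocks),
        "records_without_inchikey": without_key,
    }


# Placeholder data: these accessions and InChIKeys are NOT real.
records = [
    ("DEMO000001", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("DEMO000002", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),  # same first block as 1
    ("DEMO000003", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # same key as 1
    ("DEMO000004", "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),
    ("DEMO000005", None),                           # no InChIKey
]
stats = record_index_stats(records)
print(stats)
```

Counting the records without an InChIKey separately keeps the spectra total and the compound counts from being mixed up when the numbers are reported elsewhere.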
I updated the entry in Wikidata: https://www.wikidata.org/wiki/Property:P6689
Can you revert that? The numbers I added in Wikidata were the numbers with accession IDs AND InChIKeys (the data we provided); Unique Spectra includes several records without InChIKeys ...
PI: EnvCheminf @ LCSB FNR ATTRACT Fellow emma.schymanski@uni.lu
On Tue, Apr 30, 2019 at 10:29 AM +0200, "Egon Willighagen" notifications@github.com<mailto:notifications@github.com> wrote:
I updated the entry in Wikidata: https://www.wikidata.org/wiki/Property:P6689
You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub (https://github.com/MassBank/MassBank-web/issues/175#issuecomment-487861487), or mute the thread (https://github.com/notifications/unsubscribe-auth/AA7BV7M24Q7R3ZN7TQJUENDPS77NDANCNFSM4HI33IOA).
Maybe we need to add the number of spectra with InChIKeys to the record index as well?
So ... this appears only on the msbi.ipb-halle record index still it seems, but is there an issue with the numbers? More isomers than compounds? What do we mean by "isomer" vs "compound"? Can we name them more accurately? Unique Compounds (with stereoisomers), Unique Compounds (without stereoisomers) or Unique Compounds (same skeleton)? (Better ideas welcome, I realise there is a space limitation.)
Yep, should be solved next weekend. I am reluctant to change the server during the week because of service availability. And we still have some issues with the deployment...
> So ... this appears only on the msbi.ipb-halle record index still it seems
Yes, that's true, but @tsufz is working on that issue. :+1:
> ... but is there an issue with the numbers? More isomers than compounds? What do we mean by "isomer" vs "compound"? Can we name them more accurately?
I implemented my understanding of the topic: let's take two spectra, one of L-alanine and one of D-alanine. Then we have one unique compound (alanine) and two unique isomers. Do you find this logic confusing? Should we name it differently? Should we count different things?
Well the problem is that isomers are defined on many different levels, and most would count a unique stereoisomer as a unique compound - hence the confusion. I would propose something like Compounds (with stereoisomers) and Compounds (without stereoisomers) to clarify more exactly what you mean. Most users will not really know the InChIKey first block assumption (although many do in the meantime).
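To make the two proposed labels concrete, here is a small sketch that groups InChIKeys by their first block. The first block hashes the connectivity (the skeleton), while stereochemistry is encoded in the second block, so keys that share a first block collapse into one entry in the "without stereoisomers" count. The helper name `group_by_first_block` and both InChIKeys are invented placeholders standing in for the two alanine enantiomers; they are not the real keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def group_by_first_block(inchikeys: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each InChIKey first block to the set of full keys sharing it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for key in inchikeys:
        first_block = key.split("-", 1)[0]
        groups[first_block].add(key)
    return dict(groups)


# Placeholder keys (NOT real): same first block, different second block,
# standing in for the L- and D-forms of one skeleton.
keys = [
    "ALANINEXXXXXXX-LLLLLLLLSA-N",
    "ALANINEXXXXXXX-DDDDDDDDSA-N",
]
groups = group_by_first_block(keys)

# "Compounds (with stereoisomers)": every distinct full InChIKey counts.
with_stereoisomers = sum(len(full) for full in groups.values())
# "Compounds (without stereoisomers)": one count per first block.
without_stereoisomers = len(groups)

print("Compounds (with stereoisomers):", with_stereoisomers)
print("Compounds (without stereoisomers):", without_stereoisomers)
```

One caveat under this assumption: the second block also carries other layers such as isotopic labelling, so collapsing on the first block merges more than just stereoisomers.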
Interesting ontological discussion :)
So, the IUPAC Gold Book does not have a definition of chemical compound or compound, but Wikipedia defines a compound as follows: "Chemical compounds have a unique and defined chemical structure held together in a defined spatial arrangement by chemical bonds."
The number of compounds in MassBank is not available anywhere ... we should have basic stats on how many compounds by unique InChIKeys and a number of records without InChIKeys (for instance). A total number of spectra would also be good (and the answer is not >186,000, see #174) :-) but can be calculated relatively easily by adding pos and neg numbers - this is not the case for compounds (e.g. adding by name - due to naming inconsistencies, and the number of letters/numbers there in the range...