MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

MassBank Record Format / MassBank data schema 2.10 #17

Closed tsufz closed 8 years ago

tsufz commented 8 years ago

Together with innovations in the database model, we may publish a new record format in order to map the new database structure.

schymane commented 8 years ago

What database innovations? What changes to the record format are you proposing? Changes to current fields could be inconvenient for many, additions maybe less so...? What is the time frame?

tsufz commented 8 years ago

No, not a new schema, but an update with new fields (e.g., SPLASH, InChIKey) in order to map them properly. Not high priority, but necessary at a certain time.

sneumann commented 8 years ago

So that sounds more like a 2.1 update, as it covers additions and no incompatible changes.

takaakin commented 8 years ago

Before I add new terms to the MassBank record definition, please allow me to clarify the function of the InChIKey for mass spectrometry users. In the current records, the key is defined as a chemical link, CH$LINK: INCHIKEY. But the key might be more useful for searching the mass spectra of chemical compounds that, for example, have the same atom connectivity but different stereochemistry. So my question is: how about defining the InChIKey as CH$INCHIKEY rather than CH$LINK: INCHIKEY? However, this change would create a lot of work for me because most of the MassBank.jp records have no InChIKey term yet.

m-arita commented 8 years ago

Hi,

So my comment is how is InChIkey defined as CH$INCHIKEY rather than CH$LINK: INCHIKEY?

The definition of InChI includes the version and the connectivity without stereo. So, as long as the full key is included, you can perform comparison without stereo (first 14 letters) or with stereo (full key).
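As a rough illustration of the comparison described above, here is a minimal Python sketch. The function names and the two example keys are hypothetical placeholders (they are not real compounds); the only thing taken from the discussion is that the first 14 letters carry the connectivity and the full key adds the stereo layer.

```python
def connectivity_block(inchikey: str) -> str:
    """Return the first hyphen-separated block (14 letters) of an InChIKey."""
    return inchikey.strip().upper().split("-")[0]


def keys_match(key_a: str, key_b: str, with_stereo: bool = True) -> bool:
    """Compare two InChIKeys with stereo (full key) or without (first block only)."""
    if with_stereo:
        return key_a.strip().upper() == key_b.strip().upper()
    return connectivity_block(key_a) == connectivity_block(key_b)


# Made-up placeholder keys (assumption): same first block, different second block.
KEY_1 = "ABCDEFGHIJKLMN-OPQRSTUVSA-N"
KEY_2 = "ABCDEFGHIJKLMN-ZYXWVUTSSA-N"

if __name__ == "__main__":
    print(keys_match(KEY_1, KEY_2, with_stereo=True))
    print(keys_match(KEY_1, KEY_2, with_stereo=False))
```

With this kind of helper, a search could offer both modes without needing a separate field for the key.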

The drawback of the current MassBank format is the design of "required" fields. Since molecular information is required, we cannot register spectra with only semi-identification. The treatment of MS/MS is also a problem. My suggestion is:

I agree with the opinion that Norman-mb and MassBank.jp can assign separate IDs, because we now have SPLASH to link across all of them.

Best wishes,

Masanori Arita


schymane commented 8 years ago

I don't think changing the field name for the InChIKey is necessary, as people know where it is now and can use it as they wish. We already have 'semi' identifications in MassBank.EU; one can circumvent the required fields by entering 'NA', and the importer will accept this. We have built this into RMassBank already to get consistent comment fields too. I don't understand the issue with MS and MS/MS, because these can also be labelled in the current definition? For the existing records, it may be possible to add InChIKeys; we have code that does that in the background for the summaries we sent to ChemSpider. The impression I had was that some records required a lot more curation than just adding an InChIKey; then it becomes an issue of who should do it and what priority this has. I would much prefer a 2.1-type definition upgrade to a whole new definition for the current MassBank we deal with.


tsufz commented 8 years ago

I agree with Emma. I would not touch the general format, but map the newly introduced fields to the specification. I met some vendors a while ago. They really appreciated the well-written MassBank Record Format. Thus, no need to change the whole model; sorry for the misunderstanding.

As Emma said, we can process the records using the same R script we used for the SPLASH annotation to add missing values in the older records. However, I don't know if the providers agree with curation of their records. In the case of metadata, maybe yes. In the case of mass spectral and related information, maybe not.

meowcat commented 8 years ago

The treatment of MS/MS is also a problem.

Can you explain?

My suggestion is:

  • Separate spectral types (MS/MS or MS) by ID.

Can you explain? If you are talking about the accession ID, I think imposing even more rules on how to build it is not ideal. MS and MS/MS are already separated by means of the AC$MASS_SPECTROMETRY: MS_TYPE tag, or do I misunderstand?
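To illustrate the point, here is a small Python sketch that separates records by the MS_TYPE subtag instead of by accession ID. The record snippets, the accession strings, and the helper names are hypothetical, and the values MS and MS2 are assumed to be what the record format allows for this subtag.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Tag named in this thread; everything after it on the line is the value.
MS_TYPE_PREFIX = "AC$MASS_SPECTROMETRY: MS_TYPE"


def ms_type(record_text: str) -> Optional[str]:
    """Return the MS_TYPE value of a record, or None if the tag is absent."""
    for line in record_text.splitlines():
        if line.startswith(MS_TYPE_PREFIX):
            return line[len(MS_TYPE_PREFIX):].strip() or None
    return None


def group_by_ms_type(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each MS_TYPE value to the accessions that carry it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, text in records.items():
        groups[ms_type(text) or "UNSPECIFIED"].append(accession)
    return dict(groups)


# Hypothetical, heavily shortened records (assumption, not real MassBank data).
EXAMPLE_RECORDS = {
    "XX000001": "ACCESSION: XX000001\nAC$MASS_SPECTROMETRY: MS_TYPE MS\n",
    "XX000002": "ACCESSION: XX000002\nAC$MASS_SPECTROMETRY: MS_TYPE MS2\n",
    "XX000003": "ACCESSION: XX000003\nAC$MASS_SPECTROMETRY: MS_TYPE MS2\n",
}

if __name__ == "__main__":
    print(group_by_ms_type(EXAMPLE_RECORDS))
```

Because the spectral type is read from the tag, the accession ID itself does not need to encode it.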

I would favor additions over changes in general. I don't think there is much wrong with the current fields as is. Removing "requirements" is not a problem, I think, and would maybe be a good idea.


My most important suggestion would be (as I mentioned in other places) that we need more space in the Accession ID (see #11, Accession code restrictions).

takaakin commented 8 years ago

I have updated the MassBank Record Format from 2.09 to 2.10 (attached: MassBankRecord_en.pdf). It will be made available to the public on the MassBank web page. The new version includes two major updates.

  1. The default Creative Commons License of MassBank is defined as CC BY.
  2. Two new mandatory tags are added: CH$LINK: INCHIKEY and PK$SPLASH.

tsufz commented 8 years ago

Dear Takaaki-san, thanks a lot! I suggest making INCHIKEY mandatory for known and tentative structures. If we have unknowns, we cannot provide the structure information. We are able to process such records with RMassBank now. However, the SMILES and InChIKey are set to N/A to get past the validation procedure (which so far is implemented only for SMILES).

uchem-massbank commented 8 years ago

InChIKey should not be mandatory for all kinds of tentative structures – not all have a fully defined structure and these do not have InChIKeys … Thus, we can only require InChIKeys for known structures and leave it optional for the rest.

takaakin commented 8 years ago

I revised the document. INCHIKEY is now optional. The new version will be published on the MassBank web page. Attached: MassBankRecordFormat_en (v 2.10).pdf
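Read together with the comments above, the revised rule could be checked roughly as in the Python sketch below. It assumes that PK$SPLASH stays mandatory (only INCHIKEY was reported as changed to optional) and that placeholder values such as N/A or NA count as "not provided". The function names, the placeholder set, and the sample records are hypothetical; this is not the MassBank validator.

```python
from typing import List, Optional

PLACEHOLDERS = {"", "N/A", "NA"}          # assumption: values treated as "not provided"
MANDATORY_PREFIXES = ["PK$SPLASH:"]       # assumption: still mandatory in 2.10
INCHIKEY_PREFIX = "CH$LINK: INCHIKEY"     # optional after the revision


def tag_value(record_text: str, prefix: str) -> Optional[str]:
    """Return the value following a tag prefix, or None if absent or a placeholder."""
    for line in record_text.splitlines():
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            return None if value.upper() in PLACEHOLDERS else value
    return None


def check_record(record_text: str) -> List[str]:
    """List the mandatory tags that are missing; an absent InChIKey is not an error."""
    return [
        f"missing mandatory tag {prefix}"
        for prefix in MANDATORY_PREFIXES
        if tag_value(record_text, prefix) is None
    ]


# Hypothetical record fragments; the SPLASH value is a placeholder, not a real hash.
UNKNOWN_COMPOUND = "CH$LINK: INCHIKEY N/A\nPK$SPLASH: splash10-PLACEHOLDER"
NO_SPLASH = "CH$LINK: INCHIKEY N/A"

if __name__ == "__main__":
    print(check_record(UNKNOWN_COMPOUND))
    print(check_record(NO_SPLASH))
```

Under these assumptions, a record for an unknown compound passes without an InChIKey, while a record lacking a SPLASH is flagged.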

tsufz commented 8 years ago

Closed upon new updates