MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

Update record format #203

Closed tsufz closed 4 years ago

tsufz commented 4 years ago

Update the record format and bump to version 2.4.

schymane commented 4 years ago

I received an email earlier today with additional field suggestions. They want to add precursor intensity and concentration of the chemical. Do you want to hold off with revising until they have replied? For the record I suggested AC$CONCENTRATION and MS$FOCUSED_ION: PRECURSOR_INT. I mentioned we were revising it like right now ...

tsufz commented 4 years ago

In the meantime, some comments came in...

tsufz commented 4 years ago

@schymane, do you know when they will provide details on their intention? I would like to release 2.4 soon to get it off the table and to start work on the implementation.

schymane commented 4 years ago

They wanted concentration information and precursor intensity in the records, and my suggestion would be to add two new tags: AC$CONCENTRATION and MS$FOCUSED_ION: PRECURSOR_INT

I don't know if we have to wait for more details - if you want to slightly modify my suggestions let's discuss ... I have not had a response...and think we should finalise ...
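To make the two suggested tags concrete, here is a minimal sketch of how such lines might look in a record and how a reader could split them. It assumes the usual `TAG: value` and `TAG: SUBTAG value` line layout of the record format; the function name, the concentration value and unit, and the intensity value are hypothetical, and the tag names are only the proposals from this thread, not a finalised specification.

```python
# Tags whose value starts with a subtag (a small illustrative subset).
SUBTAGGED = {"AC$MASS_SPECTROMETRY", "AC$CHROMATOGRAPHY", "MS$FOCUSED_ION"}


def parse_record_line(line):
    """Split a 'TAG: value' line into (tag, subtag, value); subtag may be None."""
    tag, sep, rest = line.partition(": ")
    if not sep:
        raise ValueError(f"not a 'TAG: value' line: {line!r}")
    if tag in SUBTAGGED:
        # The first whitespace-separated token is the subtag.
        subtag, _, value = rest.partition(" ")
        return tag, subtag, value.strip()
    return tag, None, rest.strip()


example_lines = [
    "AC$CONCENTRATION: 10 uM",                 # proposed tag, hypothetical value
    "MS$FOCUSED_ION: PRECURSOR_M/Z 195.0877",  # existing subtag
    "MS$FOCUSED_ION: PRECURSOR_INT 1.2e6",     # proposed subtag, hypothetical value
]
parsed = [parse_record_line(line) for line in example_lines]
```

Both fields would stay optional, so a reader like this should not assume that either line is present.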

tsufz commented 4 years ago

Well, I will put it in with any description and then finalise 2.4.

tsufz commented 4 years ago

I think AC$CONCENTRATION does not really fit into any of the existing analytical sections. I will create a new general section for comment tags...

meier-rene commented 4 years ago

Would "AC$CHROMATOGRAPHY CONCENTRATION" fit?

schymane commented 4 years ago

I toyed with a few ideas and kept landing on AC$CONCENTRATION

We do not always have chromatography (direct infusion?). MS$FOCUSED_ION CONCENTRATION is also an alternative, but certainly not ideal because at that point the chemical has already been ionised. CH$ is also inappropriate, as it holds the chemical information, not the concentration (which is an analytical condition ... hence AC is the proper home in my opinion...)

tsufz commented 4 years ago

I added the AC$GENERAL section 2.4.7 for such purposes.

sneumann commented 4 years ago

Hm. In all publications with analytical chemistry, I totally support reporting concentrations. In our case, what's the difference between an MS/MS spectrum of a 0.001 mol/l caffeine solution and an MS/MS of 10.0 mol/l caffeine? What was the intended use case? Ionisation efficiency? I am hesitant to add too much detail. Similarly, what is the intention behind precursor intensity? The Grant lab at UConn looked into that (Ecom50). Would we have enough information if we did target those use cases? I suggest that if we add information, we also give examples in the docs of what this can actually be used for. Yours, Steffen

schymane commented 4 years ago

The email contact did indeed want to use this data to predict ionisation efficiency based on information such as concentration and precursor intensity. All other info needed is already available (if provided) in the record format. I understand the unwillingness to add too much detail, and these should certainly not be compulsory fields; however, this is information that people may have. I am not sure how many people will actually add this information, and I would be hesitant to add it to RMB, especially with our lack of active programmers on it right now, but I see no harm in adding it to the record format to standardize it if people wish to add it (otherwise it will e.g. end up in the COMMENT field...)?

tsufz commented 4 years ago

I agree with @sneumann that the concentration is not critical. I simply use a concentration at which I am able to trigger an MS2.

Actually, there is still lots of undocumented stuff in the records. I tried to minimise it, and I will write a list of curation issues for @meier-rene once the new Record Format is approved. However, I ignored some very fancy parameters included in only a few records. In part, this is historical from very old records, but we also have some newer submissions which do not follow the record format.

We should work on a vocabulary to improve automated data curation and to standardize. In future, records which do not comply should be rejected.

I also agree with @schymane that there is no harm in adding it to the format, but people should not expect too many submissions including such information.

I suggest writing issues for the implementation in RMassBank. Some of the new things just need hard coding in the settings and the records workflow. Intensity needs extraction, handling and finally export. The former is low-hanging fruit; the latter needs more time to be implemented.
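As a rough illustration of the vocabulary idea above, the following sketch lists tags in a record that are not part of an agreed vocabulary, so that such a record could be flagged or rejected. The `ALLOWED_TAGS` set is a tiny illustrative subset, not the real vocabulary, and the function name, accession and "fancy" tag are made up for the example.

```python
# Illustrative subset only; a real vocabulary would come from the record format.
ALLOWED_TAGS = {
    "ACCESSION",
    "CH$NAME",
    "AC$INSTRUMENT",
    "AC$CONCENTRATION",  # proposed in this thread, not yet part of the format
    "MS$FOCUSED_ION",
}


def unknown_tags(record_text):
    """Return tags found in the record that are not in ALLOWED_TAGS, in order."""
    unknown = []
    for line in record_text.splitlines():
        # Skip indented continuation/peak lines and lines without a tag.
        if line.startswith(" ") or ": " not in line:
            continue
        tag = line.split(": ", 1)[0]
        if tag not in ALLOWED_TAGS and tag not in unknown:
            unknown.append(tag)
    return unknown


record = (
    "ACCESSION: XX000001\n"        # hypothetical accession
    "CH$NAME: Caffeine\n"
    "AC$FANCY_PARAMETER: 42\n"     # hypothetical undocumented tag
)
problems = unknown_tags(record)
```

A non-empty result would mark the record as non-compliant; whether that means rejection or a curation issue is a policy decision outside this sketch.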