MassBank / MassBank-data

Official repository of open data MassBank records

GC-APCI-QTOF spectra to MassBank #97

Closed nalygizakis closed 3 years ago

nalygizakis commented 5 years ago

Hi @meier-rene @tsufz,

There is a project in the NORMAN joint program of activities to upload GC-APCI-QTOF mass spectra to MassBank. I prepared one MassBank record for you (https://www.dropbox.com/s/93z76o9bx243lll/AU230117.txt?dl=0), so that you can apply any needed modifications to MassBank.

Let me know if all is okay with the sample record, so that I can give the signal for production of GC-APCI-QTOF records.

Thanks! Nikiforos

tsufz commented 5 years ago

Hi @nalygizakis, thanks for the initiative. I suggest devising a completely new scheme for the GC-based records. I will work on the record format and come back to you.

tsufz commented 5 years ago

Actually, we already have some examples of GC records, such as https://massbank.eu/MassBank/RecordDisplay.jsp?id=MSJ01035&dsn=MSSJ.

@meier-rene, I guess all undocumented tags in MB records are listed in the undocumented subtag sections of https://github.com/MassBank/MassBank-web/blob/master/Documentation/MassBankRecordFormat.md?

tsufz commented 4 years ago

@nalygizakis, I have now updated the record format. Here are my suggestions for making your record compliant with record format version 2.4 (not published yet).

You can download my comments here: AU230117_TS.docx

The draft record format 2.4 is found here: https://github.com/tsufz/MassBank-web/blob/update_record_format/Documentation/MassBankRecordFormat.md

tsufz commented 4 years ago

AU230117_TS.docx

Another Comment!

nalygizakis commented 4 years ago

@tsufz I agree with the comments and improvements. I see some new fields in the record, such as "SOURCE TEMPERATURE" or "TRANSFERLINE TEMPERATURE". Is there any chance of integrating these improvements into the RMassBank package? Of course, I can generate the records and add these details myself with a small script. Let me know.

tsufz commented 4 years ago

@nalygizakis For now, you need to add the new fields to the records with your own code.

@schymane and @meowcat, we should keep in mind to integrate the new tags into RMassBank. The record format is quite mature now. I don't expect many changes in the future, so IMHO it makes sense to update.

schymane commented 4 years ago

I think we will need to think about some functionality to deal with optional entries in the settings file. Some of the new changes are very specialised: some can be hard coded (i.e. set in the settings ini), but others will require data extraction (the full scan intensity and e.g. concentration, which would have to be provided up front in the compound list). The code changes are non-trivial, and the capacities are not clear?
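To illustrate the split described above, here is a minimal Python sketch (not RMassBank code) that merges fixed, settings-style entries with values taken from one compound-list row, skipping anything that is empty. The function name, the field names, and the `conc` column are hypothetical; the actual tag spellings would have to be taken from the record format document.

```python
from typing import Dict


def resolve_optional_fields(
    fixed: Dict[str, str],
    per_compound_columns: Dict[str, str],
    compound_row: Dict[str, str],
) -> Dict[str, str]:
    """Collect optional record fields from two sources.

    fixed: field name -> value, identical for every record
        (the "hard coded" case, e.g. read from a settings file).
    per_compound_columns: field name -> column name in the compound list.
    compound_row: one row of the compound list (column name -> value).

    Fields with an empty or missing value are left out entirely.
    """
    fields: Dict[str, str] = {}
    for name, value in fixed.items():
        if value.strip():
            fields[name] = value.strip()
    for name, column in per_compound_columns.items():
        value = compound_row.get(column, "").strip()
        if value:
            fields[name] = value
    return fields
```

Values that need real data extraction (such as a full scan intensity) are deliberately not covered here; they would have to be computed elsewhere and passed in like any other per-compound value.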

tsufz commented 4 years ago

Yes, we require code changes for hard coding. R nerds could inject the tags themselves, but this is tedious and error-prone. We should start with the low-hanging fruit, such as hard coding of new tags, and postpone the data extraction topic. @sneumann mentioned a while ago that it would be nice to get more data from the mzML. I suggest opening an issue in RMassBank to collect ideas and solutions for the automated retrieval, and hope that it will get done at some point...

tsufz commented 4 years ago

... with respect to the new tags, I will add issues for the implementation in RMassBank one by one. I think the project tag is also still not in...

schymane commented 4 years ago

For some of these it will probably be most practical, at least to start, to have some scripts that one can run after record generation to add extra fields, since they are so specific.
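A post-generation script of this kind could look roughly like the Python sketch below. It assumes a record is a plain text file of `TAG: SUBTAG value` lines and appends a new subtag line after the last existing line of the same tag. The helper name `add_subtag` is hypothetical, and the subtag spelling passed in should be checked against the record format document rather than taken from this example.

```python
from typing import List


def add_subtag(record_lines: List[str], tag: str, subtag: str, value: str) -> List[str]:
    """Return a copy of the record with `tag: subtag value` added.

    The new line is placed directly after the last existing line that
    starts with `tag: `. If the subtag is already present, the record
    is returned unchanged. Raises ValueError if the tag does not occur.
    """
    prefix = f"{tag}: "
    new_line = f"{prefix}{subtag} {value}"

    # Do not add the same subtag twice.
    if any(line.startswith(f"{prefix}{subtag} ") for line in record_lines):
        return list(record_lines)

    # Locate the last line carrying the same tag.
    last = -1
    for i, line in enumerate(record_lines):
        if line.startswith(prefix):
            last = i
    if last == -1:
        raise ValueError(f"tag {tag!r} not found in record")

    return record_lines[: last + 1] + [new_line] + record_lines[last + 1 :]
```

For a whole batch, one would read each record file into a list of lines, call the helper once per extra field, and write the result back. The sketch does not enforce any particular subtag order within a tag.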

meowcat commented 4 years ago

Just as a heads up, I believe the long-term solution for RMassBank is to migrate the record rendering step to what MSnio should become, where the record format can simply be updated in a template rather than in the code. The mid-term solution is to adopt the changes from the S4power branch, where any additional record info can be added to records directly via the @info slot.

But I understand that promises of great changes yet to come are not so helpful here, and I myself am frustrated at not having time to move this forward.

tsufz commented 4 years ago

I submitted a JPA for NORMAN to enhance vocabulary-based representations of the NORMAN databases, to overcome the drawbacks of, for example, DCTs. A MassBank ontology that maps the MassBank record format could be a first step towards a more dynamic rendering of the records.

tsufz commented 3 years ago

I think this was solved in e8a5d974418af23d0dbc0e6720b9eb3c78039184