MassBank / MassBank-data

Official repository of open data MassBank records
74 stars 59 forks source link

Addition of massbank records in folder AU #33

Closed nalygizakis closed 5 years ago

nalygizakis commented 5 years ago

Hi Tobias,

I forked the repository, added the new MassBank records and I created a pull request to merge the directories.

Kind Regards, Nikiforos

meier-rene commented 5 years ago

Hi Nikiforos, thank you for the pull request. I would like to help you getting these new records into MassBank. I will check the validation results and come back with some suggestions.

nalygizakis commented 5 years ago

Thanks! Please, go ahead and let me know if there is any validation problem.

meier-rene commented 5 years ago

I had a look into your submission and I found a number of issues which I can not solve without your help. I could probably solve a lot of this on my own, but I would feel more comfortable if you at least know it and approve these things. Even better would be if you try to reprocess them all with a recent RMassBank e.g. from Bioconductor 3.8, because this would solve a number of issues automatically.

Issue 1: As far as I understand a lot of the new records were created with a normal phase chromatography and you labeled them LC-HILIC but in MassBank we only have LC, GC and and CE-MS. These records would need a relabeling to LC-ESI-QTOF. Issue 2: The ACCESSION in the files do not match the filenames. I could set the ACCESSION to the filename. Setting the filename to the ACCESSION is not possible, because the filenames are already occupied .

There are more minor issues, but before we discuss this I would like to know if you prefer to solve Issue 1 & 2 on your own with a new RMassBank. If you think you can not manage this I need advice for Issue 2.

nalygizakis commented 5 years ago

Hi! Good news is that issues are clear. I will correct both issues myself.

I have a question for you; I understood that AC$INSTRUMENT_TYPE should be LC-ESI-QTOF. However, is it possible that RECORD_TITLE is "Compound name; LC-HILIC-ESI-QTOF; ..."?

I renamed the records, so that they do not conflict with RP records. This is why you have ACCESSION conflict.

schymane commented 5 years ago

You can modify the Record Title pattern in the settings file … I would suggest you add HILIC after the actual instrument code tho … e.g. … LC -ESI-QTOF; HILIC; ….

nalygizakis commented 5 years ago

Hi Emma. Let's take an example so that it is clear to me. RECORD_TITLE: 1H-Benzotriazole; LC-HILIC-ESI-QTOF; MS2; CE: 20 eV; R=35000; [M+H]+ is suggested to be changed to RECORD_TITLE: 1H-Benzotriazole; LC-ESI-QTOF; HILIC; MS2; CE: 20 eV; R=35000; [M+H]+

Do you agree with the example above?

schymane commented 5 years ago

Looks fine to me – it will get you HILIC in the title but won’t interfere with the automatic fields. You may need to check with the Record Specification but I do not see anything to prevent this (we e.g. added Resolution into the title), as far as I recall they have a minimum information they want in the title but do not prevent you adding more. Pls correct me if I am wrong @meier-rene as I’ve not had a chance to check for sure.

meier-rene commented 5 years ago

Any chance to put this a little bit more to the end of the line? Would make things more easy in MassBank code. The first three fields are validated atm. Could you agree to RECORD_TITLE: 1H-Benzotriazole; LC-ESI-QTOF; MS2; HILIC; CE: 20 eV; R=35000; [M+H]+ ?

nalygizakis commented 5 years ago

It is feasible to follow this set up. Therefore, we cancel the pull request and I will fork again, upload the new records and try to pull again. Do you agree @meier-rene?

meier-rene commented 5 years ago

Would be great.

meier-rene commented 5 years ago

Waiting for new try...