MassBank / MassBank-data

Official repository of open data MassBank records
74 stars 59 forks source link

New try to submit ACES / SU MS2 spectrum #200

Closed BenildeB closed 2 years ago

BenildeB commented 2 years ago

Hello,

Here is a new try. If this one is not working, I don't know what I did wrong. I hope this one will work. Thanks.

Best, Bénilde

tsufz commented 2 years ago

Hi @BenildeB, Thank you so much for the submission of your records! The submitted branch is now inline with the MassBank-data:dev branch and thus could be merged theoretically. However, the validator module did not run, and thus we cannot check the compliance of your records.

@meier-rene, now is your turn to look for the reason why the travis-workflow did not perform?

Best Tobias

meier-rene commented 2 years ago

Thanks for your work @BenildeB and @tsufz. Our validation has stopped, because we are out of credits at Travis. Unfortunately they make it so complicated to get free credits for OSS projects, that I will move our validation to Github Actions. For now I can validate things manually at my local computer.

meier-rene commented 2 years ago

Hi @BenildeB , I had a look at your data. There was a number of issues, which I could resolve. But there are three problems where your inspection is required. This is a wrap up of the validator messages:

Incorrect number of peaks in peaklist. 4 peaks are declared in PK$NUM_PEAK line, but 5 peaks are found. massbank.cli.Validator - Error in 'ACES_SU/MSBNK-ACES_SU-AS000254.txt'.

Incorrect number of peaks in peaklist. 30 peaks are declared in PK$NUM_PEAK line, but 7 peaks are found. massbank.cli.Validator - Error in 'ACES_SU/MSBNK-ACES_SU-AS000182.txt'.

Incorrect number of peaks in peaklist. 13 peaks are declared in PK$NUM_PEAK line, but 7 peaks are found. massbank.cli.Validator - Error in 'ACES_SU/MSBNK-ACES_SU-AS000151.txt

Its impossible for me to resolve this kind of inconsistencies, because only you have the primary data. Please check these three peaklists and give me a feedback.

meier-rene commented 2 years ago

@BenildeB my second comment is about the issues I could fix. And I also have a questions. I have noticed, that you did your data processing with MS-DIAL and I noticed, that the issues I found are consistent for the whole submission. I expect that you have some sort of pipeline to process that MS-DIAL data to produce the files you contributed. Are you willing to provide that pipeline? It might be interesting for other people as well. I want to help polishing that pipeline to fix the remaining issues. Please think about it. The issues I Identified are:

All these issues should be easy to fix in the pipeline. What do you think?

BenildeB commented 2 years ago

@meier-rene

Thank you for your feedback.

Considering all of this, should I resubmit only the files that had problems (AS000151/182/254), or do the modifications on all of them? We would like to submit these MS2 spectrum before submitting a paper for publication, and the modifications you are talking about might take me a little while: I am an environmental/analytical chemist first, my coding skills are basic ones and I need time to fix the issues you pointed out. But I'll definitely do them before we are submitting any other MS2 spectrum.

Thank you for your help.

meier-rene commented 2 years ago

Please do not resubmit. Just check the 3 spectra and tell me what to do. All other fixes are already in place (you can find this in the BenildeB-BenildeB_dev branch ) and after resolving the last 3 issues I will merge your contribution.

BenildeB commented 2 years ago

@meier-rene I just realized I could fix it online. Thanks for the reminder.

BenildeB commented 2 years ago

@meier-rene I did the modifications in my branch, I hope it worked.

meier-rene commented 2 years ago

Thank you for your contribution. Its now in the dev branch and will be released with the next release. If you have more contributions, I'm happy to help polishing your pipeline.

BenildeB commented 2 years ago

Thanks a lot @meier-rene.

BenildeB commented 2 years ago

@meier-rene I am currently working on our script, and using some of the txt files we submitted I noticed that our APCI pos (AS000072 to AS000143) and ESI pos (AS000144 to AS000210) files are missing the splash number. It's been replaced by NA. I don't know what happened as we used the same script for both positive and negative. But if it's ok with you I would like to fix this. How should I do that? (I have the fixed text files available now) Sorry for this.

meier-rene commented 2 years ago

No need to do anything on the already submitted files. They got fixed by me. I have pipelines doing this and our validations step would identify incorrect or missing SPLASHs. There were more spectra with incorrect/wrong SPLASHs, not just the two you mentioned. This is the commit with all the splash fixes: https://github.com/MassBank/MassBank-data/commit/4e945ce4eddc1aebfe5cb4d6d39161172e24af7a

BenildeB commented 2 years ago

@meier-rene Sorry for that, and thanks for fixing it. (I know it was not only two files, but all the APCI pos and all the ESI pos.)

BenildeB commented 2 years ago

@meier-rene Thanks. However, I noticed none of our splash numbers were correct, and I am not sure I understand why. Could it be because none of our relative intensities were rounded?