Hi @BenildeB,
Thank you so much for submitting your records! The submitted branch is now in line with the MassBank-data:dev branch and could therefore, in theory, be merged. However, the validator module did not run, so we cannot check the compliance of your records.
@meier-rene, it is now your turn: could you find out why the Travis workflow did not run?
Best Tobias
Thanks for your work, @BenildeB and @tsufz. Our validation has stopped because we are out of credits at Travis. Unfortunately, they make it so complicated to get free credits for OSS projects that I will move our validation to GitHub Actions. For now, I can validate things manually on my local computer.
Hi @BenildeB, I had a look at your data. There were a number of issues that I could resolve, but there are three problems that require your inspection. Here is a summary of the validator messages:
Incorrect number of peaks in peaklist. 4 peaks are declared in PK$NUM_PEAK line, but 5 peaks are found. massbank.cli.Validator - Error in 'ACES_SU/MSBNK-ACES_SU-AS000254.txt'.
Incorrect number of peaks in peaklist. 30 peaks are declared in PK$NUM_PEAK line, but 7 peaks are found. massbank.cli.Validator - Error in 'ACES_SU/MSBNK-ACES_SU-AS000182.txt'.
Incorrect number of peaks in peaklist. 13 peaks are declared in PK$NUM_PEAK line, but 7 peaks are found. massbank.cli.Validator - Error in 'ACES_SU/MSBNK-ACES_SU-AS000151.txt'.
It's impossible for me to resolve this kind of inconsistency, because only you have the primary data. Please check these three peak lists and give me feedback.
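For anyone who wants to catch this before submitting, the peak-count check can be approximated locally. This is only a minimal sketch and not the actual `massbank.cli.Validator` code: the function name `peak_counts` is hypothetical, and it assumes that every non-empty line between the `PK$PEAK:` header and the closing `//` is one peak.

```python
def peak_counts(record_text):
    """Return (declared, found) peak counts for one MassBank record.

    Assumption: every non-empty line between the 'PK$PEAK:' header
    and the closing '//' line is exactly one peak.
    """
    declared = None
    found = 0
    in_peak_block = False
    for line in record_text.splitlines():
        if line.startswith("PK$NUM_PEAK:"):
            # Value after the first colon is the declared peak count
            declared = int(line.split(":", 1)[1].strip())
        elif line.startswith("PK$PEAK:"):
            in_peak_block = True
        elif in_peak_block:
            if line.startswith("//"):
                break
            if line.strip():
                found += 1
    return declared, found


def peak_count_is_consistent(record_text):
    """True only if PK$NUM_PEAK is present and matches the peak lines."""
    declared, found = peak_counts(record_text)
    return declared is not None and declared == found
```

Running `peak_count_is_consistent` over every `.txt` file in a contributor folder would flag records like the three above, although only the person holding the primary data can decide which of the two numbers is the right one.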
@BenildeB my second comment is about the issues I could fix, and I also have a question. I noticed that you did your data processing with MS-DIAL and that the issues I found are consistent across the whole submission. I expect that you have some sort of pipeline that processes the MS-DIAL data to produce the files you contributed. Would you be willing to share that pipeline? It might be interesting for other people as well, and I would like to help polish it to fix the remaining issues. Please think about it. The issues I identified are:
All these issues should be easy to fix in the pipeline. What do you think?
@meier-rene
Thank you for your feedback.
For the pipeline, I wrote something in R for internal purposes to update the MS-DIAL txt files we are getting so that they fit the MassBank guidelines; it involves some in-house data management (with a separate Excel file). I have to discuss it with my group and supervisor before making any decision about it. It would probably need a lot more work before we could share it.
For the 3 files that are not correct, this may be because we have a manual step at the beginning (making sure we do not have too much noise), and I probably did not modify these files correctly. I'll fix them and resubmit them.
I thought I had fixed the two-spaces issue; I'll take another look at it.
I'll take a look at the non-breakable spaces, but I did not know it was possible to have them in R.
I can fix the empty line problem at the end of the line; I actually thought it was needed.
Regarding the intensities rounded to integers, it might take me a little longer, as we did not modify this part of the txt files we get from MS-DIAL.
Regarding the SMILES, we used those that come as output when we calculate the RTI (we noticed some were changed, and we thought they might be more "correct" than those we used as input).
Considering all of this, should I resubmit only the files that had problems (AS000151/182/254), or make the modifications to all of them? We would like to submit these MS2 spectra before submitting a paper for publication, and the modifications you are talking about might take me a little while: I am an environmental/analytical chemist first, my coding skills are basic, and I need time to fix the issues you pointed out. But I'll definitely make them before we submit any other MS2 spectra.
Thank you for your help.
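The two whitespace issues mentioned above (double spaces and non-breakable spaces) are easy to detect before a submission. The following is a minimal sketch, not part of any MassBank tool: the function name `find_whitespace_issues` is hypothetical, and it assumes that the only legitimate run of spaces in a record is the leading indent of peak and annotation lines.

```python
NBSP = "\u00a0"  # Unicode non-breaking space


def find_whitespace_issues(record_text):
    """Return a list of (line_number, problem) tuples for one record.

    Assumption: leading spaces are a legitimate indent, so double
    spaces are only reported after the indent has been removed.
    """
    issues = []
    for number, line in enumerate(record_text.splitlines(), start=1):
        if NBSP in line:
            issues.append((number, "non-breaking space"))
        if "  " in line.lstrip(" "):
            issues.append((number, "double space"))
    return issues
```

Non-breaking spaces are invisible in most editors, which is probably why they go unnoticed; they often arrive through text copied from spreadsheets or web pages rather than being created by the script itself.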
Please do not resubmit. Just check the 3 spectra and tell me what to do. All other fixes are already in place (you can find them in the BenildeB-BenildeB_dev branch), and after the last 3 issues are resolved I will merge your contribution.
@meier-rene I just realized I could fix it online. Thanks for the reminder.
@meier-rene I did the modifications in my branch, I hope it worked.
Thank you for your contribution. It's now in the dev branch and will be included in the next release. If you have more contributions, I'm happy to help polish your pipeline.
Thanks a lot @meier-rene.
@meier-rene I am currently working on our script, and while using some of the txt files we submitted, I noticed that our APCI pos (AS000072 to AS000143) and ESI pos (AS000144 to AS000210) files are missing the splash number. It has been replaced by NA. I don't know what happened, as we used the same script for both positive and negative. If it's OK with you, I would like to fix this. How should I do that? (I have the fixed text files available now.) Sorry for this.
No need to do anything about the already submitted files. I have fixed them. I have pipelines that do this, and our validation step would identify incorrect or missing SPLASHs. There were more spectra with incorrect SPLASHs, not just the two you mentioned. This is the commit with all the splash fixes: https://github.com/MassBank/MassBank-data/commit/4e945ce4eddc1aebfe5cb4d6d39161172e24af7a
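A missing SPLASH such as `NA` can be spotted with a simple format check. The sketch below is hypothetical and is not the validation step mentioned above: it assumes the `splash10-` layout of one 4-character block, one 10-digit block and one 20-character hexadecimal block, and it does not recompute the hash, so a well-formed but wrong SPLASH would still pass.

```python
import re

# Assumed layout: splash10-<4 chars>-<10 digits>-<20 hex chars>
SPLASH_PATTERN = re.compile(r"^splash10-[0-9a-z]{4}-[0-9]{10}-[0-9a-f]{20}$")


def splash_problem(record_text):
    """Return a description of the SPLASH problem, or None if it looks fine.

    Format check only: the hash is not recomputed from the peak list.
    """
    for line in record_text.splitlines():
        if line.startswith("PK$SPLASH:"):
            value = line.split(":", 1)[1].strip()
            if SPLASH_PATTERN.match(value):
                return None
            return f"malformed SPLASH: {value!r}"
    return "PK$SPLASH line missing"
```

Catching a wrong value, as opposed to a missing one, requires recomputing the SPLASH from the peak list with a proper SPLASH implementation and comparing the result with the stored string.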
@meier-rene Sorry for that, and thanks for fixing it. (I know it was not only two files, but all the APCI pos and all the ESI pos.)
@meier-rene Thanks. However, I noticed none of our splash numbers were correct, and I am not sure I understand why. Could it be because none of our relative intensities were rounded?
Hello,
Here is a new try. If this one is not working, I don't know what I did wrong. I hope this one will work. Thanks.
Best, Bénilde