MassBank / MassBank-data

Official repository of open data MassBank records

added 818 records by Antwerp University METOX #235

Closed meowcat closed 10 months ago

meier-rene commented 11 months ago

Thank you for the contribution. Could you please review the following issue:

Antwerp_Univ/MSBNK-Antwerp_Univ-METOX_N103026_9C9C.txt: PK$NUM_PEAK: 2455
Antwerp_Univ/MSBNK-Antwerp_Univ-METOX_N103027_B8BB.txt: PK$NUM_PEAK: 3660
Antwerp_Univ/MSBNK-Antwerp_Univ-METOX_N103028_9CB7.txt: PK$NUM_PEAK: 2678

These spectra look noisy, with an unusually high number of peaks. In principle we prefer centroid-mode data, and 99% of the spectra look good, but a number of spectra have a high peak count. Could you please verify them and decide whether they look as intended?

Addition: there are some more spectra with more than 100 peaks...
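For anyone who wants to reproduce this kind of check locally, here is a minimal sketch that reads the `PK$NUM_PEAK` field from record text and lists the records above a threshold. The helper names (`num_peaks`, `flag_records`, `load_records`) and the default threshold of 100 are assumptions made for illustration. This is not the validation MassBank itself runs.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# PK$NUM_PEAK is the record field quoted above; everything else here is illustrative.
NUM_PEAK_RE = re.compile(r"^PK\$NUM_PEAK:\s*(\d+)\s*$", re.MULTILINE)


def num_peaks(record_text: str) -> Optional[int]:
    """Return the PK$NUM_PEAK value of one record, or None if the field is absent."""
    match = NUM_PEAK_RE.search(record_text)
    return int(match.group(1)) if match else None


def flag_records(records: Dict[str, str], threshold: int = 100) -> List[Tuple[str, int]]:
    """Return (file name, peak count) pairs above the threshold, largest first.

    The threshold is an assumed value for this sketch, not a MassBank policy.
    """
    flagged = []
    for name, text in records.items():
        count = num_peaks(text)
        if count is not None and count > threshold:
            flagged.append((name, count))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def load_records(directory: str) -> Dict[str, str]:
    """Read every .txt record in a directory into a {file name: text} mapping."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }
```

Assuming a local checkout of the records, `flag_records(load_records("Antwerp_Univ"))` would return the candidates for visual inspection.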

meowcat commented 11 months ago

Hi, I'll notify the spectra author and get back to you. My gut feeling would be to delete those spectra.

meowcat commented 11 months ago

For the moment you can delete those and add only the ones that are OK. I want to check the failed ones manually (can I have the list of the bad ones?) and ask my colleagues to reinject if the data quality is not good, since the goal is to keep updating the library.

Cheers, Manuela

meowcat commented 10 months ago

Uh sorry the Eawag records should not have gone into this PR.

meowcat commented 10 months ago

@meier-rene What would you suggest? We can remove just the 3 superrecords. Otherwise, there are 5 more records with 209-423 peaks and 21 more records with 100-181 peaks. We can also remove those, but then it becomes unclear what the limit should be: there isn't really a clear cutoff after the 3 crazy ones.
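To see how the records spread across such ranges, a tally like the following could help. It is only a sketch: the function name `tally_by_range` and the bin edges are assumptions, and the right edges for a real data set would have to be chosen by looking at the spectra.

```python
from bisect import bisect_right
from typing import Dict, Iterable, Sequence


def tally_by_range(peak_counts: Iterable[int],
                   edges: Sequence[int] = (100, 200, 500)) -> Dict[str, int]:
    """Count how many records fall into each peak-count range.

    The edges are hypothetical lower bounds for the upper bins,
    not limits taken from any MassBank policy.
    """
    edges = sorted(edges)
    labels = [f"< {edges[0]}"]
    labels += [f"{low}-{high - 1}" for low, high in zip(edges, edges[1:])]
    labels.append(f">= {edges[-1]}")

    tally = {label: 0 for label in labels}
    for count in peak_counts:
        # bisect_right gives the index of the bin whose lower bound is <= count
        tally[labels[bisect_right(edges, count)]] += 1
    return tally
```

Calling `tally_by_range` on the list of `PK$NUM_PEAK` values for all records in the PR gives one number per range, which makes it easier to judge whether any cutoff separates the outliers from the rest.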

meier-rene commented 10 months ago

Hi @meowcat, you are right. We don't have a limit, and I think we will keep this policy. In principle I would not care about the number of peaks, but the small number of spectra with many peaks take up most of the time in spectral similarity search and thus create a performance issue. For now I took out seven spectra after I visually inspected the spectra with more than 100 peaks. These 7 spectra look different from the reference spectra we usually have in MassBank. I think there are issues with the signal-to-noise ratio. Here is the list:

MSBNK-Antwerp_Univ-METOX_N103027_B8BB.txt
MSBNK-Antwerp_Univ-METOX_N103028_9CB7.txt
MSBNK-Antwerp_Univ-METOX_N103026_9C9C.txt
MSBNK-Antwerp_Univ-METOX_N106726_9C9C.txt
MSBNK-Antwerp_Univ-METOX_N103341_C0B4.txt
MSBNK-Antwerp_Univ-METOX_N109926_9C9C.txt
MSBNK-Antwerp_Univ-METOX_N103343_571D.txt

Please ask the authors to inspect these spectra; hopefully they can be cleaned up a bit. If the original authors think these spectra are OK, we will keep them as they are. All the other spectra are merged to dev and will go into our next release.

After I took out the 7 spectra I made a squash merge. That's why I will close this PR.

Thanks again to the contributors! Good work!