MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

Search command #150

Open dnikol1 opened 5 years ago

dnikol1 commented 5 years ago

Why is it that we cannot just search molecular formula without putting something in the Compound Name line?

Why is it that the spectrum in the format of m/z vs absolute intensity cannot be searched, but instead requires normalized intensity?

tsufz commented 5 years ago

Hi, I transferred this issue, because there was a wrong link on MassBank.

tsufz commented 5 years ago

> Why is it that the spectrum in the format of m/z vs absolute intensity cannot be searched, but instead requires normalized intensity?

I totally agree with the comment. Normalising a peak list before submitting is not very practical, and the online tool should be as convenient as possible. People usually copy and paste a spectrum from their MS viewer or similar. Could we do the normalisation in the background while handling the query?
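A minimal sketch of what such background normalisation could look like, assuming the query arrives as pasted text with one "m/z intensity" pair per line. The class name `PeakListNormalizer`, its methods, and the base-peak target of 999 are assumptions for illustration, not existing MassBank-web code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MassBank-web.
final class PeakListNormalizer {

    // Assumed target value for the most intense peak after scaling.
    static final double BASE_PEAK = 999.0;

    // Parses pasted text with one peak per line: m/z and intensity,
    // separated by whitespace, commas, or semicolons. Blank lines are skipped.
    static List<double[]> parse(String text) {
        List<double[]> peaks = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] cols = trimmed.split("[\\s,;]+");
            if (cols.length < 2) {
                throw new IllegalArgumentException("Expected 'm/z intensity': " + line);
            }
            peaks.add(new double[] {
                Double.parseDouble(cols[0]), Double.parseDouble(cols[1])
            });
        }
        return peaks;
    }

    // Scales intensities so that the most intense peak equals BASE_PEAK,
    // rounding each value to the nearest integer. m/z values are unchanged.
    static List<double[]> normalize(List<double[]> peaks) {
        double max = 0.0;
        for (double[] peak : peaks) {
            max = Math.max(max, peak[1]);
        }
        List<double[]> out = new ArrayList<>();
        for (double[] peak : peaks) {
            double relative = max > 0 ? Math.round(peak[1] / max * BASE_PEAK) : 0.0;
            out.add(new double[] { peak[0], relative });
        }
        return out;
    }
}
```

With something like this, `PeakListNormalizer.normalize(PeakListNormalizer.parse(pastedText))` could run on the server before the search, so already normalised input passes through essentially unchanged and absolute intensities are rescaled.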

tsufz commented 5 years ago

> Why is it that we cannot just search molecular formula without putting something in the Compound Name line?

Thanks for the comment, we will improve the search.

meier-rene commented 5 years ago

> Why is it that we cannot just search molecular formula without putting something in the Compound Name line?

Fixed with 6217299b5eff61d7e30b8ccafe95e25ebf6e2765

tsufz commented 5 years ago

> Fixed with 6217299

The search does not handle truncation yet. Searching for C8H10N4O2 returns the results for caffeine, but a truncated query such as C8H10N*O2 returns nothing.

meier-rene commented 5 years ago

8f87c929928b2297138cf4d1fa12727ee8a6d3e0 implements wildcards on chemical formulas. These wildcards work purely on a string basis, without any chemical knowledge. This means C8*O*P returns 291 results while C8*P*O returns none. Implementing chemical knowledge here would be a bigger effort.
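To illustrate what purely string-based matching implies, here is a minimal sketch that is not taken from the commit above. The class `FormulaWildcard` and its methods are hypothetical names, and `*` is assumed to stand for any run of characters.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of string-only wildcard matching.
final class FormulaWildcard {

    // Translates a query such as "C8*O*P" into a regex: literal parts are
    // quoted, and every '*' becomes ".*" (any run of characters).
    static Pattern toPattern(String query) {
        String[] parts = query.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole formula string matches the wildcard query.
    static boolean matches(String query, String formula) {
        return toPattern(query).matcher(formula).matches();
    }
}
```

For a stored formula string like C8H19O4P (an illustrative input), `matches("C8*O*P", ...)` succeeds, while `matches("C8*P*O", ...)` fails because no O follows the P in the string. The matcher only sees character order, not element composition.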

schymane commented 5 years ago

Not sure what you mean by “chemical knowledge”, but there are already a few R functions out there that split formulas into their elements and the accompanying counts; this would be one way to make the wildcard functionality order-independent. Alternatively, we could standardize all formulas in MassBank according to the chemical rules (e.g. the Hill system); I think this is not yet the case. I’d prefer the former for the wildcard search, as it’s more universal.
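Both options can be sketched in a few lines of Java. The following is only an illustration under simplifying assumptions: formulas are flat, with no parentheses, charges, or isotope labels. `FormulaParser` and its methods are hypothetical names, not existing MassBank or CDK API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for flat formulas such as "C8H10N4O2".
final class FormulaParser {

    // One element symbol (uppercase letter, optional lowercase letter)
    // followed by an optional count.
    private static final Pattern TOKEN = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    // Splits a formula into element -> count, independent of element order.
    static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = TOKEN.matcher(formula);
        int pos = 0;
        while (pos < formula.length()) {
            m.region(pos, formula.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at index " + pos + " in " + formula);
            }
            int n = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), n, Integer::sum);
            pos = m.end();
        }
        return counts;
    }

    // Writes the counts in Hill order: C first, then H, then the remaining
    // elements alphabetically; purely alphabetical if there is no carbon.
    static String toHill(Map<String, Integer> counts) {
        Map<String, Integer> rest = new TreeMap<>(counts);
        StringBuilder sb = new StringBuilder();
        if (rest.containsKey("C")) {
            append(sb, "C", rest.remove("C"));
            if (rest.containsKey("H")) {
                append(sb, "H", rest.remove("H"));
            }
        }
        rest.forEach((element, n) -> append(sb, element, n));
        return sb.toString();
    }

    private static void append(StringBuilder sb, String element, int n) {
        sb.append(element);
        if (n > 1) {
            sb.append(n);
        }
    }
}
```

Comparing `parse(query)` with `parse(stored)` gives an order-independent match, while `toHill(parse(formula))` could be used to standardize both the query and the stored formulas before a string-based search.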

tsufz commented 5 years ago

This would be fairly easy code: https://stackoverflow.com/questions/2974362/parsing-a-chemical-formula Or could we use CDK? It needs to be Java, not R.

tsufz commented 5 years ago

https://gist.github.com/atomictom/7797647