JamesHWade / measure

The goal of measure is to be a recipes-like interface to tidymodels for analytical characterization data.
https://jameshwade.github.io/measure/

possible public data sets? #6

Open topepo opened 2 years ago

topepo commented 2 years ago

We should probably try to get some real raw data. https://data.mendeley.com is fairly good (if you have low expectations).

Searching for "hplc spectra" within datasets yields this, which seems helpful.

Anyway, we can add sources to this issue thread.

JamesHWade commented 2 years ago

I added a list of data sources here and copied it below. I have some real data that isn't approved for external release, so I may simulate some data to match that structure.

There are a number of data sets in the modeldata and prospectr packages. These include modeldata::meats and prospectr::NIRsoil.

Mendeley Data offers data included as part of various publications. A few possibilities include:

Machine Learning of MS dataset
Data for: A Sensitive Quantitative Analysis of Abiotically Synthesized Short Homopeptides using Ultraperformance Liquid Chromatography and Time-of-Flight Mass Spectrometry

There are also some repositories that might be of use:

Crystallography Open Database
NMRShiftDB
Spectral Database for Organic Compounds, SDBS
NIST Chemistry WebBook

topepo commented 2 years ago

Here's a fairly large data set, described at https://chemom2019.sciencesconf.org/resource/page/id/13.html, with the source data linked at the bottom of that page. We don't know the actual wavelengths or the test set outcome data, but it looks like a nice data set.