EpistasisLab / pmlb

PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms.
https://epistasislab.github.io/pmlb/
MIT License

New dataset on flow in rough pipes #170

Closed (gkronber closed this issue 1 year ago)

gkronber commented 1 year ago

The Nikuradse dataset is a high-quality real-world dataset with interesting nonlinear dependencies.

The original source is J. Nikuradse, "Laws of Flow in Rough Pipes," Technical Memorandum 1292, National Advisory Committee for Aeronautics, 1950 (translation of "Strömungsgesetze in rauhen Rohren," VDI-Forschungsheft 361, supplement to "Forschung auf dem Gebiete des Ingenieurwesens," Edition B, Volume 4, July/August 1933).

The relevant figures in the paper are Figure 9:

[Figure 9 image]

Unfortunately, the data points for Re < 10^3.8 that are shown in this plot are not given in the tables of the paper.

and Figure 11:

[Figure 11 image]

Together, these lead to two datasets that can be used for benchmarking: the first problem has two input variables (Re and r/k), while the second, the Prandtl collapse, has only one input variable.
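For orientation, here is a minimal Python sketch of how one (Re, r/k, λ) observation from the first problem could be mapped onto the coordinates of the one-variable Prandtl collapse. It assumes that λ is the Darcy friction factor, that Re is based on the pipe diameter d = 2r, and that the collapse plots 1/√λ − 2 log10(r/k) against log10(v* k/ν), where v* = ū √(λ/8) is the friction velocity. The function name `prandtl_collapse` is hypothetical and not part of PMLB.

```python
import math


def prandtl_collapse(re: float, r_over_k: float, lam: float) -> tuple[float, float]:
    """Map one (Re, r/k, lambda) observation to Prandtl-collapse coordinates.

    Assumptions (not taken from the issue): lambda is the Darcy friction
    factor and Re = u_mean * d / nu with d = 2r, so the friction velocity
    is v* = u_mean * sqrt(lambda / 8).
    """
    # Roughness Reynolds number v* k / nu, rewritten in terms of Re and r/k:
    # v* k / nu = Re * sqrt(lambda / 8) / (2 * r/k)
    re_k = re * math.sqrt(lam / 8.0) / (2.0 * r_over_k)
    x = math.log10(re_k)
    # Ordinate of the collapse: 1/sqrt(lambda) - 2 log10(r/k)
    y = 1.0 / math.sqrt(lam) - 2.0 * math.log10(r_over_k)
    return x, y
```

In the fully rough regime Nikuradse's data follow 1/√λ = 2 log10(r/k) + 1.74, so the ordinate y should level off near 1.74 there, which gives a cheap sanity check on extracted values.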

The dataset has recently been used in combination with symbolic regression in:

Ignasi Reichardt, Jordi Pallarès, Marta Sales-Pardo, and Roger Guimerà, "Bayesian Machine Scientist to Compare Data Collapses for the Nikuradse Dataset," Phys. Rev. Lett. 124, 084503 (published 27 February 2020), https://doi.org/10.1103/PhysRevLett.124.084503

I have not found the dataset in electronic form anywhere and have therefore extracted the data from the original paper via OCR.
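Because the values were extracted via OCR, a validation pass can catch misread digits before the data are committed. The following is a hypothetical helper (`parse_ocr_table` is not part of PMLB and not necessarily what was used here) that assumes the OCR output is plain text with one whitespace-separated row of numbers per line. It keeps well-formed rows and reports the rest so they can be checked by hand against the scanned tables.

```python
def parse_ocr_table(text: str, n_cols: int) -> tuple[list[list[float]], list[tuple[int, str]]]:
    """Parse whitespace-separated numeric rows from OCR output.

    Returns (rows, problems): rows holds lines with exactly n_cols numeric
    fields; problems holds (line_number, line) pairs that need manual review.
    """
    rows: list[list[float]] = []
    problems: list[tuple[int, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines between table blocks are harmless
        try:
            values = [float(field) for field in line.split()]
        except ValueError:
            # Typical OCR confusion, e.g. the letter 'O' read instead of digit '0'
            problems.append((lineno, line))
            continue
        if len(values) != n_cols:
            # Columns were merged or split during recognition
            problems.append((lineno, line))
            continue
        rows.append(values)
    return rows, problems
```

Reporting suspicious lines instead of silently dropping them keeps every value traceable to the original paper.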

gkronber commented 1 year ago

Merged with #175.

gkronber commented 1 year ago

@trangdata, thanks for helping out with this PR. I have a few more interesting datasets in my collection for testing symbolic regression that are not available in PMLB. I would be happy to prepare them in the same way so they can be merged.

I would check for duplicates and would only add datasets that I believe are valid and interesting for SR. Are there any requirements that I should think about, beyond what is mentioned on https://epistasislab.github.io/pmlb/contributing.html?
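One way to approach the duplicate check is to compare order-independent fingerprints of the numeric tables. This is a minimal sketch under stated assumptions: `table_fingerprint` and `find_duplicates` are hypothetical helpers, not PMLB tooling, and the tables are assumed to be already loaded as lists of numeric rows with the same column order.

```python
import hashlib


def table_fingerprint(rows: list[list[float]], ndigits: int = 6) -> str:
    """Return a SHA-256 fingerprint of a numeric table.

    Values are rounded to ndigits so that float formatting differences do
    not matter, and rows are sorted so that row order does not matter.
    Column order still matters.
    """
    normalized = sorted(tuple(round(v, ndigits) for v in row) for row in rows)
    h = hashlib.sha256()
    for row in normalized:
        h.update(repr(row).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def find_duplicates(candidate: list[list[float]], existing: dict[str, list[list[float]]]) -> list[str]:
    """Return the names of existing tables whose fingerprint matches the candidate."""
    target = table_fingerprint(candidate)
    return [name for name, rows in existing.items() if table_fingerprint(rows) == target]
```

This only flags exact duplicates up to rounding and row order; near-duplicates such as rescaled or subsampled versions of the same data would still need a manual look.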

trangdata commented 1 year ago

@gkronber That would be awesome! 💯 I don't think there are any other specific requirements, but we should update CONTRIBUTING to mention "make sure to have GitHub Actions enabled in your repo" 😅 so that future contributors don't miss that step.