Benchmarking-Initiative / Benchmark-Models-PEtab

A collection of mathematical models with experimental data in the PEtab format as benchmark problems in order to evaluate new and existing methodologies for data-based modelling

Errors in Chen_MSB2009 benchmark #175

Open · FFroehlich opened this issue 1 year ago

FFroehlich commented 1 year ago

Looking at the Chen_MSB2009 benchmark model, I suspect I have identified some errors in the measurements table (https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab/blob/master/Benchmark-Models/Chen_MSB2009/measurementData_Chen_MSB2009.tsv).

The original data are available from the supplement of https://doi.org/10.1038/msb.2008.74 (MSB data) and were reused in https://doi.org/10.1371/journal.pcbi.1005331 (PLoSCB data). The issue with the MSB data is that the standard deviations often contain 0 (see _dataset/Chen et al - Experimental Data/A431_experiment.out in the MSB supplement), which makes the data unsuitable for fitting. This is likely why I added 0.1 to the standard deviations in the PLoSCB data (it's been a while ...; see code/project/data/getData.m, lines 756-758, in the PLoSCB supplement).
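
For reference, a minimal sketch (assuming the usual pandas stack and the measurement table path linked above) of how one could flag such zero standard deviations in a PEtab measurement table:

```python
import pandas as pd

# Sketch: flag measurements whose sigma is exactly zero, which makes a
# Gaussian likelihood (and least-squares weighting) ill-defined.
# "noiseParameters" may also hold parameter names, hence the coercion.
meas = pd.read_csv(
    "Benchmark-Models/Chen_MSB2009/measurementData_Chen_MSB2009.tsv",
    sep="\t",
)
sigma = pd.to_numeric(meas["noiseParameters"], errors="coerce")
zero_sd = meas[sigma == 0.0]
print(f"{len(zero_sd)} of {len(meas)} measurements have a standard deviation of 0")
print(zero_sd[["observableId", "simulationConditionId", "time", "measurement"]])
```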

However, I ran into the following discrepancies:

ERK_PP data for the model1_data3 condition in the benchmark doesn't match the MSB data (Low (1e-11 M) EGF condition) or the PLoSCB data (D(3), lines 687-698) (looks like a copy & paste error in the benchmark data, as the data for model1_data2 and model1_data3 are the same). MSB and PLoSCB data match.
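
A quick way to see this from the shipped table (a sketch; it assumes the observable ID contains "ERK_PP", which may differ from the benchmark's actual ID):

```python
import pandas as pd

# Sketch: pivot the ERK_PP measurements by condition; identical
# model1_data2 and model1_data3 columns would confirm the suspected
# copy & paste in the benchmark data.
meas = pd.read_csv(
    "Benchmark-Models/Chen_MSB2009/measurementData_Chen_MSB2009.tsv",
    sep="\t",
)
erk = meas[meas["observableId"].str.contains("ERK_PP", case=False)]
print(
    erk.pivot_table(
        index="time", columns="simulationConditionId", values="measurement"
    )
)
```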

AKT_PP data for the model1_data4 condition in the benchmark does match the MSB data (Low (1e-10 M) HRG condition) but not the PLoSCB data (D(4), lines 704-715) (looks like a copy & paste error in the PLoSCB data, as the data for model1_data3 and model1_data4 are the same. This sucks, but shouldn't affect any of the conclusions in the paper).

This of course raises the question of the origin of the benchmark data. As the benchmark data also contains 0.1 values for the standard deviation (as in the PLoSCB data) rather than 0.0 values (as in the MSB data), I believe the measurements file in the benchmark was derived from the PLoSCB data (which would fix the issue with model1_data4, but introduce the issue with model1_data3 😢).
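
One way to check this hypothesis directly (again just a sketch): tally the distinct sigma values in the shipped table; a cluster of values offset by 0.1, with no exact zeros, would point to PLoSCB-derived rather than raw MSB data.

```python
import pandas as pd

# Tally the distinct sigma values in the shipped measurement table.
meas = pd.read_csv(
    "Benchmark-Models/Chen_MSB2009/measurementData_Chen_MSB2009.tsv",
    sep="\t",
)
print(
    pd.to_numeric(meas["noiseParameters"], errors="coerce")
    .value_counts(dropna=False)
    .sort_index()
)
```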

I will refrain from making any remarks regarding how much I loathe data that is not available in easily machine readable formats and data processing pipelines that involve manual steps ...

FFroehlich commented 1 year ago

Ah, it looks like the benchmark was exported from the Hass et al. (MATLAB) benchmark suite, where the same mismatch is present: https://github.com/Benchmarking-Initiative/Benchmark-Models/blob/master/Benchmark-Models/Chen_MSB2009/Data/model1_data4.xlsx

FFroehlich commented 1 year ago

The overall provenance of this benchmark model is a bit tricky, since both the PLoSCB implementation and the d2d implementation use standard deviations to normalize the data, while in the original MSB paper measurements were normalized by the maximum for each observable across time points and conditions:

[Screenshot of the objective function / data normalization from the original MSB paper]
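
Since the screenshot does not reproduce here, one common reading of such a max-normalized least-squares objective (an assumption about its form, not a transcription from the paper) is

$$
\chi^2(\theta) = \sum_{o}\sum_{c}\sum_{t}
\left(
\frac{\bar{y}_{o,c,t}}{\max_{c',t'}\bar{y}_{o,c',t'}}
-
\frac{y_{o,c,t}(\theta)}{\max_{c',t'} y_{o,c',t'}(\theta)}
\right)^{2},
$$

where $\bar{y}$ are the measurements, $y(\theta)$ the corresponding simulations, and the maxima are taken per observable $o$ across all conditions $c$ and time points $t$, in contrast to the $1/\sigma^2$ weighting used in the PLoSCB and d2d implementations.
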
FFroehlich commented 1 year ago

ping @elbaraim @dilpath

dilpath commented 1 year ago

Thanks for raising this issue, and for the thorough feedback! I am currently the only maintainer of this repo -- unfortunately, I haven't worked with this model yet.

What I got from this is:

  1. a note should be added to say that the data used in the PLoS CB paper differs from what we provide
  2. condition model1_data3, observable ERK_PP needs to be changed to match the MSB data
  3. although PLoS CB gets fitting to work by assigning a standard deviation of 0.1 to some data points, we need to reassess how to treat the data with 0 noise
    • since the objective function in your screenshot looks like least squares, I propose normal noise with standard deviation 1
  4. data normalization needs to be handled
    • I propose estimating scaling factor(s) (see the sketch after this list)
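
For 3. and 4., a rough sketch of what the observables table could look like (illustrative observable and species IDs only, not necessarily the benchmark's actual ones), with an estimated scaling factor per observable and unit Gaussian noise:

```python
import pandas as pd

# Illustrative sketch only: one way to encode points 3. and 4. in a PEtab
# observables table -- an estimated scaling factor per observable (to absorb
# the max-normalization of the MSB data) and additive normal noise with
# standard deviation 1. The IDs here are placeholders.
observables = pd.DataFrame(
    {
        "observableId": ["observable_ERK_PP", "observable_AKT_PP"],
        "observableFormula": [
            "scaling_ERK_PP * ERK_PP",  # scaling_* estimated via the parameters table
            "scaling_AKT_PP * AKT_PP",
        ],
        "noiseFormula": [1, 1],
        "noiseDistribution": ["normal", "normal"],
    }
)
observables.to_csv("observables_Chen_MSB2009.tsv", sep="\t", index=False)
```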

> I will refrain from making any remarks regarding how much I loathe data that is not available in easily machine readable formats and data processing pipelines that involve manual steps ...

:sob: Thanks for the work done already on the current implementation!