Open-Systems-Pharmacology / OSPSuite.Core

Core functionalities of the Open Systems Pharmacology Suite

Observed data format: 026_T1C1E1_T2C2E2 #647

Open · Yuri05 opened this issue 5 years ago

Yuri05 commented 5 years ago

026_T1C1E1_T2C2E2.xlsx

| Time_Brain [min] | Concentration_Brain [mg/ml] | Error_Brain [mg/ml] | Time_Liver [min] | Concentration_Liver [mg/ml] | Error_Liver [mg/ml] |
|---|---|---|---|---|---|
| 1 | 0,1 |  | 15 | 0,2 | 0,1 |
| 2 | 12 | 3 | 30 | 8 | 2 |
| 3 | 2 | 1,8 | 60 | 2 | 1 |
| 20 | 0,01 |  | 120 | 0,5 | 0,3 |
|  |  |  | 240 | 0,05 | 0,001 |
Yuri05 commented 5 years ago

Here and in example #643: if multiple time/measurement/error columns are mapped, the question is how to combine them into observed data sets to be imported. The current implementation of the observed data import creates the full combinatorics of the mapped columns, which is obviously nonsense (e.g. in the example above: if all columns are mapped in the mapping configuration, 2 time × 2 measurement × 2 error columns = 8 data sets would be created, but obviously there are only 2).

I think we can solve the import of multiple measurement columns from the same data table (use cases attached) if we make some assumptions about the table structure (T = "Time", M = "Measurement", E = "Error"):

  1. If at most one column of each kind (T, M, E) is mapped: just combine them into the (only) observed data set. In this case the order of T, M, E in the original data table does not matter.

  2. If more than one column of any kind is mapped: the order of the columns in the original data table matters (a code sketch of these rules follows after this list).

    a. For each mapped measurement column: assign the closest preceding time column to it. E.g.:

    • {T, M1, M2} would be mapped to the data sets {T, M1} and {T, M2}
    • {T1, M1, T2, M2, M3} would be mapped to the data sets {T1, M1}, {T2, M2} and {T2, M3}
    • {M0, T1, M1, T2, M2, M3} would be mapped to the data sets {T1, M1}, {T2, M2} and {T2, M3}. Here, M0 has no preceding time column and will be ignored

    b. For each mapped error column: assign it to the closest preceding measurement column. E.g.:

    • {T, M1, E1, M2} would be mapped to the data sets {T, M1, E1} and {T, M2}
    • {T, M1, E1, M2, E2} would be mapped to the data sets {T, M1, E1} and {T, M2, E2}
    • {T, M1, M2, E2} would be mapped to the data sets {T, M1} and {T, M2, E2}
    • {T, E1, M1, M2, E2} would be mapped to the data sets {T, M1} and {T, M2, E2}. Here, E1 has no preceding measurement column and will be ignored

    c. If there is another mapped error column between an error column and its closest preceding measurement column: the current error column will be ignored, e.g.:

    • {T, M1, E1, E2} would be mapped to the data set {T, M1, E1}. E2 is ignored
  3. Dealing with the problematic cases highlighted above, there are several alternatives: a) just silently ignore, b) produce warnings but still import, or c) force the user to change the mapping before import.
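
A minimal sketch of rules 2a-2c, assuming the mapped columns are available in their original table order as (kind, name) pairs; this is illustrative only, not the actual importer implementation:

```python
# Sketch of the "closest preceding column" rules 2a-2c above.
# Columns are given in their original table order as (kind, name) pairs,
# where kind is "T" (time), "M" (measurement) or "E" (error).

def build_datasets(columns):
    """Group mapped columns into observed data sets.

    Rule 2a: each measurement column gets the closest preceding time column;
             measurements without a preceding time column are ignored.
    Rule 2b: each error column gets the closest preceding measurement column;
             errors without a preceding measurement column are ignored.
    Rule 2c: an error column is ignored if another error column already
             sits between it and that measurement column.
    """
    datasets = []           # one dict per data set
    current_time = None     # closest preceding time column
    current_dataset = None  # data set of the closest preceding measurement column

    for kind, name in columns:
        if kind == "T":
            current_time = name
        elif kind == "M":
            if current_time is None:
                current_dataset = None  # rule 2a: orphan measurement, ignored
                continue
            current_dataset = {"time": current_time, "measurement": name, "error": None}
            datasets.append(current_dataset)
        elif kind == "E":
            # rules 2b/2c: attach only if there is a preceding measurement
            # that does not already have an error column
            if current_dataset is not None and current_dataset["error"] is None:
                current_dataset["error"] = name
    return datasets


# Example from 2b above: {T, E1, M1, M2, E2} -> {T, M1} and {T, M2, E2}
print(build_datasets([("T", "T"), ("E", "E1"), ("M", "M1"), ("M", "M2"), ("E", "E2")]))
```

Applied to the example table above, this yields the two expected data sets {Time_Brain, Concentration_Brain, Error_Brain} and {Time_Liver, Concentration_Liver, Error_Liver}.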

➡️ This logic is easy to implement and does not require any new constructs like groupings etc.
➡️ The majority of data tables with multiple measurements are "properly" constructed (either {T, M1, E1, …, M_n, E_n} or {T1, M1, E1, …, T_n, M_n, E_n}) and thus can be imported immediately.
➡️ In the rare cases where the table structure does not follow one of those two patterns, additional imports can be started by the user, e.g. if the data table has the structure {M0, T1, M1, T2, M2}.

ju-rgen commented 4 years ago

In my opinion a schema like T, M, E, Variable with Variable = Concentration_Brain | Concentration_Liver would be much easier and is more common in such cases. Even better (and more common) is something like T, M, M_Unit, E, Variable, Location, Compound, where the values could be:

- Location: Liver/Plasma, Liver/Intracellular, VenousBlood/Plasma, Urine
- Compound: BAY_01234, BAY_56789
- Variable: Concentration, Fraction

This would just require filtering the rows and creating a variable for each existing combination of (Variable, Location, Compound). Isn't that easy, or am I overlooking problems?
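
For illustration, a minimal pandas sketch of this long-format grouping, using a subset of the example data from the table above (the column names are illustrative, not an actual importer schema):

```python
# Long ("tidy") format: one row per observation, one data set per
# existing (Variable, Location, Compound) combination.
import pandas as pd

df = pd.DataFrame({
    "Time":        [15, 30, 60, 1, 2, 3],
    "Measurement": [0.2, 8.0, 2.0, 0.1, 12.0, 2.0],
    "Unit":        ["mg/ml"] * 6,
    "Error":       [0.1, 2.0, 1.0, None, 3.0, 1.8],
    "Variable":    ["Concentration"] * 6,
    "Location":    ["Liver"] * 3 + ["Brain"] * 3,
    "Compound":    ["BAY_01234"] * 6,
})

# Filter rows into one observed data set per existing combination:
for (variable, location, compound), rows in df.groupby(["Variable", "Location", "Compound"]):
    print(f"{variable} | {location} | {compound}")
    print(rows[["Time", "Measurement", "Error"]].to_string(index=False))
```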

Sulav1 commented 4 years ago

> In my opinion a schema like T, M, E, Variable with Variable = Concentration_Brain | Concentration_Liver would be much easier and is more common in such cases. Even better (and more common) is something like T, M, M_Unit, E, Variable, Location, Compound, where the values could be:
>
> - Location: Liver/Plasma, Liver/Intracellular, VenousBlood/Plasma, Urine
> - Compound: BAY_01234, BAY_56789
> - Variable: Concentration, Fraction
>
> This would just require filtering the rows and creating a variable for each existing combination of (Variable, Location, Compound). Isn't that easy, or am I overlooking problems?

I also think that having dependent variables distributed across various columns makes things difficult. A row label that distinguishes between the different types, as ju-rgen described, would be easier to deal with and would avoid redundant entries.

msevestre commented 4 years ago

I agree with @ju-rgen and @Sulav1. The suggested format here is horrible and terrible, and I'll probably have nightmares tonight.

Christoph27 commented 4 years ago

I would still like to have the possibility to read in a format like this! Such a format results, e.g., after scanning multiple curves from a published figure. Here, having several measurements in different columns is helpful: often the time is exactly known and you want to replace the scanned value with the known one in the data file. It is easier and less error-prone to do this only once for each time point and not several times in different rows for each measurement.

I would not expect any automatic mapping by the software in this case; the user just manually maps which columns are the time(s), measurements and errors. From my point of view it would be sufficient to support two cases:

1) one time column and several measurement columns, all corresponding to the same time column
2) as many time columns as measurement columns

i.e. nothing like {T1, M1, T2, M2, M3} mentioned above. And again, the user defines what is T1, T2, M1, M2, E1, E2, ...
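
A minimal sketch, with assumed column names, of how those two supported cases could be paired up (illustrative only, not the importer's actual behaviour):

```python
# Pair time and measurement columns for the two supported cases:
# either a single shared time column, or exactly one time column per measurement column.

def pair_time_and_measurements(time_columns, measurement_columns):
    """Return (time, measurement) column pairs for the two supported cases."""
    if len(time_columns) == 1:
        # case 1: every measurement column shares the single time column
        return [(time_columns[0], m) for m in measurement_columns]
    if len(time_columns) == len(measurement_columns):
        # case 2: the i-th measurement column belongs to the i-th time column
        return list(zip(time_columns, measurement_columns))
    raise ValueError("Unsupported mapping: need one time column, or one per measurement column")


print(pair_time_and_measurements(["Time"], ["Concentration_Brain", "Concentration_Liver"]))
print(pair_time_and_measurements(["Time_Brain", "Time_Liver"],
                                 ["Concentration_Brain", "Concentration_Liver"]))
```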

msevestre commented 4 years ago

@Christoph27 Aside from the output of the scanning tool, do you ever use such a format? If not, I would strongly suggest changing the output of the scanning tool. We own it and can certainly make modifications.

Christoph27 commented 4 years ago

From my point of view the output of the scanning tool makes sense as it currently is. As mentioned above: in case I need to correct scanned time points, it is easier and less error-prone to do it only once for each time point and not several times in different rows for each measurement. One could maybe think of producing different sheets for each profile, as long as the user does not have to re-define the coordinate system for each profile of a plot.

msevestre commented 4 years ago

> In case I need to correct scanned time points it is easier and less error-prone

@Christoph27 This could be done in the scan tool itself, by having an overview of what was scanned before exporting it.

Implementing such a format in the importer is a massive undertaking. If the only reason is ease of importing data from another tool, that is not a good enough reason for me, especially when we can adapt said other tool to fulfil the requirement.

Back to my question: Aside from the output of the scanning tool, do you ever use this format?

Christoph27 commented 4 years ago

Aside from the scan tool, the format is, from my point of view, so exceptional that it would not justify offering this format.

Yuri05 commented 4 years ago

As a compromise: we could allow the pattern {T, M1, ..., Mn} (one time column and multiple measurement columns), if no error/unit/LLOQ columns are mapped at the same time.

If the user maps more than one measurement column:
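
For illustration, a minimal sketch of the compromise check described above, assuming a hypothetical mapping dictionary keyed by column kind (T, M, E, Unit, LLOQ); the actual importer data structures may differ:

```python
# Compromise rule: one time column with multiple measurement columns is allowed
# only when no error/unit/LLOQ columns are mapped at the same time.

def compromise_mapping_is_allowed(mapping):
    """mapping maps a column kind ('T', 'M', 'E', 'Unit', 'LLOQ') to the mapped column names."""
    if len(mapping.get("M", [])) <= 1:
        return True  # at most one measurement column: nothing special to check
    return (len(mapping.get("T", [])) == 1
            and not mapping.get("E")
            and not mapping.get("Unit")
            and not mapping.get("LLOQ"))


print(compromise_mapping_is_allowed({"T": ["Time"], "M": ["Concentration_Brain", "Concentration_Liver"]}))  # True
print(compromise_mapping_is_allowed({"T": ["Time"], "M": ["M1", "M2"], "E": ["E1"]}))  # False
```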