ohsOllila closed this issue 1 year ago
Equal amounts of data are needed because the fitness function doesn't make sense unless every simulated value has an experimental counterpart.
However, I have now provided another version of the script where this is not explicitly tested (https://github.com/hsantila/MATCH/blob/master/scripts/NMRL3_analysis/analysis_NMRL3v2.py). The code skips OPs that are not present in the file containing the OP calculation output. Note that you have to manually remove the corresponding value from the experimental results, because in the format I was given each experimental value is identified only by an ambiguous number.
The same applies if an experimental OP is missing: remove it from the list of simulated ones.
It is a bit messy, but with the experimental format being what it is, this was the only workaround I could think of.
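For illustration only, the skipping idea can be sketched as follows. This is not the code in analysis_NMRL3v2.py. It assumes both data sets have already been loaded into dictionaries keyed by an unambiguous label, which the experimental format described above does not provide. The function names, the labels, and the numbers are hypothetical placeholders, and RMSD stands in for the actual fitness function.

```python
import math


def match_order_parameters(simulated, experimental):
    """Pair up values whose label appears in both dictionaries.

    Labels present in only one of the two inputs are skipped.
    Returns the list of (simulated, experimental) pairs and the labels used.
    """
    common = sorted(set(simulated) & set(experimental))
    pairs = [(simulated[label], experimental[label]) for label in common]
    return pairs, common


def rmsd(pairs):
    """Root-mean-square deviation over the matched pairs.

    Stand-in for the real fitness function, which is not reproduced here.
    """
    if not pairs:
        raise ValueError("no order parameters in common")
    return math.sqrt(sum((s - e) ** 2 for s, e in pairs) / len(pairs))


# Placeholder values, not real measurements or simulation output.
simulated = {"beta": -0.04, "alpha": 0.05, "g3": -0.20}
experimental = {"beta": -0.05, "alpha": 0.05, "g1": 0.00}

pairs, used = match_order_parameters(simulated, experimental)
print("compared:", used)
print("rmsd:", rmsd(pairs))
```

With labelled data like this, neither file would need manual editing: "g3" and "g1" are simply left out of the comparison because each has no counterpart.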
Quality evaluation is now done in the new databank; see the current status at http://nmrlipids.blogspot.com/2022/09/nmrlipids-databank-form-factor-quality.html. Therefore, this issue is outdated and I will close it.
Currently the fitness code https://github.com/NMRLipids/MATCH/blob/master/scripts/NMRL3_analysis/analysis_NMRL3.py requires the same number of order parameters in the simulation and experimental data.
However, in some cases not all of the order parameters can be calculated because of overlapping atom names (see issue https://github.com/NMRLipids/MATCH/issues/9). In addition, we may have cases in the future where only some of the order parameters are known experimentally.
It would be useful to be able to run the fitness code despite an incomplete order parameter dataset.