Closed: FabianHoerst closed this issue 5 months ago
@FabianHoerst Which labels you use depends on your goals.
The links to the data used here closely follow the terminology from the paper. P-truth is the single truth inferred from the pathologist reader annotations, as depicted in Figure 2B; the inferred NP-label is the same but for the non-pathologist annotations. The raw multi-rater data is also available for download there. The sections Evaluation dataset, Bootstrap control, and Unbiased control are all clearly defined in Figure 1B and in the Methods.
The multi-rater data involves some reconciliation of inter-observer variability and is what most people would choose for evaluation. For training, people could use the single-rater data or some portion of the multi-rater data, depending on their objectives. This paper describes one possible set of choices.
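As a rough illustration, here is a minimal sketch of how one might separate training from evaluation annotations. The folder names below (`EvaluationSet`, `PsAreTruth_E`, `SingleRaterSet`) are assumptions based on the labels mentioned in this thread, not a confirmed layout, so adjust them to match the actual download:

```python
# Hypothetical sketch: load pathologist-truth evaluation annotations and
# single-rater training annotations from a local copy of the dataset.
# Folder names are assumptions; check them against the downloaded data.
from pathlib import Path
import pandas as pd

DATA_ROOT = Path("dataset_download")  # assumed local download location

def load_annotation_csvs(folder: Path) -> pd.DataFrame:
    """Concatenate all per-slide annotation CSVs found in a folder."""
    frames = [pd.read_csv(p) for p in sorted(folder.glob("*.csv"))]
    return pd.concat(frames, ignore_index=True)

# Evaluation: the evaluation set with pathologist-inferred truth (P-truth)
eval_df = load_annotation_csvs(DATA_ROOT / "EvaluationSet" / "PsAreTruth_E")

# Training: single-rater data (or a disjoint portion of the multi-rater data),
# kept separate from the evaluation slides to avoid leakage
train_df = load_annotation_csvs(DATA_ROOT / "SingleRaterSet")  # assumed name

print(f"train: {len(train_df)} annotations, eval: {len(eval_df)} annotations")
```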
What seems like bloat to you may be required by other people who are interested in the multi-rater data. It's a complex study and transparency requires releasing all data. As I said, great care was taken to describe things consistently within the dataset and papers.
Hello @cooperlab,
thanks for the explanation. Maybe the term "bloated" was not the appropriate one; "overwhelming" would be a better one.
Seems you have what you need. Reach out if you need further clarification.
Hello,
First of all, thank you very much for making this repository with the data and code available! I do have a question regarding the dataset: Which dataset should be used for training and which one for validation? How exactly was the evaluation set created, and which labels are recommended (EvaluationSet/NPsAreTruth_E/PsAreTruthE)? Unfortunately, the structure of the dataset is a bit bloated, making it difficult for me to understand which parts contain which data and how they should be used.
Thanks in advance!