dzieciou / tree-labeller

Helps label training data using taxonomy information.
BSD 3-Clause "New" or "Revised" License
4 stars, 1 fork

Sampling size #27

Closed: dzieciou closed this issue 1 year ago

dzieciou commented 1 year ago

Currently, the sample size is always 10, which makes no sense when more than 10 labels are available.

This is more about the algorithm than about the UI.

Perhaps the sample size should be calculated automatically, based on the number of labels and similar factors.
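The thread does not propose a formula. As a minimal sketch of what "calculated automatically depending on the labels count" could look like, here is a hypothetical heuristic. The function name `compute_sample_size` and the parameters `min_size` and `items_per_label` are assumptions for illustration, not part of tree-labeller. It keeps the current value of 10 as a floor, scales with the label count, and never asks for more items than remain in the population.

```python
def compute_sample_size(n_labels: int, population_size: int,
                        min_size: int = 10, items_per_label: int = 2) -> int:
    """Hypothetical heuristic for choosing a per-iteration sample size.

    Aims for roughly `items_per_label` items per available label, never
    fewer than `min_size` (the current fixed value of 10), and never more
    than the number of items still available for sampling.
    """
    if n_labels < 0 or population_size < 0:
        raise ValueError("counts must be non-negative")
    # Scale with the number of labels, but keep the old fixed size as a floor.
    target = max(min_size, items_per_label * n_labels)
    # Cap by what is actually left to sample.
    return min(target, population_size)


if __name__ == "__main__":
    print(compute_sample_size(5, 1000))   # few labels: the floor of 10 applies
    print(compute_sample_size(30, 1000))  # many labels: 2 * 30 = 60
    print(compute_sample_size(30, 25))    # capped by the remaining population
```

With 5 labels this returns 10, with 30 labels it returns 60, and with only 25 items left it returns 25. The floor keeps behaviour unchanged for small label sets, while the cap is one reason small datasets can still end up with very small iterations.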

pkubiak commented 1 year ago

Also, because of the population selection method, the package generates single-item iterations on small datasets.

dzieciou commented 1 year ago

Right, single-item iterations will happen in two situations:

Ideally, the number of items returned in each iteration should converge to 0. If it does not (i.e., it stays above 0 for many iterations), then there is no way to disambiguate specific categories and create a complete mapping, and the final mapping and annotation will be suboptimal.
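One way to act on this convergence criterion is to watch the per-iteration counts and flag a run that stays above 0. The sketch below is hypothetical: `mapping_has_stalled` and its `patience` parameter are illustrative names, not tree-labeller API, and "many iterations" is approximated by a fixed window.

```python
from typing import Sequence


def mapping_has_stalled(returned_counts: Sequence[int], patience: int = 5) -> bool:
    """Hypothetical check: True if the last `patience` iterations all returned items.

    The per-iteration counts should converge to 0; a long run of non-zero
    counts suggests some categories cannot be disambiguated, so the final
    mapping would be incomplete.
    """
    if patience <= 0:
        raise ValueError("patience must be positive")
    # Not enough history yet to call it a stall.
    if len(returned_counts) < patience:
        return False
    # Stalled only if every count in the recent window is still above 0.
    return all(count > 0 for count in returned_counts[-patience:])


if __name__ == "__main__":
    print(mapping_has_stalled([10, 6, 3, 1, 0]))     # converged to 0: not stalled
    print(mapping_has_stalled([4, 3, 3, 2, 2, 2]))   # stuck above 0: stalled
```

A fixed window is a simple choice; a real implementation might instead compare against the number of remaining unlabeled categories.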