Open ronentk opened 2 months ago
We can present the data in a meaningful way, but we cannot evaluate it as a multi-label problem, because the true labels are by definition binary. What are the conditions for getting "research_auto"? I already have the item types logged in the outcome dataset, so I can simply use them to run an evaluation that includes an aggregation of that data.
item_types_whitelist = [
    "bookSection",
    "journalArticle",
    "preprint",
    "book",
    "manuscript",
    "thesis",
    "presentation",
    "conferencePaper",
    "report",
]
# if any item types on the whitelist, pass automatically
if len(set(result.item_types).intersection(set(item_types_whitelist))) > 0:
    return SciFilterClassfication.RESEARCH
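For anyone who wants to try the whitelist condition outside the filter, here is a minimal, self-contained sketch. The function name `passes_whitelist` is hypothetical and not part of the codebase; it only reproduces the set-intersection check from the snippet above and returns a boolean instead of a `SciFilterClassfication` value.

```python
from typing import Iterable

# Same item types as the whitelist in the snippet above.
ITEM_TYPES_WHITELIST = frozenset(
    {
        "bookSection",
        "journalArticle",
        "preprint",
        "book",
        "manuscript",
        "thesis",
        "presentation",
        "conferencePaper",
        "report",
    }
)


def passes_whitelist(item_types: Iterable[str]) -> bool:
    """Return True if at least one item type is on the whitelist.

    Hypothetical helper: equivalent to checking that the intersection of
    the item types with the whitelist is non-empty.
    """
    return not ITEM_TYPES_WHITELIST.isdisjoint(item_types)
```

With this helper, a post whose item types include `"preprint"` would pass automatically, while a post with no whitelisted item types (or none at all) would fall through to the rest of the filter.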
@ShaRefOh this condition holds for your annotations as well, right?
if len(set(result.item_types).intersection(set(item_types_whitelist))) > 0:
    return SciFilterClassfication.RESEARCH
For example, something like this
From the current form:
The rationale is: (1) it can help with the filter evaluation by differentiating between easy (auto) and hard (pred) cases, and (2) we might want to use the information in the app to further organize the queue/UX.
What do you think, @ShaRefOh?
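To make the evaluation point concrete, here is a rough sketch of how the binary true labels could be kept as they are while the results are aggregated separately for the easy (auto) and hard (pred) cases. The record layout (keys `group`, `true`, `pred`) and the function names are assumptions for illustration only, not the structure of the outcome dataset.

```python
from collections import defaultdict


def confusion_by_group(records):
    """Count tp/fp/fn/tn per group (e.g. "auto" vs "pred").

    Each record is assumed to be a dict with:
      "group": str   - how the label was produced (hypothetical field)
      "true":  bool  - binary ground-truth label (research or not)
      "pred":  bool  - binary label assigned by the filter
    """
    groups = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for rec in records:
        if rec["pred"] and rec["true"]:
            cell = "tp"
        elif rec["pred"]:
            cell = "fp"
        elif rec["true"]:
            cell = "fn"
        else:
            cell = "tn"
        groups[rec["group"]][cell] += 1
    return {group: dict(counts) for group, counts in groups.items()}


def precision(counts):
    """Precision for one group, or None if nothing was predicted positive."""
    denom = counts["tp"] + counts["fp"]
    return counts["tp"] / denom if denom else None


def recall(counts):
    """Recall for one group, or None if there are no true positives to find."""
    denom = counts["tp"] + counts["fn"]
    return counts["tp"] / denom if denom else None
```

The evaluation itself stays binary; the group only decides which bucket a record is counted in, so the auto and pred cases can be reported side by side.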