According to the FLORES-101 paper, "we manually labeled all sentences by a more detailed sub-topic, one of 10 possibilities: crime, disasters, entertainment, geography, health, nature, politics, science, sports, and travel". Table 1 in the paper gives the statistics for these sub-topics. However, the metadata files contain a much larger number of sub-topics (306, in fact), such as:
Accident
accidents
accordion/right hand
advanced interactive media
Alchohol
American education/forgotten half/Foster care
American education/Special Needs ADD
...
ancient china/government
Ancient Civilizations/Romans
Ancient_Civilizations/Assyrians
...
big cats
big cats, lion
big cats, ocelot
big cats, tiger
Blended Learning/Blogging
Blended Learning/Field trips
Bugs/Insects_Intro
business
castles of england/tudor castles
castles of english/development of castles
climate
...
Is the 10-class metadata available for download, or are there any recommendations on how to group the existing sub-topics into a smaller number of topics?
The list of the 306 topics may be easily obtained with:
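For example, a minimal Python sketch of one way to collect the distinct values. It assumes the metadata is a tab-separated file with a header row and a column named `topic`; the column name, the helper `distinct_topics`, and the file name `metadata_dev.tsv` in the comment are assumptions for illustration and may need adjusting to the actual files.

```python
import csv
import io


def distinct_topics(tsv_file, column="topic"):
    """Return the sorted distinct values of `column` in a tab-separated file.

    `column` is an assumed header name; change it to match the real metadata.
    """
    # QUOTE_NONE keeps quote characters inside topic strings untouched.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return sorted({row[column].strip() for row in reader if row.get(column)})


# Small in-memory sample standing in for a metadata file.
sample = io.StringIO(
    "URL\tdomain\ttopic\n"
    "u1\twikinews\tAccident\n"
    "u2\twikinews\taccidents\n"
    "u3\twikijunior\tbig cats, lion\n"
    "u4\twikijunior\tbig cats, lion\n"
)
topics = distinct_topics(sample)
print(len(topics), topics)

# With a real file (hypothetical path):
# with open("metadata_dev.tsv", encoding="utf-8", newline="") as f:
#     print(len(distinct_topics(f)))
```

The comparison is case-sensitive and exact, so entries such as "Accident" and "accidents" are counted as separate topics, which matches how they appear in the list above.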