Tencent / NeuralNLP-NeuralClassifier

An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Clarification around rcv1.taxonomy file #41

Closed tsu3010 closed 4 years ago

tsu3010 commented 4 years ago

Hi, thanks for this really nice repository on text classification. I have a question about how the taxonomy file should be structured for a hierarchical classification problem (I couldn't find anything in the README about that).

For an example problem where the possible label hierarchy looks as follows:

All-Science
All-Science-NaturalScience
All-Science-NaturalScience-Biology
All-Humanities
All-Humanities-History
All-Humanities-History-WorldHistory
All-Vocabulary
All-Vocabulary-ReadingVocabulary

How should the taxonomy file be set up?

coderbyr commented 4 years ago

The taxonomy is used to calculate the hierarchical loss, which needs the parent-child relations. So, for the example above, each line of the taxonomy file should list a parent followed by its children, separated by tabs:

"ALL \t Science \t Humanities \t Vocabulary" (taking ALL as the root node)
"Science \t NaturalScience"
"NaturalScience \t Biology"
...
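As a rough illustration of the format described above, here is a minimal Python sketch that derives parent/children lines from full label paths like the ones in the question. The function name `build_taxonomy_lines`, the `-` separator, and the use of the first path component (`All`) as the root name are assumptions made for this example and are not part of the toolkit; the root name and separator should match whatever your own data and config use.

```python
def build_taxonomy_lines(label_paths, separator="-"):
    """Turn full label paths into 'parent<TAB>child1<TAB>child2...' lines.

    Hypothetical helper for illustration only; not part of NeuralClassifier.
    """
    children = {}  # parent -> list of children, in first-seen order
    for path in label_paths:
        nodes = path.split(separator)
        # Each adjacent pair in a path is one parent -> child relation.
        for parent, child in zip(nodes, nodes[1:]):
            siblings = children.setdefault(parent, [])
            if child not in siblings:
                siblings.append(child)
    return ["\t".join([parent] + kids) for parent, kids in children.items()]


label_paths = [
    "All-Science",
    "All-Science-NaturalScience",
    "All-Science-NaturalScience-Biology",
    "All-Humanities",
    "All-Humanities-History",
    "All-Humanities-History-WorldHistory",
    "All-Vocabulary",
    "All-Vocabulary-ReadingVocabulary",
]

lines = build_taxonomy_lines(label_paths)
for line in lines:
    print(line)
```

Writing `"\n".join(lines)` to a file would then give one parent/children relation per line, with the parent always in the first column.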

odek53r commented 4 years ago

Does the order of the relations affect the result?

coderbyr commented 4 years ago

What does "order" mean here? Within each line of the taxonomy file, the parent must come first, followed by its children. The order of the lines themselves does not matter, e.g. it makes no difference whether the "Science" line or the "NaturalScience" line comes first.
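To make the distinction concrete, here is a small Python sketch of a reader for this format. It is not the toolkit's own parser; `parse_taxonomy` is a hypothetical helper that assumes tab-separated fields with the parent in the first column. It shows that reordering lines yields the same relations, while swapping fields within a line changes them.

```python
def parse_taxonomy(lines):
    """Map each child to its parent, reading 'parent<TAB>child...' lines.

    Hypothetical reader for illustration; assumes the first field is the parent.
    """
    parent_of = {}
    for line in lines:
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) < 2:
            continue  # skip blank lines or lines without children
        parent, kids = fields[0], fields[1:]
        for kid in kids:
            parent_of[kid] = parent
    return parent_of


taxonomy = [
    "ALL\tScience\tHumanities\tVocabulary",
    "Science\tNaturalScience",
    "NaturalScience\tBiology",
]

# Same relations, lines in the opposite order.
shuffled = list(reversed(taxonomy))

# Same lines, but parent and child swapped inside one line.
swapped = [
    "ALL\tScience\tHumanities\tVocabulary",
    "NaturalScience\tScience",
    "NaturalScience\tBiology",
]

parent_of = parse_taxonomy(taxonomy)
print(parse_taxonomy(shuffled) == parent_of)  # line order
print(parse_taxonomy(swapped) == parent_of)   # field order within a line
```

The first comparison comes out equal and the second does not, which matches the answer above: only the parent-first order inside each line carries meaning.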