tsalo closed 4 years ago
Here are the general steps to the pipeline we used:
We are dropping several of the feature spaces and switching from hold-out-based evaluation to cross-validation with StratifiedKFold (to preserve class proportions) and nesting (to perform hyperparameter tuning and, potentially, feature selection).
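For reference, here is a minimal sketch of what nested, stratified cross-validation could look like in scikit-learn. It is not our actual pipeline: the data are synthetic (`make_classification`), and the classifier (`LogisticRegression`), the feature selector (`SelectKBest`), the parameter grid, the fold counts, and the F1 scoring are all placeholder assumptions chosen only to illustrate the structure.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in data (assumption: NOT the ATHENA features).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# Feature selection lives inside the pipeline so it is refit on each
# training fold and never sees the held-out data.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Placeholder grid: number of features kept and regularization strength.
param_grid = {
    "select__k": [5, 10],
    "clf__C": [0.1, 1.0, 10.0],
}

# Inner loop tunes hyperparameters; outer loop estimates performance.
# Both are stratified to preserve class proportions in every fold.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipeline, param_grid=param_grid, cv=inner_cv, scoring="f1")

# Each outer fold runs a full inner grid search on its training portion.
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="f1")
print(f"Nested CV F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The main design point is that tuning and feature selection happen only inside the inner loop, so the outer-fold scores are not biased by the search.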
@mdtdev @mriedel56 This seems like a good place to give a brief overview of the pipeline Jason and I used for our version of ATHENA. Based on what you two think, we can figure out the ultimate structure for the project. I want to differentiate between the paper and the tool, though. I think the paper should include more features, which will be reduced via feature selection. Additionally, the paper will include more visualization and evaluation methods than the final tool.
The feature spaces Jason and I tested out:
- Title words
- Keywords
- Authors/Year
- Journal