Open christofs opened 5 years ago
I'm working on it. We are currently preparing a prototype of a brand-new Shiny GUI for the stylo() function, and we need some metadata handling for that anyway.
Thanks so much, it is a great feature. One more follow-up wish: it would be wonderful if the filename of the resulting dendrogram included the value of "grouping.column" as a suffix after the other parameters. That way, one could use several different grouping criteria without the dendrogram files overwriting one another.
This is a feature request that has been on my mind for a while. It would be really neat if the stylo() function could pull information used to color the dendrogram from a metadata table instead of from the filename.
Maybe the default GUI behavior could remain as it is, but CLI options could be used to specify (a) that the coloring should be derived from a CSV file with metadata, (b) the path to the metadata file, and (c) the column to be used for coloring. The metadata table would need a column called "filenames" (or something along these lines) that contains the filenames actually used in the corpus, so that each row of metadata can be mapped to the corresponding text.
You could dispense with (b) if a conventional filename, e.g. "metadata.csv", and a conventional location, e.g. the current working directory, were assumed instead.
The advantage of this would be that, for a given stylometric analysis, it becomes much easier to switch between colorings of different types (e.g. author vs. genre vs. decade of publication) and so compare the clusterings against various factors that may influence text similarity. There would be no need to keep multiple copies of the same corpus that differ only in their filenames, and no need to calculate the text similarity multiple times.
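To make the proposal concrete, here is a minimal sketch of the mapping I have in mind. It is written in Python purely for illustration; stylo is an R package, and nothing here is stylo's actual API. The function names (`load_groups`, `assign_colors`), the sample rows, and the palette are all made up for the example. Only the "filenames" column and the idea of a selectable grouping column come from the description above.

```python
import csv
import io

def load_groups(metadata_file, grouping_column, filename_column="filenames"):
    """Map each filename to its value in the chosen metadata column."""
    reader = csv.DictReader(metadata_file)
    return {row[filename_column]: row[grouping_column] for row in reader}

def assign_colors(corpus_files, groups, palette):
    """Give every text a color according to its group label."""
    # Fail loudly if a text in the corpus has no row in the metadata table.
    missing = [name for name in corpus_files if name not in groups]
    if missing:
        raise KeyError(f"no metadata row for: {missing}")
    # Sort the labels so the same label always receives the same color.
    labels = sorted({groups[name] for name in corpus_files})
    color_of = {label: palette[i % len(palette)] for i, label in enumerate(labels)}
    return {name: color_of[groups[name]] for name in corpus_files}

# Hypothetical metadata table; in practice this would be e.g. "metadata.csv"
# in the current working directory.
METADATA = """filenames,author,genre
text_a.txt,Austen,novel
text_b.txt,Behn,play
text_c.txt,Austen,novel
"""

corpus = ["text_a.txt", "text_b.txt", "text_c.txt"]
palette = ["red", "blue", "green"]

# Same corpus, two different colorings, no renaming of files.
by_author = assign_colors(corpus, load_groups(io.StringIO(METADATA), "author"), palette)
by_genre = assign_colors(corpus, load_groups(io.StringIO(METADATA), "genre"), palette)
```

The point of the sketch is that switching from author to genre only changes the column name passed in; the corpus files and the distance computation stay untouched.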