Term Frequency Demo

As textvis is positioned as a tool for NLP research, a demo focusing on that use case should be implemented. A common task in NLP is term frequency analysis. As a demonstration that NLP folks will find compelling, we should implement term frequency analysis on top of the heat mapping discussed in #13.

This should be implemented as a new tab (to the left of the examples). It should provide a text entry box for the user to enter a comma-delimited list of terms, and a button to upload a text document. The "visualization" page is a good starting point, as it already implements file upload.

Once the user uploads the file, the script on the page should call a function from a new js file which converts the text document into a JSON object of the format that the main textvis implementation can consume.

The function will need two passes. The first computes the minimum and maximum number of occurrences of any term in the list per sentence, as well as the total number of occurrences in the document. The second generates the heat values, which need to be normalized so that a heat value of 1.0 means "this sentence has the most occurrences of any term out of the whole document" and 0.0 means it has the fewest.
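As a rough sketch of the two passes, assuming the per-sentence counts have already been collected into an array: the name `computeHeatValues` and the returned object shape are hypothetical, not part of textvis. Mapping every sentence to 0.0 when all counts are equal is also an assumption, since the issue does not say what should happen in that case.

```javascript
// Hypothetical helper: `counts` holds one term-occurrence count per sentence.
function computeHeatValues(counts) {
  // Pass 1: find the minimum, maximum, and total number of occurrences.
  let min = Infinity;
  let max = -Infinity;
  let total = 0;
  for (const c of counts) {
    if (c < min) min = c;
    if (c > max) max = c;
    total += c;
  }

  // Pass 2: min-max normalize each count into the range [0.0, 1.0].
  // Assumption: if every sentence has the same count, all heat values are 0.0.
  const range = max - min;
  const heat = counts.map((c) => (range === 0 ? 0.0 : (c - min) / range));

  return { min, max, total, heat };
}
```

For example, `computeHeatValues([0, 2, 4])` yields heat values `[0, 0.5, 1]` and a total of 6.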

Additionally, paragraph SVG annotations should be used as tooltips showing the number of occurrences (of any term) in the paragraph, both as a count and as a percentage of the total occurrences in the entire document.
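The tooltip text could be derived along these lines. This is only a sketch: `paragraphTooltips`, its input shape (one array of sentence counts per paragraph), and the exact wording of the tooltip are all assumptions, and attaching the text to the SVG annotations is left out because it depends on the textvis input format.

```javascript
// Hypothetical helper: `paragraphs` is an array of paragraphs, each given
// as an array of per-sentence occurrence counts.
function paragraphTooltips(paragraphs) {
  // Sum the sentence counts within each paragraph.
  const perParagraph = paragraphs.map((sentences) =>
    sentences.reduce((sum, c) => sum + c, 0)
  );

  // Total occurrences across the whole document.
  const total = perParagraph.reduce((sum, c) => sum + c, 0);

  // Build one tooltip string per paragraph; guard against division by zero
  // when no term occurs anywhere in the document.
  return perParagraph.map((count) => {
    const pct = total === 0 ? 0 : (count / total) * 100;
    return `${count} occurrences (${pct.toFixed(1)}% of document total)`;
  });
}
```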

For convenience, we are treating any occurrence of any term in the list as an increment to the counter. There will be one counter per sentence, plus one per paragraph.
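A minimal sketch of the shared counter described above, starting from the comma-delimited term list. The function names are hypothetical, and the matching rules (case-insensitive, whole words only) are assumptions that the issue does not specify. Note that `\b` only behaves as expected for terms that begin and end with a word character.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn the comma-delimited text box contents into a list of terms,
// trimming whitespace and dropping empty entries.
function parseTerms(input) {
  return input
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
}

// Count occurrences of any term in `text`; every match of every term
// increments the same counter.
// Assumption: matching is case-insensitive and restricted to whole words.
function countOccurrences(text, terms) {
  let count = 0;
  for (const term of terms) {
    const re = new RegExp("\\b" + escapeRegExp(term) + "\\b", "gi");
    const matches = text.match(re);
    if (matches) count += matches.length;
  }
  return count;
}
```

The same function can be applied to a sentence or to a whole paragraph, although summing the sentence counters of a paragraph avoids scanning the text twice.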

This demo will not use the links capability, since it doesn't really make sense here, and because it will be useful to show that not every textvis feature needs to be used for every possible use case.

charlesdaniels / textvis

Term Frequency Demo #18