Open alexhebing opened 4 years ago
This is certainly possible and shouldn't be hard to achieve. How to package this behaviour on the frontend is the harder question. One option would be an extra corpus setting that replaces the CSV download with a download of the txt files as a zip archive. Should we discuss this tomorrow?
When creating the test corpus (i.e. The Dinner and Harry Potter), Haidee explicitly asked for a txt version of the corpus, i.e. one file per review that contains only the review text, with some metadata in the filename. I assume this makes it easier to work with (subsets of) the data in applications like Voyant.
I can, and probably will, share the full corpus with Haidee and Gys-Walt, including txts, once the scraping is done. However, given the number of titles, I expect to scrape over 100,000 reviews. With that many files, manually selecting the txts that belong to a subset is virtually impossible.
Question: is it conceivable / doable to add a full-text download to I-analyzer that would allow downloading a subset of reviews / documents? There is also a script I developed for @JosedeKruif that can do this type of thing (here), but the disadvantage is that customers would have to run Python locally (and set up a virtualenv etc.). @BeritJanssen: what do you think, is a txt download from I-analyzer feasible?
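To make the idea concrete, here is a minimal sketch of what such a download could produce: one txt file per document, holding only the text, with some metadata in the filename, all bundled into a single zip. It uses only the Python standard library. The function names, the `text`, `id` and `title` field names, and the filename pattern are assumptions for illustration; none of this is existing I-analyzer code.

```python
import io
import re
import zipfile


def safe_filename_part(value):
    """Reduce a metadata value to characters that are safe in a filename."""
    return re.sub(r"[^A-Za-z0-9-]+", "-", str(value)).strip("-")


def documents_to_zip(documents, text_field="text", filename_fields=("id", "title")):
    """Return the bytes of a zip archive with one txt file per document.

    `documents` is an iterable of dicts (hypothetical shape, e.g. search hits).
    Each file contains only the value of `text_field`; the filename is built
    from a running index plus the values of `filename_fields`.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, doc in enumerate(documents):
            parts = [safe_filename_part(doc.get(f, "")) for f in filename_fields]
            # The index prefix keeps filenames unique even if metadata repeats.
            name = "_".join([f"{index:06d}"] + [p for p in parts if p])
            archive.writestr(f"{name}.txt", doc.get(text_field, ""))
    return buffer.getvalue()


# Example with two made-up reviews.
reviews = [
    {"id": 1, "title": "The Dinner", "text": "Great book."},
    {"id": 2, "title": "Harry Potter", "text": "Magical!"},
]
zip_bytes = documents_to_zip(reviews)
```

The backend could return `zip_bytes` as the response to a download request for the current search results, so nobody would need to run Python locally. For very large subsets, building the archive in memory may not be practical, and writing to a temporary file or running it as a background task would probably be the safer choice.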