CDU-data-science-team / pxtextminingdashboard

Text mining and visualization of NHS patient feedback.
https://feedbackmatters.uk/rsconnect/text_mining_dashboard/

Bigram bars sometimes overlap #25

Closed: andreassot10 closed this 3 years ago

andreassot10 commented 3 years ago

Depending on label and organization chosen, this can sometimes happen:

image

I've set height = "1000px" in here. This does sometimes fix the issue, but not for this example.

@ChrisBeeley, is there a way to dynamically increase the plot size? And if yes, is there a way to know beforehand if the bars will likely overlap?

ChrisBeeley commented 3 years ago

You'll need to render the plotOutput on the server side using renderUI, and use the height argument of plotOutput.

I guess the required height is some function of how many bars there are, so you'll need to work out how many bars there are and map that to the pixel height of the plot. Probably easiest like this:

plot_height <- if (number_of_bars > 25) "1000px" else "600px"

plotOutput(blah... height = plot_height)
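A minimal sketch of the renderUI approach described above, not the dashboard's actual code. Since the bars stack vertically, it varies the plot height. The names `bar_plot_height`, `demo_app`, `bigram_plot_ui` and `bigram_plot`, the 25 px per bar and the 400 px minimum are all assumptions made for illustration, and the plotted counts are dummy values.

```r
# Hypothetical helper: grow the plot height with the number of bars,
# never going below a minimum height. The defaults are arbitrary.
bar_plot_height <- function(n_bars, px_per_bar = 25, min_px = 400) {
  sprintf("%dpx", as.integer(max(min_px, n_bars * px_per_bar)))
}

# Hypothetical demo app: the plotOutput is created on the server inside
# renderUI, so its height can depend on how many bars will be drawn.
demo_app <- function() {
  ui <- shiny::fluidPage(
    shiny::sliderInput("n_bars", "Number of bars", min = 5, max = 80, value = 30),
    shiny::uiOutput("bigram_plot_ui")
  )

  server <- function(input, output, session) {
    output$bigram_plot_ui <- shiny::renderUI({
      shiny::plotOutput("bigram_plot", height = bar_plot_height(input$n_bars))
    })

    output$bigram_plot <- shiny::renderPlot({
      counts <- rev(seq_len(input$n_bars))  # dummy values, not real TF-IDFs
      barplot(counts, horiz = TRUE, las = 1,
              names.arg = paste("bigram", seq_along(counts)))
    })
  }

  shiny::shinyApp(ui, server)
}

# demo_app()  # run interactively; requires the shiny package
```

A linear mapping like this avoids picking a single threshold, at the cost of very tall plots when there are many ties.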

ChrisBeeley commented 3 years ago

Although thinking about it more, I notice that the problem in the graph is that there are a lot of ties for the most frequent bigram. Maybe it would be better to present just some of them, or even remove them all. I'm guessing each bar represents just one bigram, right?

andreassot10 commented 3 years ago

Thanks for the tips. Indeed, the ties are the problem and yes, each bar represents one bigram. Selecting, say, the top 10 or top 15 will select many more than 10 or 15 rows, because of the ties. The reason I'm not forcing it to slice the top 10/15 rows (instead of selecting all the rows with the top 10/15 TF-IDFs) is that I don't want to have to arbitrarily choose what's displayed and what's left out. I'll give it a go with your proposed solution.
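To make the effect of ties concrete, here is a small base R sketch. The data frame is made-up toy data, and `top_n_with_ties` and `top_n_rows` are hypothetical helpers, not functions from the dashboard.

```r
# Toy data (invented for illustration): four bigrams tie on TF-IDF.
bigrams <- data.frame(
  bigram = c("staff friendly", "waiting time", "car park",
             "felt safe", "phone call", "very helpful"),
  tf_idf = c(0.9, 0.5, 0.5, 0.5, 0.5, 0.2)
)

# Keep every row whose TF-IDF rank is within the top n; tied rows all
# share the same (minimum) rank, so all of them are kept.
top_n_with_ties <- function(df, n) {
  df[rank(-df$tf_idf, ties.method = "min") <= n, ]
}

# Keep exactly n rows; which of the tied rows survive depends only on
# row order, which is the arbitrary choice discussed above.
top_n_rows <- function(df, n) {
  head(df[order(-df$tf_idf), ], n)
}

nrow(top_n_with_ties(bigrams, 2))  # 5 rows: the top bigram plus four tied ones
nrow(top_n_rows(bigrams, 2))       # 2 rows
```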

ChrisBeeley commented 3 years ago

I would argue that bigrams with only one occurrence in the data are essentially meaningless. Personally I would remove them all.
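As a rough illustration of that suggestion, bigrams that occur only once could be dropped before plotting. This is a sketch on made-up tokens; `bigram_tokens`, `bigram_counts` and `repeated_bigrams` are names assumed for the example.

```r
# Toy bigram tokens (invented for illustration).
bigram_tokens <- c("waiting time", "staff friendly", "waiting time",
                   "car park", "staff friendly", "waiting time")

# Count how often each bigram occurs.
bigram_counts <- as.data.frame(
  table(bigram = bigram_tokens),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Keep only bigrams that occur more than once.
repeated_bigrams <- bigram_counts[bigram_counts$n > 1, ]
repeated_bigrams
```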

andreassot10 commented 3 years ago

It depends on the class and trust. Look how nice, clean and informative this one is:

image

That's why I want to keep the bigrams option in the dashboard.

ChrisBeeley commented 3 years ago

I still think one is not enough. I would combine all the trusts in this version, and save per-trust bigrams for the production version, which will have way more text in it.
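Combining the trusts amounts to summing bigram counts across them before plotting. A minimal sketch, assuming a long table with hypothetical columns `trust`, `bigram` and `n` (the values are toy data):

```r
# Toy per-trust counts (invented for illustration).
per_trust_counts <- data.frame(
  trust  = c("Trust A", "Trust A", "Trust B", "Trust C", "Trust C"),
  bigram = c("waiting time", "car park", "waiting time",
             "waiting time", "staff friendly"),
  n      = c(1, 1, 2, 1, 1)
)

# Sum counts over trusts so each bigram appears once.
combined_counts <- aggregate(n ~ bigram, data = per_trust_counts, FUN = sum)

# Most frequent bigrams first.
combined_counts <- combined_counts[order(-combined_counts$n), ]
combined_counts
```

With the trusts pooled, a bigram that appears once in each of several trusts is no longer a singleton, which is what reduces the ties.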

andreassot10 commented 3 years ago

I don't disagree with what you're proposing (temporarily combine all trusts for the TF-IDF bar plots), and it definitely fixes the problem for now. But I'm a little worried it would confuse users? I.e. why would we display bigram bars from all trusts to a user from, say, Trust C?

ChrisBeeley commented 3 years ago

IMO this is a dev version. It doesn't have complete data in it and it's not being used in production. Better to give realistic examples using contrived data than contrived examples using real data. Once the pipeline is properly deployed they'll be able to use it in production and will have useful graphics with their own data.