dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

order values by frequency #141

Closed: lvestito closed this issue 3 years ago

lvestito commented 3 years ago

Hi all,

First, thank you so much for developing this amazing tool; it's incredibly useful. If possible, I'd like to ask for some tips to improve my data visualization.

The first thing I would like to do is order the Stats/Values by Freq rather than alphabetically,

or, alternatively, impose my own order based on the Stats/Values names. For example, I would like them to show up as 1. dog, 2. horse, 3. cat, regardless of their frequency or alphabetical order. I hope you'll be able to help me.

Thank you so much for your help!

dcomtois commented 3 years ago

Hi,

As you're not the first to ask for this feature, it will be added to the todo list.

Things to know, in the meantime

lvestito commented 3 years ago

Hi,

Using forcats worked perfectly thank you very much!
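For readers landing here: the forcats approach comes down to reordering factor levels, presumably because summarytools lists a factor's values in level order. Below is a minimal base R sketch of the same idea. The `pets` vector is hypothetical, and forcats::fct_infreq() and forcats::fct_relevel() are the tidyverse equivalents of the two reorderings.

```r
# Hypothetical example data
pets <- c("cat", "dog", "dog", "horse", "dog", "cat", "horse", "horse", "horse")

# Default factor levels are alphabetical: cat, dog, horse
alpha <- factor(pets)

# Order levels by decreasing frequency (what forcats::fct_infreq() does)
counts <- table(pets)
by_freq <- factor(pets, levels = names(sort(counts, decreasing = TRUE)))

# Impose a custom order (like forcats::fct_relevel(pets, "dog", "horse", "cat"))
custom <- factor(pets, levels = c("dog", "horse", "cat"))

levels(by_freq)  # "horse" "dog" "cat"
levels(custom)   # "dog" "horse" "cat"
```

Passing `by_freq` or `custom` to freq() or dfSummary() should then list the values in that order rather than alphabetically.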

May I ask you an additional question?

If I have something like:

Patient A    cataract|hearing loss|intellectual disability
Patient B    hearing loss
Patient C    intellectual disability

By default, dfSummary gives me the following for that column:

cataract|hearing loss|intellectual disability    1
hearing loss                                     1
intellectual disability                          1

but I would prefer to split the values with strsplit() and get something like:

intellectual disability    2
hearing loss               2
cataract                   1

Would you please advise on what would be the best way to obtain that result?

Thank you very much for your help.

dcomtois commented 3 years ago

Glad forcats worked for you. For your other question, there are several ways you can deal with that, but the one that comes to mind is strsplit(). For instance:

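library(summarytools)
# Each element may hold several values separated by "|"
v <- c("abc", "def", "abc|def", "ghi", "abc|ghi", "abc|def|ghi")
# Split on the literal "|" (escaped for the regex), flatten, then count each value
freq(unlist(strsplit(v,"\\|")))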

Frequencies  

              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------ --------- -------------- --------- --------------
        abc      4     40.00          40.00     40.00          40.00
        def      3     30.00          70.00     30.00          70.00
        ghi      3     30.00         100.00     30.00         100.00
       <NA>      0                               0.00         100.00
      Total     10    100.00         100.00    100.00         100.00

Making it work in dfSummary is more complicated. You could look for answers on a forum like Stack Overflow. I also offer my services as a consultant; if you want to know more, just send me an email. Thx
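One possible workaround, sketched here under assumptions rather than taken from the thread, is to reshape the data so that each patient/condition pair becomes its own row, then summarize that long table instead of the original column. The `patients` data frame is hypothetical and mirrors the patient example above. The dfSummary() call runs only if summarytools is installed, and the claim that it follows factor level order is an assumption worth checking.

```r
# Hypothetical per-patient data mirroring the example in the question
patients <- data.frame(
  patient   = c("A", "B", "C"),
  condition = c("cataract|hearing loss|intellectual disability",
                "hearing loss",
                "intellectual disability"),
  stringsAsFactors = FALSE
)

# Split each multi-valued cell on the literal "|" (fixed = TRUE avoids regex escaping)
parts <- strsplit(patients$condition, "|", fixed = TRUE)

# Long format: one row per patient/condition pair
long <- data.frame(
  patient   = rep(patients$patient, lengths(parts)),
  condition = unlist(parts),
  stringsAsFactors = FALSE
)

# Order levels by decreasing frequency so the most common condition comes first
counts <- sort(table(long$condition), decreasing = TRUE)
long$condition <- factor(long$condition, levels = names(counts))
print(counts)

# Assumption: dfSummary() lists factor values in level order
if (requireNamespace("summarytools", quietly = TRUE)) {
  print(summarytools::dfSummary(long["condition"]))
}
```

The trade-off is that the long table no longer has one row per patient, so other per-patient columns should be summarized from the original data frame.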