EmilHvitfeldt / ISLR-tidymodels-labs

https://emilhvitfeldt.github.io/ISLR-tidymodels-labs/index.html
170 stars 70 forks source link

Massive pull request coming once I get through all the chapters here #9

Closed hasandiwan closed 2 years ago

hasandiwan commented 3 years ago

In the Classification chapter, there are several issues -- I can separate them if you prefer:

library(paletteer)
cor_Smarket %>%
  stretch() %>%
  ggplot(aes(x, y, fill = r)) +
  geom_tile() +
  geom_text(aes(label = as.character(fashion(r)))) +
  scale_fill_paletteer_c("scico::roma", limits = c(-1, 1), direction = -1) +
  theme_minimal()

will not run. The fix seems to be:

library(paletteer)
cor_Smarket %>%
  stretch() %>%
  ggplot(aes(x = x, y = y, fill = r)) +
  geom_tile() +
  geom_text(aes(label = as.character(fashion(r)))) +
  scale_fill_paletteer_c("scico::roma", limits = c(-1, 1), direction = -1) +
  theme_minimal()

Similarly, in the next section, instead of

ggplot(Smarket, aes(Year, Volume)) +
  geom_jitter(height = 0)

The following is more explicit, though R doesn't seem to give a toss:

ggplot(Smarket, aes(x=Year, y=Volume)) +
  geom_jitter(height = 0)

You may want to consider disabling the copy button on the results.

Might also want to note that "K-nearest neighbors" requires package 'kknn' to be installed.

I've tried to follow the style of the document as closely as possible.

EmilHvitfeldt commented 3 years ago

I'm not able to reproduce your error, can you please provide a reprex :)

hasandiwan commented 3 years ago

You may need to make your r-session pristine -- R --vanilla should sort you here

EmilHvitfeldt commented 3 years ago

I'm not seeing an error. If you can run the code below and show the error you are seeing, I can help you.

library(tidymodels)
library(ISLR)
library(corrr)
library(paletteer)

cor_Smarket <- Smarket %>%
  select(-Direction) %>%
  correlate()

cor_Smarket %>%
  stretch() %>%
  ggplot(aes(x, y, fill = r)) +
  geom_tile() +
  geom_text(aes(label = as.character(fashion(r)))) +
  scale_fill_paletteer_c("scico::roma", limits = c(-1, 1), direction = -1) +
  theme_minimal()

Created on 2021-08-11 by the reprex package (v2.0.1)

hasandiwan commented 3 years ago

I was getting the error "ggplot2 doesn't know how to deal with data of class uneval". When I explicitly specified that x=x and y=y, the error went away. So, I thought I might note it. Regardless, it is better to be explicit.

EmilHvitfeldt commented 3 years ago

Thank you!

Could you please copy the code above to the clipboard and run:

reprex::reprex(si = TRUE)

and paste back your results? I suspect you are getting the error because of some version differences

hasandiwan commented 3 years ago

sh: +RTS: command not found Error: pandoc document conversion failed with error 127 In addition: Warning message: In system(command) : error in running command

EmilHvitfeldt commented 3 years ago

Okay, it appears that you are having pandoc issues. I can refer to https://stackoverflow.com/a/54100882/4912080.

Thanks for raising the issue, without a proper reprex it can be hard to diagnose. All package versions are listed in the colophon and I would suggest trying to update to those versions before moving on.

hasandiwan commented 3 years ago

If you'd like I can patch your source code and attach the patches version to this case. I still feel that your parameters could be clearer -- using named parameters where relevant. As you saw above, you've missed. a few of these.

hasandiwan commented 3 years ago

mixOmics has been removed from cran, pointed to in chapter 6, and must be installed from bioconductor - https://stackoverflow.com/questions/62064102/how-to-install-mixomics-packages-in-r-version-4-0-0

hasandiwan commented 3 years ago

In Chapter 8, there appears to be another slip, where you're asking to "Let us take a look at the confusion matrix for the training data set and testing data set.

augment(class_tree_fit, new_data = Carseats_train) %>% conf_mat(truth = High, estimate = .pred_class)"

yet only the Carseats_test result is given.

hasandiwan commented 3 years ago

Also in chapter 8, you have the code to generate the rplot for carseats_train, which yields the warning "Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables)."

hasandiwan commented 3 years ago

The next example also gives the same warning. I'm not sure where to add the model=TRUE line, however roundint=FALSE does get rid of the warning. I would, however, like to understand what the reason for this warning is and I haven't looked very closely yet.

Then again, I'm just putting all the issues here and when I finish, I'll write more on how I think they should be solved.

hasandiwan commented 3 years ago

when you say vip, you should specify the appropriate fully-specified function vip::vip

EmilHvitfeldt commented 3 years ago

hello, I see you are working through which I'm happy to see. If you are planning to submit PRs, please split them up into smaller PRs so there are easier to deal with on my end.

when you say vip, you should specify the appropriate fully-specified function vip::vip

I disagree here. I made sure to load all the packages at the beginning of that chapter and describe what each package is used for.

hasandiwan commented 3 years ago

Will do so, was putting the daughter to bed earlier, hence the delay

hasandiwan commented 3 years ago

Do you prefer pull requests or attached patches?

hasandiwan commented 3 years ago

Under ROC curves in Chapter 9, the phrasing "ROC curves can easily be created..." can be confused with "ROC curves can easily be created (by plotting)", when one should be reading it as "The parameters of ROC curves can easily be shown..."

EmilHvitfeldt commented 3 years ago

Do you prefer pull requests or attached patches?

pull requests, please. And you only have to modify the .rmd files, I'll render the rest of the book. It will keep the PRs cleaner

hasandiwan commented 3 years ago

Got it, mate.

hasandiwan commented 3 years ago

In Chapter 12, the line ggplot(aes(V1, V2, color = .cluster)) generates the same error as above ('Error in UseMethod("depth") : no applicable method for 'depth' applied to an object of class "NULL"') until I set x and y explicitly, e.g. ggplot(aes(x=V1, y=V2, color = .cluster)).

hasandiwan commented 3 years ago

I needed to namespace here:

res_hclust_complete <- x_df %>%
  dist() %>%
  hclust(method = "complete")

res_hclust_average <- x_df %>%
  dist() %>%
  hclust(method = "average")

res_hclust_single <- x_df %>%
  dist() %>%
  hclust(method = "single")

as well... namely stats::dist, there's also a proxy::dist.

hasandiwan commented 3 years ago
x %>%
  proxy::dist(method = "correlation") %>
  hclust(method = "complete") %>%
  fviz_dend()

Gives me Error in UseMethod("depth") : no applicable method for 'depth' applied to an object of class "NULL"; I know I solved this above by specifying x and y explicitly above, but I'm unclear as to what they should be, or, perhaps I need to take a little break here?

EmilHvitfeldt commented 3 years ago

Did you compare the package versions you are using compared to the colophon, I can't help debugging without reprexes or session information.

hasandiwan commented 3 years ago

I am just listing the issues I encounter at the moment, will look deeper once I get through everything

EmilHvitfeldt commented 3 years ago

If you are in this project you can run the following and show me the results.

deps <- desc::desc_get_deps()
pkgs <- sort(deps$package[deps$type == "Imports"])
sessioninfo::package_info(pkgs, dependencies = FALSE)
hasandiwan commented 3 years ago

desc::desc_get_deps() Error: No root directory found in /Users/user or its parent directories. Root criterion: contains a file "DESCRIPTION" with contents matching "^Package: "

I'm in emacs using ess, not sure if this makes.a difference. -- H

On Wed, 11 Aug 2021 at 21:29, Emil Hvitfeldt @.***> wrote:

If you are in this project you can run the following and show me the results.

deps <- desc::desc_get_deps()pkgs <- sort(deps$package[deps$type == "Imports"])sessioninfo::package_info(pkgs, dependencies = FALSE)

ā€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EmilHvitfeldt/ISLR-tidymodels-labs/issues/9#issuecomment-897340554, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKSFUTJQ2JB7FKV2E2CZJITT4NE3JANCNFSM5B7KGDOQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

-- OpenPGP: https://hasan.d8u.us/openpgp.asc If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest http://bit.ly/hd1AppointmentRequest. Si vous voudrais faire connnaisance, allez a bit.ly/hd1AppointmentRequest http://bit.ly/hd1AppointmentRequest.

https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1Sent from my mobile device Envoye de mon portable

hasandiwan commented 3 years ago
 nci60_pca %>%
  dist() %>%
  hclust() %>%
  fviz_dend(k = 4, main = "hclust on first five PCs")
. + > Error: [conflicted] `dist` found in 2 packages.
Either pick the one you want with `::` 
* proxy::dist
* stats::dist
Or declare a preference with `conflict_prefer()`
* conflict_prefer("dist", "proxy")
* conflict_prefer("dist", "stats")
Run `rlang::last_error()` to see where the error occurred.

Specifying stats::dist sorted this as well.

EmilHvitfeldt commented 3 years ago

You don't need to specify stats::dist as I loaded library(proxy) at the beginning of the chapter.

It appears you are calling library(conflicted) either before running on the code or as part of your .rProfile. Please only supply fixes to errors that appears with an empty .rProfile

hasandiwan commented 3 years ago

The only thing in my .Rprofile is Sys.setenv(PATH=paste(Sys.getenv('PATH'), '/usr/local/bin', sep=':')) -- how can I find out if I've loaded the conflicted library?

EmilHvitfeldt commented 3 years ago

The last error you are seeing is happening because conflicted was loaded. I can't tell where it is loaded for you, but you can unload it with detach("package:conflicted", unload=TRUE). But it will be best if you can make sure it isn't loaded to begin with

hasandiwan commented 3 years ago

detach(conflicted) Error in detach(conflicted) : invalid 'name' argument

According to https://stackoverflow.com/a/28991422/783412, I can use unloadNamespace('conflicted'),which did complete successfully, but I'm still getting the same error:

nci60_pca %>% dist() %>% hclust() %>% fviz_dend(k = 4, main = "hclust on first five PCs") . + > Error: [conflicted] dist found in 2 packages. Either pick the one you want with ::

On Wed, 11 Aug 2021 at 22:06, Emil Hvitfeldt @.***> wrote:

The last error you are seeing is happening because conflicted was loaded. I can't tell where it is loaded for you, but you can unload it with detach("package:conflicted", unload=TRUE). But it will be best if you can make sure it isn't loaded to begin with

ā€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EmilHvitfeldt/ISLR-tidymodels-labs/issues/9#issuecomment-897351628, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKSFUTPKFO6TFA5O4VRNFSTT4NJE7ANCNFSM5B7KGDOQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

-- OpenPGP: https://hasan.d8u.us/openpgp.asc If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest http://bit.ly/hd1AppointmentRequest. Si vous voudrais faire connnaisance, allez a bit.ly/hd1AppointmentRequest http://bit.ly/hd1AppointmentRequest.

https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1Sent from my mobile device Envoye de mon portable

hasandiwan commented 3 years ago
> (.packages())
 [1] "desc"         "patchwork"    "factoextra"   "proxy"        "magrittr"    
 [6] "forcats"      "stringr"      "readr"        "tidyverse"    "kernlab"     
[11] "vip"          "rpart.plot"   "rpart"        "mixOmics"     "lattice"     
[16] "MASS"         "vctrs"        "rlang"        "glmnet"       "Matrix"      
[21] "ISLR"         "yardstick"    "workflowsets" "workflows"    "tune"        
[26] "tidyr"        "tibble"       "rsample"      "recipes"      "purrr"       
[31] "parsnip"      "modeldata"    "infer"        "ggplot2"      "dplyr"       
[36] "dials"        "scales"       "broom"        "tidymodels"   "stats"       
[41] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
[46] "base"        

According to https://rpubs.com/Mentors_Ubiqum/list_packages, that lists packages that are loaded. As you can see, there's no conflicted here.

EmilHvitfeldt commented 3 years ago

That is weird. It might have something to do with your ESS setup, I don't have any experience with it so I sadly can't let you.

Nevertheless, I won't be accepting changes to code that results from conflict or other conditions I can't replicate.

hasandiwan commented 3 years ago

Sure, it shouldn't affect any of your code to namespace stats. Anyway, I'll make sure that every comment has its own commit, and when you merge it, you should feel free to cherry pick whichever ones you like. This is primarily/wholly your work, after all.

On Thu, 12 Aug 2021 at 00:24, Emil Hvitfeldt @.***> wrote:

That is weird. It might have something to do with your ESS setup, I don't have any experience with it so I sadly can't let you.

Nevertheless, I won't be accepting changes to code that results from conflict or other conditions I can't replicate.

ā€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EmilHvitfeldt/ISLR-tidymodels-labs/issues/9#issuecomment-897410351, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKSFUTLILJNDD5C7WTIQ57DT4NZLJANCNFSM5B7KGDOQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

-- OpenPGP: https://hasan.d8u.us/openpgp.asc If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest http://bit.ly/hd1AppointmentRequest. Si vous voudrais faire connnaisance, allez a bit.ly/hd1AppointmentRequest http://bit.ly/hd1AppointmentRequest.

https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1Sent from my mobile device Envoye de mon portable

EmilHvitfeldt commented 3 years ago

Sure, it shouldn't affect any of your code to namespace stats

It does since that chapter uses proxy::dist() not stats::dist().

hasandiwan commented 3 years ago

Emil, [response inline]

On Thu, 12 Aug 2021 at 09:51, Emil Hvitfeldt @.***> wrote:

It does since that chapter uses proxy::dist() not stats::dist().

šŸ‘