JamesHWade / gpttools

gpttools extends gptstudio for package development to help you document code, write tests, or even explain code
https://jameshwade.github.io/gpttools/

Error when trying to create embeddings using the crawl() function #38

Closed: jobreu closed this issue 1 year ago.

jobreu commented 1 year ago

Thanks a lot for this amazing package and for continuously developing cool new functions for it!

When I try to crawl https://adv-r.hadley.nz/ with the code below and confirm that I want to create the embeddings, I get the following error message:

! Duplicate text entries detected.
i These are removed by default.
Error in `mutate_cols()`:
! Problem with `mutate()` column `embeddings`.
i `embeddings = purrr::map(.x = chunks, .f = create_openai_embedding, .progress = "Create Embeddings")`.
x unused argument (.progress = "Create Embeddings")
Caused by error in `.f()`:
! unused argument (.progress = "Create Embeddings")
Run `rlang::last_error()` to see where the error occurred.

Warning messages:
1: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore". 
2: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore". 
3: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

reprex:

library(gpttools)
crawl("https://adv-r.hadley.nz/")

Note: I am using R version 4.1.3 and gpttools v0.0.5.

JamesHWade commented 1 year ago

Ah... As far as I can tell, I need to specify a newer minimum version of purrr. Thanks for reporting the error!
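For context: before purrr 1.0.0, `map()` has the signature `map(.x, .f, ...)` and no `.progress` argument, so `.progress = "Create Embeddings"` is forwarded through `...` to `.f` (here `create_openai_embedding()`), which does not accept it. The base-R sketch below reproduces that mechanism without purrr. `old_map()` and `fake_embedding()` are hypothetical stand-ins, not gpttools or purrr code.

```r
# Hypothetical stand-in for purrr (< 1.0.0) map(): any extra arguments in
# `...` are forwarded to `.f` for every element.
old_map <- function(.x, .f, ...) {
  lapply(.x, .f, ...)
}

# Hypothetical stand-in for create_openai_embedding(): accepts only the text.
fake_embedding <- function(text) {
  nchar(text)
}

# Without .progress, the call succeeds.
ok <- old_map(list("a", "bc"), fake_embedding)

# With .progress, the argument reaches fake_embedding() and fails there,
# mirroring "unused argument (.progress = ...)" in the report above.
msg <- tryCatch(
  old_map(list("a", "bc"), fake_embedding, .progress = "Create Embeddings"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With purrr >= 1.0.0, `map()` declares `.progress` itself, so the value is consumed by `map()` and never reaches `.f`.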

jobreu commented 1 year ago

Sure! Thx for the quick reply! FWIW, my version of purrr is 0.3.4.
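To check whether an installed purrr is recent enough, one can compare its version against the minimum or, more directly, test whether `map()` declares a `.progress` argument. A minimal base-R sketch; the helper names `has_min_version()` and `supports_arg()` are hypothetical.

```r
# Returns TRUE if `pkg` is installed at version `min_version` or newer,
# FALSE if it is missing or older.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

# Returns TRUE if the closure `fun` declares a formal argument named `arg`.
supports_arg <- function(fun, arg) {
  arg %in% names(formals(fun))
}

# Example usage (requires purrr to be installed):
# has_min_version("purrr", "1.0.0")       # FALSE with purrr 0.3.4
# supports_arg(purrr::map, ".progress")   # FALSE with purrr 0.3.4
```

If either check returns FALSE, `install.packages("purrr")` updates purrr from CRAN.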

JamesHWade commented 1 year ago

I bumped the required purrr version to >= 1.0.0. Please reopen the issue if that does not fix it.
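In an R package, a minimum dependency version is declared in the `Imports` field of the DESCRIPTION file as an entry like `purrr (>= 1.0.0)`; `usethis::use_package("purrr", min_version = "1.0.0")` is one way to write it. The base-R sketch below parses an illustrative DESCRIPTION with `read.dcf()` and extracts the declared purrr constraint. The DESCRIPTION text is a made-up example, not gpttools' actual file.

```r
# Illustrative DESCRIPTION contents (hypothetical, not gpttools' real file).
desc_text <- "Package: examplepkg
Version: 0.0.6
Imports:
    purrr (>= 1.0.0),
    rlang"

# DESCRIPTION files use Debian Control File format, which read.dcf() parses.
desc_file <- tempfile(fileext = ".dcf")
writeLines(desc_text, desc_file)
desc <- read.dcf(desc_file, fields = "Imports")

# Split the Imports field into entries and pull out the purrr lower bound.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
purrr_entry <- grep("^purrr", imports, value = TRUE)
min_purrr <- sub("^purrr\\s*\\(>=\\s*([0-9.]+)\\)$", "\\1", purrr_entry)
print(min_purrr)
```

With such a constraint in place, installing the package alongside purrr 0.3.4 prompts for a purrr upgrade instead of failing later at run time.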