JesseVent / crypto

Cryptocurrency Historical Market Data R Package
https://CRAN.R-project.org/package=crypto

Subscript out of bounds error #45

Closed realauggieheschmeyer closed 4 years ago

realauggieheschmeyer commented 4 years ago

In the last few days, I've been intermittently getting an error when I run crypto_history.

crypto_history("ABBC")
♥ If this helps you become rich please consider donating

ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK

❯ Scraping historical crypto data

Error in value[[jvseq[[jjj]]]] : subscript out of bounds

I thought it might have been the workflow I was using this function within (a series of map functions), but as can be seen in the above code, it is within the function itself.

Any ideas what's going on here and what might have changed recently?

realauggieheschmeyer commented 4 years ago

I dug around into the source code for the crypto_history function which led me to the scraper function and I believe that is where the error lies. Within your code, you have:

table   <- rvest::html_nodes(page, css = "table") %>% .[1] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)

I believe the error lies in pulling the first object within the output of html_nodes. It looks like the way that Coin Market Cap is now set up stores the historical data within the third object of html_nodes. When I copied the code and ran the following...

table   <- rvest::html_nodes(page, css = "table") %>% .[3] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)

I was able to scrape the data with no problems (minus the fact that I don't have the cool progress bar function). I'll try opening a pull request to get this fixed.
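The failure mode described here can be reproduced without touching the network. A minimal sketch in base R, assuming the scraper ends up holding fewer tables than the hard-coded position expects; the toy `tables` list and the `nth_or_null()` helper are hypothetical and not part of the crypto package:

```r
# Stand-in for the list of tables parsed from one page (toy data).
tables <- list(
  data.frame(x = 1:2),
  data.frame(y = 1:3)
)

# Hard-coding position 3 fails as soon as a page yields fewer than three tables.
msg <- tryCatch(tables[[3]], error = function(e) conditionMessage(e))
print(msg)

# Hypothetical guard: return NULL instead of raising when the position is missing.
nth_or_null <- function(x, n) {
  if (length(x) >= n) x[[n]] else NULL
}

second <- nth_or_null(tables, 2)
absent <- nth_or_null(tables, 3)
```

A caller can then test `is.null()` on the result and skip or retry the coin instead of aborting the whole run.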

thelmmortal1 commented 4 years ago

I'm going to second this issue on crypto_history. Unlike realauggieheschmeyer, I am not smart enough to figure out why it is happening, but it is happening for me as well.

I generally run something similar to this code:

crypto_history(start_date = 20190101, limit = 6, sleep = 7.1)

I get the subscript error, but I am still able to inspect the data frame. Two things stand out: there are now 4 additional columns that simply count the rows, and the pull is not correctly limited to the top 6 coins by market cap.
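One possible explanation for the limit behaviour, offered as an assumption rather than a confirmed diagnosis: taking the first `limit` rows only yields the top coins if the coin list is already sorted by rank. A small sketch with made-up data; `top_n_by_rank()` is a hypothetical helper, not a function from the package:

```r
# Toy coin list, deliberately not sorted by rank (values are made up).
coins <- data.frame(
  symbol = c("CCC", "AAA", "DDD", "BBB"),
  rank   = c(3, 1, 4, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by rank first, then keep at most `n` rows.
# head() never pads with NA rows, unlike coins[1:n, ] when n > nrow(coins).
top_n_by_rank <- function(coins, n) {
  head(coins[order(coins$rank), , drop = FALSE], n)
}

top2 <- top_n_by_rank(coins, 2)
print(top2$symbol)
```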

realauggieheschmeyer commented 4 years ago

@thelmmortal1, thanks for the kind words!

I haven't been able to open a PR to fix this issue yet, but I'm going to post the code I adapted that has been working for me. If you copy these into your script and have them override the original crypto functions, then it should be functional. It is for me at least ¯\_(ツ)_/¯

scraper <- function(attributes, slug, sleep = NULL) {
  .            <- "."
  history_url  <- as.character(attributes)
  coin_slug    <- as.character(slug)
  if (!is.null(sleep)) Sys.sleep(sleep)

  page <- tryCatch(
    xml2::read_html(history_url,
                    handle = curl::new_handle("useragent" = "Mozilla/5.0")),
    error = function(e) e)

  if (inherits(page, "error")) {
    closeAllConnections()
    message("\n")
    message(cli::cat_bullet("Rate limit hit. Sleeping for 60 seconds.", bullet = "warning", bullet_col = "red"), appendLF = TRUE)
    Sys.sleep(65)  # note: sleeps 65 seconds, slightly longer than the message says
    page <- xml2::read_html(history_url,
                            handle = curl::new_handle("useragent" = "Mozilla/5.0"))
  }

  table   <- rvest::html_nodes(page, css = "table") %>% .[3] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)

  # as_tibble() replaces the deprecated as.tibble()
  scraper <- table[[1]] %>% tibble::as_tibble() %>%
    dplyr::mutate(slug = coin_slug)

  return(scraper)
}
crypto_list <- function(coin = NULL,
                        start_date = NULL,
                        end_date = NULL,
                        coin_list = NULL) {
  if (is.null(coin_list)) {
    json   <- "https://s2.coinmarketcap.com/generated/search/quick_search.json"
    coins  <- jsonlite::fromJSON(json)
  } else {
    coins <- if (coin_list == "api") {
      get_coinlist_api()
    } else {
      get_coinlist_static()
    }
  }

  if (!is.null(coin)) {
    name   <- coins$name
    slug   <- coins$slug
    symbol <- coins$symbol
    c1     <- subset(coins, toupper(name) %in% toupper(coin))
    c2     <- subset(coins, symbol %in% toupper(coin))
    c3     <- subset(coins, slug %in% tolower(coin))
    coins  <- tibble::tibble()
    if (nrow(c1) > 0) { coins     <- rbind(coins, c1) }
    if (nrow(c2) > 0) { coins     <- rbind(coins, c2) }
    if (nrow(c3) > 0) { coins     <- rbind(coins, c3) }
    if (nrow(coins) > 1L) { coins <- unique(coins) }
  }
  coins <-
    tibble::tibble(
      symbol = coins$symbol,
      name   = coins$name,
      slug   = coins$slug,
      rank   = coins$rank
    )
  if (is.null(start_date)) { start_date <- "20130428" }
  if (is.null(end_date)) { end_date <- gsub("-", "", lubridate::today()) }
  exchangeurl <- paste0("https://coinmarketcap.com/currencies/", coins$slug, "/#markets")
  historyurl <-
    paste0(
      "https://coinmarketcap.com/currencies/",
      coins$slug,
      "/historical-data/?start=",
      start_date,
      "&end=",
      end_date
    )
  exchange_url       <- c(exchangeurl)
  history_url        <- c(historyurl)
  coins$symbol       <- as.character(toupper(coins$symbol))
  coins$name         <- as.character(coins$name)
  coins$slug         <- as.character(coins$slug)
  coins$exchange_url <- as.character(exchange_url)
  coins$history_url  <- as.character(history_url)
  coins$rank         <- as.numeric(coins$rank)
  return(coins)
}
crypto_history <- function(coin = NULL, limit = NULL, start_date = NULL, end_date = NULL,
                           coin_list = NULL, sleep = NULL) {
  pink <- crayon::make_style(grDevices::rgb(0.93, 0.19, 0.65))
  options(scipen = 999)
  i <- "i"
  low <- NULL
  high <- NULL
  close <- NULL
  ranknow <- NULL

  message(cli::cat_bullet("If this helps you become rich please consider donating",
                          bullet = "heart", bullet_col = pink))
  message("ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860", appendLF = TRUE)
  message("XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK", appendLF = TRUE)
  message("\n")

  coins <- crypto_list(coin, start_date, end_date, coin_list)

  if (!is.null(limit))
    coins <- coins[1:limit, ]

  coin_names <- tibble::tibble(symbol = coins$symbol, name = coins$name, rank = coins$rank,
                               slug = coins$slug)
  to_scrape <- tibble::tibble(attributes = coins$history_url, slug = coins$slug)
  loop_data <- vector("list", nrow(to_scrape))

  message(cli::cat_bullet("Scraping historical crypto data", bullet = "pointer",
                          bullet_col = "green"))

  for (i in seq_len(nrow(to_scrape))) {
    loop_data[[i]] <- scraper(to_scrape$attributes[i], to_scrape$slug[i], sleep)
  }

  results <- do.call(rbind, loop_data) %>% tibble::as_tibble()

  if (length(results) == 0L)
    stop("No data currently exists for this crypto currency.", call. = FALSE)

  market_data <- merge(results, coin_names, by = "slug")
  colnames(market_data) <- c("slug", "date", "open", "high", "low", "close", "volume",
                             "market", "symbol", "name", "ranknow")
  market_data <- market_data[c("slug", "symbol", "name", "date", "ranknow", "open",
                               "high", "low", "close", "volume", "market")]
  market_data$date <- lubridate::mdy(market_data$date, locale = platform_locale())

  market_data[, 5:11] <- apply(market_data[, 5:11], 2, function(x) gsub(",", "", x))
  market_data[, 7:11] <- apply(market_data[, 7:11], 2, function(x) gsub("-", "0", x))
  market_data$volume <- market_data$volume %>% tidyr::replace_na(0) %>% as.numeric()
  market_data$market <- market_data$market %>% tidyr::replace_na(0) %>% as.numeric()
  market_data[, 5:11] <- apply(market_data[, 5:11], 2, function(x) as.numeric(x))
  market_data <- na.omit(market_data)

  # round() is called directly because %>% binds tighter than /, which would
  # otherwise round only the denominator (high - low).
  market_data <- market_data %>%
    dplyr::mutate(close_ratio = round((close - low) / (high - low), 4),
                  spread      = round(high - low, 2))

  market_data$close_ratio <- market_data$close_ratio %>% tidyr::replace_na(0)
  history_results <- market_data %>% dplyr::arrange(ranknow, date)
  return(history_results)
}
JesseVent commented 4 years ago

Hey guys, sorry I haven't posted sooner. I've fixed the issue in the latest version, which you can install from GitHub, and I have submitted it to CRAN.

It's because CoinMarketCap changed the way their pages render, so multiple tables were being returned, and the position of the historical-data table could differ depending on the currency. Now I dynamically work out the size of all the tables and return the one with the most rows. Please retest.
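A rough sketch of the "largest table wins" idea, not the package's actual implementation. It assumes the page's tables have already been parsed into a list of data frames (in a scraper that list might come from `rvest::html_table()`); the toy `tables` list and the `largest_table()` helper are hypothetical:

```r
# Toy stand-ins for the tables parsed from one page.
tables <- list(
  data.frame(a = 1),
  data.frame(date = c("d1", "d2", "d3"), close = c(1.0, 1.1, 1.2)),
  data.frame(b = 1:2)
)

# Hypothetical helper: return the table with the most rows, or NULL if none.
largest_table <- function(tables) {
  if (length(tables) == 0L) return(NULL)
  n_rows <- vapply(tables, nrow, integer(1))
  tables[[which.max(n_rows)]]
}

history <- largest_table(tables)
print(nrow(history))
```

Returning `NULL` for an empty list lets the caller handle a page with no tables explicitly rather than indexing into nothing.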

thelmmortal1 commented 4 years ago

It seems to work if I'm running it only for a limited number of coins. When I run it for a broader set of coins it still fails.

crypto_history(start_date = 20190101, limit = 600, sleep = 7.1)
♥ If this helps you become rich please consider donating

ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK

Scraping historical crypto data

| [332 / 600] [=================================================================>-----------------------------------------------------] 55% in 00:47:07 ETA: 38m
Error in result[[1]] : subscript out of bounds

realauggieheschmeyer commented 4 years ago

It worked for me when I tried it. However, I mapped crypto_history to a list of currency names, so I may not have opened myself up to the same problem as @thelmmortal1.

Thanks for fixing this issue, @JesseVent!

thelmmortal1 commented 4 years ago

I set my limit at 100 and it worked. But when I bumped it back up to 600 it failed again.

On Nov 18, 2019, at 3:54 PM, Auggie Heschmeyer notifications@github.com wrote:

 It worked for me when I tried it. However, I mapped crypto_history to a list of currency names, so I may not have opened myself up to the same problem as @thelmmortal1.

Thanks for fixing this issue, @JesseVent!


thelmmortal1 commented 4 years ago

I'm still getting the same error. Any ideas?

thelmmortal1 commented 4 years ago

crypto_history(start_date = 20190101, limit = 600, sleep = 7.1)
♥ If this helps you become rich please consider donating

ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK

Scraping historical crypto data

| [316 / 600] [==============================================================>--------------------------------------------------------] 53% in 00:44:58 ETA: 40m
Error in result[[1]] : subscript out of bounds

dmrodz commented 4 years ago

Having this same issue, intermittently. Any updates? Thanks!

thelmmortal1 commented 4 years ago

I'm still getting the same error. Any potential fixes out there?

JesseVent commented 4 years ago

I'm about to commit a fix for something else. Without being able to reproduce the issue, the only thing I can think of is to remove the start_date argument from your function call. Retrieving all the rows for a coin (and hence populating the table) should be more reliable than limiting the request to a specific date range; you can then filter out the rows you don't need yourself instead of relying on the web service to apply the filtering.

Only an idea - not tested or verified.
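For anyone trying this suggestion, the local filtering step might look like the sketch below. It uses a made-up data frame in place of a real download, and `filter_from()` is a hypothetical helper that accepts the same yyyymmdd style of date used in the calls in this thread:

```r
# Toy history for one coin (values are made up).
history <- data.frame(
  date  = as.Date(c("2018-12-30", "2018-12-31", "2019-01-01", "2019-01-02")),
  close = c(10, 11, 12, 13)
)

# Hypothetical helper: keep rows on or after a start date given as yyyymmdd.
filter_from <- function(df, start_date) {
  start <- as.Date(as.character(start_date), format = "%Y%m%d")
  df[df$date >= start, , drop = FALSE]
}

recent <- filter_from(history, 20190101)
print(recent)
```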

thelmmortal1 commented 4 years ago

Thanks for the suggestion.

I ran it without the date and still got the subscript out of bounds error. The following is what I used.

crypto_history(limit = 600, sleep = 7.5)

I get the error at around the 160th coin.

dukes00 commented 4 years ago

Hi, I'm having the same error at exactly the 160th coin just as @thelmmortal1 mentioned. Any updates?

MarkYueMa commented 4 years ago

It worked for me when I tried it. However, I mapped crypto_history to a list of currency names, so I may not have opened myself up to the same problem as @thelmmortal1.

Thanks for fixing this issue, @JesseVent!

Could you share how many coins were in your map and how long the sleep time was between queries? Also, did you use furrr for multiprocessing? Any comments would be appreciated.

MarkYueMa commented 4 years ago

And how big was the final dataset? Thanks, @realauggieheschmeyer!

dukes00 commented 4 years ago

@MarkYueMa I tried to code my own solution to this problem and I believe it is an issue with certain currencies, and not with the number of coins. As of now, I was able to download 2369 out of 3410 coins with a tryCatch and either a custom scraper or crypto_history()
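The per-coin tryCatch approach mentioned here could look roughly like this. It is only a sketch: `fetch_one()` is a hypothetical stand-in that fails for one slug on purpose, and in real use it would be replaced by a call such as `crypto_history()` for a single coin or a custom scraper:

```r
# Hypothetical stand-in for a per-coin download; it fails for one slug on purpose.
fetch_one <- function(slug) {
  if (slug == "broken-coin") stop("subscript out of bounds")
  data.frame(slug = slug, close = 1, stringsAsFactors = FALSE)
}

slugs   <- c("coin-a", "broken-coin", "coin-b")
results <- list()
failed  <- character(0)

for (slug in slugs) {
  out <- tryCatch(fetch_one(slug), error = function(e) NULL)
  if (is.null(out)) {
    failed <- c(failed, slug)      # remember which coins need another look
  } else {
    results[[slug]] <- out
  }
}

history <- do.call(rbind, results)
print(failed)
```

Keeping the list of failed slugs makes it possible to check whether the same currencies fail on every run, which is the pattern described in this comment.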

MarkYueMa commented 4 years ago

That's good to know, thank you very much. Your number of coins is close to the number of currently trading currencies. I am guessing those are the ones less likely to have issues?

Yue (Mark) Ma University of Oklahoma Price College of Business


On Friday, February 7, 2020 at 5:43:25 PM, Daniel Cupriak wrote:

@MarkYueMa I tried to code my own solution to this problem and I believe it is an issue with certain currencies, and not with the number of coins. As of now, I was able to download 2369 out of 3410 coins with a tryCatch and either a custom scraper or crypto_history()

dukes00 commented 4 years ago

I suppose so, all the "major" ones were downloaded successfully.

thelmmortal1 commented 4 years ago

I usually set limit to somewhere between 400 and 600 coins, with sleep set to just over 7 seconds, so it depends on how long I want to wait. Regardless, it breaks at around the same number every time, like Daniel said.


realauggieheschmeyer commented 4 years ago

Hey @MarkYueMa. I only had about 15 or so coins in my map. I use Coinbase as my crypto trading platform and they only have so many tradable currencies. I didn't change the sleep time, but when I hit about 10 queries, the query puts itself to sleep for 60 seconds. As for furrr, I didn't feel it was necessary for this particular request as there were only a small number of currencies. If I was doing 1500 currency requests, then I would definitely think about parallel processing that request.
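The sleep-and-retry behaviour described here can be sketched generically. This is not the package's code: `with_retry()` and the toy `flaky()` request are hypothetical, and the 60-second default is taken from the message in the scraper posted earlier in this thread:

```r
# Hypothetical helper: call `f()` and, if it errors, wait and try again.
with_retry <- function(f, attempts = 3, wait = 60) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < attempts) Sys.sleep(wait)   # back off before the next attempt
  }
  stop(result)   # re-raise the last error once attempts are exhausted
}

# Toy request that fails twice and then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("rate limited")
  "ok"
}

value <- with_retry(flaky, attempts = 3, wait = 0)
print(value)
```

With a large number of coins, a bounded number of attempts avoids looping forever on a currency whose page never returns a usable table.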

neelanjanghosh commented 3 years ago

Hey @JesseVent , I am still getting this issue on running.

x <- crypto_history("DOT", start_date = 20200101, limit = 100, sleep = 7.1)

alienalex6 commented 3 years ago

Hey @JesseVent I'm having the same issue as @neelanjanghosh.

Probably another change to the CoinMarketCap page structure? It would be awesome if you could help out. This package has helped a lot!

demirelesad commented 3 years ago

Is it necessary to switch to the pro plan to access historical data? Do you guys know of another option? Thanks.