Closed realauggieheschmeyer closed 4 years ago
I dug around in the source code for the crypto_history function, which led me to the scraper function, and I believe that is where the error lies. Within your code, you have:
table <- rvest::html_nodes(page, css = "table") %>% .[1] %>%
rvest::html_table(fill = TRUE) %>%
replace(!nzchar(.), NA)
I believe the error lies in pulling the first object from the output of html_nodes. It looks like Coin Market Cap's pages now store the historical data in the third object returned by html_nodes. When I copied the code and ran the following...
table <- rvest::html_nodes(page, css = "table") %>% .[3] %>%
rvest::html_table(fill = TRUE) %>%
replace(!nzchar(.), NA)
I was able to scrape the data with no problems (minus the fact that I don't have the cool progress bar function). I'll try opening a pull request to get this fixed.
I'm going to second this issue on crypto_history. Unlike @realauggieheschmeyer, I am not smart enough to figure out why it is happening, but it is happening for me as well.
I generally run something similar to this code:
crypto_history(start_date = 20190101, limit = 6, sleep = 7.1)
I get the subscript error, but I am still able to look at the data frame. A couple of things stand out: there are now 4 additional columns that simply count the rows, and the call doesn't correctly limit the pull to the top 6 market caps.
@thelmmortal1, thanks for the kind words!
I haven't been able to open a PR to fix this issue yet, but I'm going to post the code I adapted that has been working for me. If you copy these into your script and have them override the original crypto functions, then it should be functional. It is for me at least ¯\_(ツ)_/¯
library(magrittr)  # provides %>%, which the functions below rely on

scraper <- function(attributes, slug, sleep = NULL) {
  . <- "."
  history_url <- as.character(attributes)
  coin_slug <- as.character(slug)
  # Optional pause between requests
  if (!is.null(sleep)) Sys.sleep(sleep)
  page <- tryCatch(
    xml2::read_html(history_url,
                    handle = curl::new_handle("useragent" = "Mozilla/5.0")),
    error = function(e) e)
  # If the request failed, assume the rate limit was hit: wait, then retry once
  if (inherits(page, "error")) {
    closeAllConnections()
    message("\n")
    message(cli::cat_bullet("Rate limit hit. Sleeping for 60 seconds.", bullet = "warning", bullet_col = "red"), appendLF = TRUE)
    Sys.sleep(65)
    page <- xml2::read_html(history_url,
                            handle = curl::new_handle("useragent" = "Mozilla/5.0"))
  }
  # Take the third table on the page instead of the first
  table <- rvest::html_nodes(page, css = "table") %>% .[3] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)
  # as_tibble() replaces the deprecated as.tibble()
  scraper <- table[[1]] %>% tibble::as_tibble() %>%
    dplyr::mutate(slug = coin_slug)
  return(scraper)
}
crypto_list <- function(coin = NULL,
                        start_date = NULL,
                        end_date = NULL,
                        coin_list = NULL) {
  # Load the coin list: the quick-search JSON by default
  if (is.null(coin_list)) {
    json <- "https://s2.coinmarketcap.com/generated/search/quick_search.json"
    coins <- jsonlite::fromJSON(json)
  } else if (coin_list == "api") {
    coins <- get_coinlist_api()
  } else {
    coins <- get_coinlist_static()
  }
  # Keep only the requested coins, matched by name, symbol, or slug
  if (!is.null(coin)) {
    name <- coins$name
    slug <- coins$slug
    symbol <- coins$symbol
    c1 <- subset(coins, toupper(name) %in% toupper(coin))
    c2 <- subset(coins, symbol %in% toupper(coin))
    c3 <- subset(coins, slug %in% tolower(coin))
    coins <- tibble::tibble()
    if (nrow(c1) > 0) { coins <- rbind(coins, c1) }
    if (nrow(c2) > 0) { coins <- rbind(coins, c2) }
    if (nrow(c3) > 0) { coins <- rbind(coins, c3) }
    if (nrow(coins) > 1L) { coins <- unique(coins) }
  }
  coins <-
    tibble::tibble(
      symbol = coins$symbol,
      name = coins$name,
      slug = coins$slug,
      rank = coins$rank
    )
  # Default date range: 2013-04-28 through today
  if (is.null(start_date)) { start_date <- "20130428" }
  if (is.null(end_date)) { end_date <- gsub("-", "", lubridate::today()) }
  exchangeurl <- paste0("https://coinmarketcap.com/currencies/", coins$slug, "/#markets")
  historyurl <-
    paste0(
      "https://coinmarketcap.com/currencies/",
      coins$slug,
      "/historical-data/?start=",
      start_date,
      "&end=",
      end_date
    )
  exchange_url <- c(exchangeurl)
  history_url <- c(historyurl)
  coins$symbol <- as.character(toupper(coins$symbol))
  coins$name <- as.character(coins$name)
  coins$slug <- as.character(coins$slug)
  coins$exchange_url <- as.character(exchange_url)
  coins$history_url <- as.character(history_url)
  coins$rank <- as.numeric(coins$rank)
  return(coins)
}
crypto_history <- function(coin = NULL, limit = NULL, start_date = NULL, end_date = NULL,
                           coin_list = NULL, sleep = NULL) {
  pink <- crayon::make_style(grDevices::rgb(0.93, 0.19, 0.65))
  options(scipen = 999)
  i <- "i"
  low <- NULL
  high <- NULL
  close <- NULL
  ranknow <- NULL
  message(cli::cat_bullet("If this helps you become rich please consider donating",
                          bullet = "heart", bullet_col = pink))
  message("ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860", appendLF = TRUE)
  message("XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK", appendLF = TRUE)
  message("\n")
  coins <- crypto_list(coin, start_date, end_date, coin_list)
  if (!is.null(limit))
    coins <- coins[1:limit, ]
  coin_names <- tibble::tibble(symbol = coins$symbol, name = coins$name, rank = coins$rank,
                               slug = coins$slug)
  to_scrape <- tibble::tibble(attributes = coins$history_url, slug = coins$slug)
  loop_data <- vector("list", nrow(to_scrape))
  message(cli::cat_bullet("Scraping historical crypto data", bullet = "pointer",
                          bullet_col = "green"))
  # Scrape one coin at a time (no progress bar in this version)
  for (i in seq_len(nrow(to_scrape))) {
    loop_data[[i]] <- scraper(to_scrape$attributes[i], to_scrape$slug[i], sleep)
  }
  results <- do.call(rbind, loop_data) %>% tibble::as_tibble()
  if (length(results) == 0L)
    stop("No data currently exists for this crypto currency.", call. = FALSE)
  market_data <- merge(results, coin_names, by = "slug")
  colnames(market_data) <- c("slug", "date", "open", "high", "low", "close", "volume",
                             "market", "symbol", "name", "ranknow")
  market_data <- market_data[c("slug", "symbol", "name", "date", "ranknow", "open",
                               "high", "low", "close", "volume", "market")]
  market_data$date <- lubridate::mdy(market_data$date, locale = platform_locale())
  # Strip thousands separators, treat "-" as 0, then convert to numeric
  market_data[, 5:11] <- apply(market_data[, 5:11], 2, function(x) gsub(",", "", x))
  market_data[, 7:11] <- apply(market_data[, 7:11], 2, function(x) gsub("-", "0", x))
  market_data$volume <- market_data$volume %>% tidyr::replace_na(0) %>% as.numeric()
  market_data$market <- market_data$market %>% tidyr::replace_na(0) %>% as.numeric()
  market_data[, 5:11] <- apply(market_data[, 5:11], 2, function(x) as.numeric(x))
  market_data <- na.omit(market_data)
  # round() is applied to the whole ratio here; in the piped original, %>% bound
  # more tightly than `/`, so only the denominator was rounded
  market_data <- market_data %>%
    dplyr::mutate(close_ratio = round((close - low) / (high - low), 4),
                  spread = round(high - low, 2))
  market_data$close_ratio <- market_data$close_ratio %>% tidyr::replace_na(0)
  history_results <- market_data %>% dplyr::arrange(ranknow, date)
  return(history_results)
}
Hey guys, sorry I haven’t posted sooner. I’ve fixed the issue in the latest version, which you can install from GitHub, and I have submitted it to CRAN.
The cause is that CoinMarketCap changed the way their pages render, so multiple tables were being returned, and the index of the right one could differ depending on the currency. I’m now dynamically working out the size of all the tables and returning the one with the most rows. Please retest.
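For anyone patching their own copy, the "largest table" idea can be sketched roughly as below. This is only an illustration, not the code that went into the package: `pick_largest_table` is a hypothetical helper name, and the input is assumed to be a list of data frames such as the one `rvest::html_table()` returns for all the table nodes on a page. The example tables are made up.

```r
# Hypothetical helper: choose the data frame with the most rows from a list
# of already-parsed tables.
pick_largest_table <- function(tables) {
  # No tables found: return NULL rather than failing on tables[[1]]
  # with "subscript out of bounds".
  if (length(tables) == 0L) {
    return(NULL)
  }
  row_counts <- vapply(tables, nrow, integer(1))
  tables[[which.max(row_counts)]]
}

# Made-up stand-ins for the tables found on a currency page.
tables <- list(
  data.frame(x = 1:2),
  data.frame(Date = c("Nov 01, 2019", "Nov 02, 2019", "Nov 03, 2019"),
             Close = c(1, 2, 3)),
  data.frame(y = 1)
)
largest <- pick_largest_table(tables)
```

Returning NULL for a page without tables lets the caller skip that coin instead of stopping the whole run.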
It seems to work if I'm running it only for a limited number of coins. When I run it for a broader set of coins, it still fails:
crypto_history(start_date = 20190101, limit = 600, sleep = 7.1)
♥ If this helps you become rich please consider donating
ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK
Scraping historical crypto data
| [332 / 600] [=================================================================>-----------------------------------------------------] 55% in 00:47:07 ETA: 38m
Error in result[[1]] : subscript out of bounds
It worked for me when I tried it. However, I mapped crypto_history to a list of currency names, so I may not have opened myself up to the same problem as @thelmmortal1.
Thanks for fixing this issue, @JesseVent!
I set my limit at 100 and it worked. But when I bumped it back up to 600 it failed again.
I'm still getting the same error. Any ideas?
crypto_history(start_date = 20190101, limit = 600, sleep = 7.1)
♥ If this helps you become rich please consider donating
ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK
Scraping historical crypto data
| [316 / 600] [==============================================================>--------------------------------------------------------] 53% in 00:44:58 ETA: 40m
Error in result[[1]] : subscript out of bounds
Having this same issue, intermittently. Any updates? Thanks!
I'm still getting the same error. Any potential fixes out there?
I'm about to commit a fix for something else, but the only thing I can think of, without being able to reproduce the issue, is to remove the start_date argument from your function call. It should be more reliable to retrieve all the rows for the coin (hence populating the table) and then filter out the rows you don't need yourself, rather than getting the web service to apply the date filtering.
Only an idea - not tested or verified.
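A rough sketch of that local filtering, assuming the result has a `date` column of class Date (the function above builds it with lubridate::mdy). `filter_by_date` is a hypothetical helper, not part of the package, and the data frame is made up:

```r
# Hypothetical helper: keep rows whose `date` falls within [start, end].
filter_by_date <- function(history, start, end = Sys.Date()) {
  start <- as.Date(start)
  end <- as.Date(end)
  history[history$date >= start & history$date <= end, , drop = FALSE]
}

# Made-up stand-in for the data frame returned by crypto_history().
history <- data.frame(
  date = as.Date(c("2018-12-30", "2019-01-01", "2019-06-15")),
  close = c(10, 11, 12)
)
recent <- filter_by_date(history, "2019-01-01")
```

With the real package this would be something like `filter_by_date(crypto_history(limit = 6), "2019-01-01")`, again untested.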
Thanks for the suggestion.
I ran it without the date and still got the subscript out of bounds error. The following is what I used.
crypto_history(limit = 600, sleep = 7.5)
I get the error around the ~160th coin
Hi, I'm having the same error at exactly the 160th coin just as @thelmmortal1 mentioned. Any updates?
Could you share how many coins were in your map and how long the sleep time was between queries? Also, did you use furrr for multiprocessing? Any comments would be appreciated.
And how big was the final dataset? Thanks, @realauggieheschmeyer!
@MarkYueMa I tried to code my own solution to this problem and I believe it is an issue with certain currencies, and not with the number of coins. As of now, I was able to download 2369 out of 3410 coins with a tryCatch and either a custom scraper or crypto_history()
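The tryCatch approach can be sketched along these lines. It is a minimal illustration rather than the exact code used for the numbers above: `fetch_each` and `fake_fetch` are hypothetical names, and the fetch function is passed in so the example runs without network access.

```r
# Hypothetical wrapper: call `fetch` once per coin, keep going when a coin
# errors, and record which coins failed.
fetch_each <- function(coins, fetch, ...) {
  results <- list()
  failed <- character(0)
  for (coin in coins) {
    res <- tryCatch(fetch(coin, ...), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, coin)
    } else {
      results[[coin]] <- res
    }
  }
  list(data = do.call(rbind, unname(results)), failed = failed)
}

# Made-up fetcher standing in for crypto_history(); it fails for one coin.
fake_fetch <- function(coin) {
  if (coin == "BAD") stop("subscript out of bounds")
  data.frame(symbol = coin, close = 1)
}
out <- fetch_each(c("BTC", "BAD", "ETH"), fake_fetch)
```

In real use, `fetch` would presumably be `crypto_history`, with extra arguments such as `sleep = 7.1` passed through `...`; the coins listed in `out$failed` can then be retried or inspected separately.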
That’s good to know, thank you very much. Your number of coins is close to the number of currently trading currencies. I am guessing those are the ones least likely to have issues?
I suppose so, all the "major" ones were downloaded successfully.
I usually set limit to somewhere between 400 and 600 coins. I set sleep to just over 7 seconds, so the total run time depends on how long I want to wait. Regardless, it breaks around the same number every time, like Daniel said.
Hey @MarkYueMa. I only had about 15 or so coins in my map. I use Coinbase as my crypto trading platform and they only have so many tradable currencies. I didn't change the sleep time, but when I hit about 10 queries, the query puts itself to sleep for 60 seconds. As for furrr, I didn't feel it was necessary for this particular request, as there were only a small number of currencies. If I were doing 1500 currency requests, then I would definitely think about parallel processing that request.
Hey @JesseVent, I am still getting this issue when running:
x = crypto_history("DOT",start_date = 20200101,limit = 100,sleep = 7.1)
Hey @JesseVent, I'm having the same issue as @neelanjanghosh. Probably another change to CoinMarketCap's structure? It would be awesome if you could help out. This package has helped a lot!
Is it necessary to switch to the pro plan to access historical data? Do you guys know of another option? Thanks
In the last few days, I've been intermittently getting an error when I run crypto_history. I thought it might have been the workflow I was using this function within (a series of map functions), but as can be seen in the above code, it is within the function itself. Any ideas what's going on here and what might have changed recently?