business-science / riingo

An R interface to the Tiingo stock price API
https://business-science.github.io/riingo/

Wrongly attributes Monday Quotes to Sunday date #23

Closed knowcell closed 2 years ago

knowcell commented 2 years ago

For example, 2022-07-10 to 2022-07-19 download for SPY gives:

row | ticker | date | close | high | low | open | volume | adjClose | adjHigh | adjLow | adjOpen
5918 | SPY | 2022-07-10 | 384.23 | 386.8700 | 383.5000 | 385.850 | 58366945 | 384.2300 | 386.8700 | 383.5000 | 385.8500
5919 | SPY | 2022-07-11 | 380.83 | 386.1600 | 378.9900 | 383.650 | 62219178 | 380.8300 | 386.1600 | 378.9900 | 383.6500
5920 | SPY | 2022-07-12 | 378.83 | 381.9200 | 374.6580 | 375.100 | 84224649 | 378.8300 | 381.9200 | 374.6580 | 375.1000
5921 | SPY | 2022-07-13 | 377.91 | 379.0498 | 371.0400 | 373.610 | 89704819 | 377.9100 | 379.0498 | 371.0400 | 373.6100
5922 | SPY | 2022-07-14 | 385.13 | 385.2500 | 380.5400 | 382.550 | 79060383 | 385.1300 | 385.2500 | 380.5400 | 382.5500
5923 | SPY | 2022-07-17 | 381.95 | 389.0900 | 380.6600 | 388.380 | 63203626 | 381.9500 | 389.0900 | 380.6600 | 388.3800
5924 | SPY | 2022-07-18 | 392.27 | 392.8700 | 385.3900 | 386.080 | 77242177 | 392.2700 | 392.8700 | 385.3900 | 386.0800

The quotes are in the correct order, but the dates are wrong: each is offset by one day. The dates end on 07-18, while the last quote is actually for 07-19. Likewise, the quote listed under 07-17 is for 07-18, and 07-17 is a Sunday. This is possibly a duplicate of the open issue, although I am not sure it is the same. It would be great if you could fix it.

DavisVaughan commented 2 years ago

This is what I see

library(riingo)

spy <- riingo_prices(
  "SPY",
  start_date = "2022-07-10",
  end_date = "2022-07-19"
)
spy
#> # A tibble: 7 × 14
#>   ticker date                close  high   low  open   volume adjClose adjHigh
#>   <chr>  <dttm>              <dbl> <dbl> <dbl> <dbl>    <int>    <dbl>   <dbl>
#> 1 SPY    2022-07-11 00:00:00  384.  387.  384.  386. 58366945     384.    387.
#> 2 SPY    2022-07-12 00:00:00  381.  386.  379.  384. 62219178     381.    386.
#> 3 SPY    2022-07-13 00:00:00  379.  382.  375.  375. 84224649     379.    382.
#> 4 SPY    2022-07-14 00:00:00  378.  379.  371.  374. 89704819     378.    379.
#> 5 SPY    2022-07-15 00:00:00  385.  385.  381.  383. 79060383     385.    385.
#> 6 SPY    2022-07-18 00:00:00  382.  389.  381.  388. 63203626     382.    389.
#> 7 SPY    2022-07-19 00:00:00  392.  393.  385.  386. 78505972     392.    393.
#> # … with 5 more variables: adjLow <dbl>, adjOpen <dbl>, adjVolume <int>,
#> #   divCash <dbl>, splitFactor <dbl>
#> # ℹ Use `colnames()` to see all variable names

So I don't think I see a problem?

Please try to use the reprex package to demonstrate your issue https://www.tidyverse.org/help/

knowcell commented 2 years ago

I am sorry, the error was on my part. The data was downloaded correctly. I think I set the time zone wrong when I used the as.Date function to merge the downloaded data with another data frame (I was trying to set it to Eastern Standard Time and used tz = 'est'). This messed up the date column of the correctly downloaded data. I pasted the output from reprex to show where I got thrown off. As an aside, it would be great if 2d, 3d, 4d resampling could be built into riingo. Tiingo doesn't yet provide it. Thanks again for your work with the package!

library(riingo)
spy1d <- riingo_prices("SPY", start_date = Sys.Date() - 15, end_date = Sys.Date(), resample_frequency = "daily")
spy1d
#> # A tibble: 11 × 14
#>    ticker date                close  high   low  open   volume adjClose adjHigh
#>    <chr>  <dttm>              <dbl> <dbl> <dbl> <dbl>    <int>    <dbl>   <dbl>
#>  1 SPY    2022-07-06 00:00:00  383.  386.  380.  382. 70426244     383.    386.
#>  2 SPY    2022-07-07 00:00:00  389.  390.  383.  385. 64525919     389.    390.
#>  3 SPY    2022-07-08 00:00:00  389.  391.  386.  387. 72397765     389.    391.
#>  4 SPY    2022-07-11 00:00:00  384.  387.  384.  386. 58366945     384.    387.
#>  5 SPY    2022-07-12 00:00:00  381.  386.  379.  384. 62219178     381.    386.
#>  6 SPY    2022-07-13 00:00:00  379.  382.  375.  375. 84224649     379.    382.
#>  7 SPY    2022-07-14 00:00:00  378.  379.  371.  374. 89704819     378.    379.
#>  8 SPY    2022-07-15 00:00:00  385.  385.  381.  383. 79060383     385.    385.
#>  9 SPY    2022-07-18 00:00:00  382.  389.  381.  388. 63203626     382.    389.
#> 10 SPY    2022-07-19 00:00:00  392.  393.  385.  386. 78505972     392.    393.
#> 11 SPY    2022-07-20 00:00:00  395.  396.  391.  392. 71843769     395.    396.
#> # … with 5 more variables: adjLow <dbl>, adjOpen <dbl>, adjVolume <int>,
#> #   divCash <dbl>, splitFactor <dbl>
spy1d$date = as.Date(spy1d$date, tz = "est")
spy1d
#> # A tibble: 11 × 14
#>    ticker date       close  high   low  open   volume adjClose adjHigh adjLow
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>    <int>    <dbl>   <dbl>  <dbl>
#>  1 SPY    2022-07-05  383.  386.  380.  382. 70426244     383.    386.   380.
#>  2 SPY    2022-07-06  389.  390.  383.  385. 64525919     389.    390.   383.
#>  3 SPY    2022-07-07  389.  391.  386.  387. 72397765     389.    391.   386.
#>  4 SPY    2022-07-10  384.  387.  384.  386. 58366945     384.    387.   384.
#>  5 SPY    2022-07-11  381.  386.  379.  384. 62219178     381.    386.   379.
#>  6 SPY    2022-07-12  379.  382.  375.  375. 84224649     379.    382.   375.
#>  7 SPY    2022-07-13  378.  379.  371.  374. 89704819     378.    379.   371.
#>  8 SPY    2022-07-14  385.  385.  381.  383. 79060383     385.    385.   381.
#>  9 SPY    2022-07-17  382.  389.  381.  388. 63203626     382.    389.   381.
#> 10 SPY    2022-07-18  392.  393.  385.  386. 78505972     392.    393.   385.
#> 11 SPY    2022-07-19  395.  396.  391.  392. 71843769     395.    396.   391.
#> # … with 4 more variables: adjOpen <dbl>, adjVolume <int>, divCash <dbl>,
#> #   splitFactor <dbl>

Created on 2022-07-21 by the reprex package (v2.0.1)
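
On the 2d/3d/4d resampling aside: until something like that exists upstream, daily bars can be collapsed by hand. The sketch below is one possible approach in base R, not a riingo or Tiingo feature. The helper name `resample_n_days` is hypothetical, the input data is made up, and "n days" is assumed to mean n consecutive trading rows rather than calendar days.

```r
# Hypothetical helper, not part of riingo: collapse daily bars into
# bars that each span n consecutive trading rows.
resample_n_days <- function(prices, n) {
  prices <- prices[order(prices$date), ]
  # Rows 1..n form block 0, rows n+1..2n form block 1, and so on
  block <- (seq_len(nrow(prices)) - 1) %/% n
  bars <- lapply(split(prices, block), function(d) {
    data.frame(
      date   = d$date[1],           # first day in the block
      open   = d$open[1],           # open of the first day
      high   = max(d$high),
      low    = min(d$low),
      close  = d$close[nrow(d)],    # close of the last day
      volume = sum(as.numeric(d$volume))
    )
  })
  out <- do.call(rbind, bars)
  rownames(out) <- NULL
  out
}

# Made-up daily bars, for illustration only
daily <- data.frame(
  date   = as.Date("2022-01-03") + 0:3,
  open   = c(10, 11, 12, 13),
  high   = c(12, 13, 14, 15),
  low    = c(9, 10, 11, 12),
  close  = c(11, 12, 13, 14),
  volume = c(100, 200, 300, 400)
)
two_day <- resample_n_days(daily, 2)
```

A trailing block with fewer than n rows is kept as a shorter bar; whether to keep or drop it is a design choice left open here.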

DavisVaughan commented 2 years ago

The dates come from Tiingo in UTC, so your code is essentially doing something like this:

x <- as.POSIXct("2022-07-06 00:00:00", tz = "UTC")
x
#> [1] "2022-07-06 UTC"

# This is weird...right?
as.Date(x, tz = "America/New_York")
#> [1] "2022-07-05"

# ^ This code is basically doing this:
x <- as.POSIXlt(x, tz = "America/New_York")
x
#> [1] "2022-07-05 20:00:00 EDT"

# Then converting that to Date
as.Date(x)
#> [1] "2022-07-05"
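
Given the walkthrough above, a minimal sketch of conversions that keep the calendar date intact, assuming the timestamps are midnight UTC as described. The variable names are illustrative only.

```r
# Midnight UTC, as in the example above
x <- as.POSIXct("2022-07-06 00:00:00", tz = "UTC")

# Option 1: convert in the zone the timestamp is stored in
d1 <- as.Date(x, tz = "UTC")

# Option 2: reuse whatever zone is attached to the object
d2 <- as.Date(x, tz = attr(x, "tzone"))

# Option 3: go through the printed calendar date
d3 <- as.Date(format(x, "%Y-%m-%d"))

# For contrast: shifting to New York first moves the date back one day
shifted <- as.Date(x, tz = "America/New_York")
```

The first three all give 2022-07-06, while `shifted` is 2022-07-05, matching the output shown earlier.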

BTW, if you are on the east coast of the US, you should be using America/New_York, not EST, for your time zone. For example, New York is currently in Eastern Daylight Time, but the EST zone won't reflect that correctly:

x <- as.POSIXct("2022-07-06 00:00:00", tz = "EST")
x
#> [1] "2022-07-06 EST"

x <- as.POSIXct("2022-07-06 00:00:00", tz = "America/New_York")
x
#> [1] "2022-07-06 EDT"
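
To make the difference concrete, here is a small sketch that formats one instant in both zones. The instant is arbitrary, chosen in July so that daylight saving time applies in New York.

```r
# One instant in July, when New York observes daylight time
x <- as.POSIXct("2022-07-06 12:00:00", tz = "UTC")

# "EST" is a fixed UTC-5 zone with no daylight saving rules
est <- format(x, "%H:%M %z", tz = "EST")

# "America/New_York" follows daylight saving rules (UTC-4 in July)
nyc <- format(x, "%H:%M %z", tz = "America/New_York")

est
#> [1] "07:00 -0500"
nyc
#> [1] "08:00 -0400"
```

The two zones disagree by an hour for the whole daylight saving period, so any conversion done with EST is off by an hour for summer timestamps.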

If you are able to do so, I'd look into using clock::as_date() for date / date-time conversions. I designed it to be a "what you see is what you get" kind of converter: https://clock.r-lib.org/reference/as_date.html

x <- as.POSIXct("2022-07-06 00:00:00", tz = "EST")
x
#> [1] "2022-07-06 EST"

clock::as_date(x)
#> [1] "2022-07-06"
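
Applied to the original merge problem, a hedged base R sketch might look like the following. Both data frames are hypothetical stand-ins (the real second data frame was never shown), and the close values are the ones quoted earlier in the thread for 07-18 and 07-19.

```r
# Hypothetical stand-in for the downloaded prices: midnight UTC timestamps
prices <- data.frame(
  date  = as.POSIXct(c("2022-07-18", "2022-07-19"), tz = "UTC"),
  close = c(381.95, 392.27)
)

# Hypothetical second table keyed by plain Date
other <- data.frame(
  date = as.Date(c("2022-07-18", "2022-07-19")),
  note = c("Monday", "Tuesday")
)

# Convert in UTC so the calendar date is preserved, then merge on it
prices$date <- as.Date(prices$date, tz = "UTC")
merged <- merge(prices, other, by = "date")
```

With the dates converted in UTC, both rows find their match; had the conversion shifted them back a day, the Monday quote would have been labelled as Sunday and only one row would have matched.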