dvmlls / bdscale

Remove Weekends and Holidays From ggplot2 Axes

scale_x_bd seems to break geom_rect #10

Open the-tourist- opened 8 years ago

the-tourist- commented 8 years ago

I am considering writing a package of geoms and stats that would reproduce the quantmod charts in ggplot. One issue is that ggplot treats all dates as equal, whereas price charts normally skip weekends and holidays. bdscale offers an elegant solution to this, but it seems to break geom_rect, which I need in order to create a candlestick geom. I'm not sure why this happens; maybe because geom_rect uses xmin and xmax instead of x? In the following example, p1 prints as expected, but with gaps for weekends. p2, which uses bdscale, doesn't print and gives the warning "Removed 73 rows containing missing values (geom_rect)".

library(quantmod)
library(ggplot2)
library(bdscale)

getSymbols("AMZN")
AMZN <- adjustOHLC(AMZN)
colnames(AMZN) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")

Data <- AMZN["2016/"]

DateRange <- index(Data)
BDDates <- bd2t(index(Data), DateRange)

p1 <- 
  ggplot(Data) + 
  geom_rect(aes(xmin = DateRange - 0.5, xmax = DateRange + 0.5, ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = as.factor(sign(Close - Open)))) +
  scale_fill_manual(values = c("DarkRed", "DarkGreen", "DarkGreen"))
print(p1)

p2 <- 
  ggplot(Data) + 
  geom_rect(aes(x = BDDates, y = Close, xmin = BDDates, xmax = BDDates + 1, ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = as.factor(sign(Close - Open)))) +
  scale_fill_manual(values = c("DarkRed", "DarkGreen", "DarkGreen")) +
  scale_x_bd(business.dates = index(Data))
print(p2)

dvmlls commented 8 years ago

Hey I can look a bit more when I'm at a computer later, but have you tried this?

http://stackoverflow.com/a/32964192/908042

The code for bdscale doesn't handle fractional days, which makes it tricky to do things like OHLC elegantly. Boxplot works fine, but segment and apparently rect do not. The last time I looked into this, I tried to reverse engineer boxplot but didn't get anywhere in the hour or two I was willing to sink into it.
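A hedged illustration of why fractional days can end up as missing values: if the date-to-position transform is an exact lookup against the list of business dates (an assumption about how such a scale could work, not a quote of bdscale's source), then anything that is not exactly a listed date has no position. `to_index` below is a hypothetical stand-in written for this sketch.

```r
# Hypothetical stand-in for a business-day transform: an exact-match lookup.
# This is an assumption for illustration, not a copy of bdscale's source.
to_index <- function(dates, business.dates) {
  match(as.numeric(dates), as.numeric(business.dates)) - 1
}

# Thursday, Friday, then Monday and Tuesday of the following week.
business.dates <- as.Date(c("2016-04-14", "2016-04-15", "2016-04-18", "2016-04-19"))

to_index(business.dates, business.dates)         # whole business days map to 0, 1, 2, 3
to_index(business.dates + 0.5, business.dates)   # half-day offsets match nothing: all NA
to_index(as.Date("2016-04-16"), business.dates)  # a Saturday is not in the calendar: NA
```

Under that assumption, a geom that needs positions between or around business days (xmin/xmax offsets, segment ends) would see its rows dropped as missing.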

Anyway, thanks for using the package!

the-tourist- commented 8 years ago

Hi,

Thanks for the quick reply.

I'll have a look at boxplot later, your example seemed to work quite well. I hadn't actually thought of using it, my first instinct was to use the rect and linerange geoms to create the candlesticks which worked well but not with bdscale.
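For reference, the rect-plus-linerange candlestick mentioned above can be drawn without any date scale by positioning each bar on its row index and relabelling the axis. This is a different technique from bdscale (a plain continuous axis), shown only as a minimal sketch; the prices are made up and the column names are assumptions.

```r
library(ggplot2)

# Made-up OHLC rows for illustration only; not real prices.
ohlc <- data.frame(
  date  = as.Date(c("2016-04-14", "2016-04-15", "2016-04-18", "2016-04-19")),
  open  = c(10.0, 10.4, 10.2, 10.9),
  high  = c(10.6, 10.7, 11.0, 11.2),
  low   = c( 9.8, 10.1, 10.1, 10.5),
  close = c(10.4, 10.2, 10.9, 10.6)
)

# Position each bar by its row number, so the weekend takes up no space.
ohlc$idx <- seq_along(ohlc$date)

p <- ggplot(ohlc) +
  # Wick: low to high.
  geom_linerange(aes(x = idx, ymin = low, ymax = high)) +
  # Body: open to close, coloured by direction.
  geom_rect(aes(xmin = idx - 0.4, xmax = idx + 0.4,
                ymin = pmin(open, close), ymax = pmax(open, close),
                fill = close >= open)) +
  # Show the dates as labels on the integer positions.
  scale_x_continuous(breaks = ohlc$idx, labels = format(ohlc$date, "%b %d")) +
  theme(legend.position = "none")
print(p)
```

The trade-off is that break selection and date labelling have to be handled by hand, which is the part bdscale automates.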

I'd say not being able to work with bdscale would pretty much make any quant geom fairly useless, or at least unused. So I really appreciated your quick response.

As an aside, can I ask a lazy question? It's one of those questions that I should probably check myself before sounding stupid, but I'm far too lazy to do that. One minor thing I noticed in bdscale is that, although it intelligently chooses the major date breaks, it doesn't format them as ggplot does by default. But when I looked at the code on GitHub, the date formatting was coded. Did this not get implemented, or am I asking a stupid lazy question?

Cheers,

Graeme

dvmlls commented 8 years ago

Not quite sure what you mean - are you talking about the axis label formatting?

If I do this (no bdscale):

library(dplyr)
library(bdscale)
library(ggplot2)
library(quantmod)
library(magrittr)
library(scales)

getSymbols("SPY", from = Sys.Date() - 1460, to = Sys.Date(), adjust = TRUE, auto.assign = TRUE)

input <- data.frame(SPY["2015/"]) %>% 
  set_names(c("open", "high", "low", "close", "volume", "adjusted")) %>%
  mutate(date=as.Date(rownames(.)))

input %>% ggplot(aes(x=date, ymin=low, ymax=high, lower=pmin(open,close), upper=pmax(open,close), 
                     fill=open<close, group=date, middle=pmin(open,close))) + 
  geom_boxplot(stat='identity') +
  ggtitle("SPY: 2015") +
  xlab('') + ylab('') + theme(legend.position='none')

It displays this: image

If I then add the bdscale:

# all that previous stuff +
  scale_x_bd(business.dates=input$date, max.major.breaks=10)

Then it gives me this: image

You can format the labels by passing a labels= parameter, but I agree it should be intelligently tied to the breaks you choose (see #2):

# all that previous stuff +
  scale_x_bd(business.dates=input$date, max.major.breaks=10, labels=date_format("%b %Y"))

image

Not sure why it still has the axis label date there - maybe something changed in ggplot2 2.0.

the-tourist- commented 8 years ago

I'm sitting outside enjoying the sun. So can't actually confirm this, I'm nowhere near a computer. But when I looked at the code of scale_x_bd last night I think it included setting a label format variable, which doesn't get used later in the code. But I am sitting outside in the sun, and could be wrong on this.

the-tourist- commented 8 years ago

month_format <- function(date) format(date, "%b '%y")
quarter <- function(date) ceiling(as.integer(format(date, '%m')) / 3)
quarter_format <- function(date) sprintf("Q%s '%s", quarter(date), format(date, '%y'))
year_format <- function(date) format(date, '%Y')

dvmlls commented 8 years ago

I don't use those to format the axis labels, I use those to figure out where it should put the breaks.

bd_breaks <- function(business.dates, n.max=5) {

  breaks.weeks    <- firstInGroup(business.dates, last_monday)
  breaks.months   <- firstInGroup(business.dates, month_format)
  breaks.quarters <- firstInGroup(business.dates, quarter_format)
  breaks.years    <- firstInGroup(business.dates, year_format)
  breaks.years.5  <- firstInGroup(business.dates, function(ds) floor(as.integer(format(ds, '%Y'))/5))
  breaks.decades  <- firstInGroup(business.dates, function(ds) floor(as.integer(format(ds, '%Y'))/10))
...

the-tourist- commented 8 years ago

As a user of your package, the first thing I would do is format the dates. Rather than every user having to format the dates themselves, it would make sense to have them intelligently formatted like ggplot does. But reality suggests you have a real job, unlike myself, and could do it if you had the time, but probably don't have that time.

On 19 Apr 2016, at 17:04, dvmlls notifications@github.com wrote:

format all the dates using the given format, e.g. Jan 1 2012 --> Q1 2012, Jan 2 2012 --> Q1 2012, ...
group by the formatted value and find the first item, e.g. Q1 2012 --> Jan 1 2012
use that as the date for the break
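The three steps quoted above could be sketched in base R roughly as follows. `first_in_group` is a hypothetical stand-in written for illustration, not bdscale's actual `firstInGroup`; `quarter` and `quarter_format` repeat the helpers posted earlier in the thread, and the dates are arbitrary.

```r
# Helpers as posted earlier in the thread.
quarter <- function(date) ceiling(as.integer(format(date, "%m")) / 3)
quarter_format <- function(date) sprintf("Q%s '%s", quarter(date), format(date, "%y"))

# Hypothetical stand-in for firstInGroup (an assumption, not the package's code):
# label every date, then keep the earliest date that carries each label.
first_in_group <- function(dates, label_fun) {
  dates  <- sort(dates)
  labels <- label_fun(dates)
  dates[!duplicated(labels)]
}

dates <- as.Date(c("2012-01-03", "2012-01-04", "2012-03-30",
                   "2012-04-02", "2012-04-03", "2012-07-02"))

first_in_group(dates, quarter_format)
# the first date seen in each quarter: 2012-01-03, 2012-04-02, 2012-07-02
```

The label function here only defines the groups; it says nothing about how the resulting breaks are displayed, which matches the distinction drawn in the reply.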

dvmlls commented 8 years ago

It's open source, pull requests welcome!

I think fixing this would mean having a single "object" that controls both breaks and formatting. When it chooses breaks, store some state that later gets pulled out to format the labels. Maybe the state is just the format string itself.

Take a stab at it!
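One way to read the "single object" idea: a chooser that returns the breaks together with the format string that suits their granularity. Everything below (`choose_breaks`, the candidate list, the format strings) is hypothetical and not part of bdscale; it only illustrates carrying the format alongside the breaks.

```r
# Hypothetical sketch, not bdscale code: pick the finest granularity that yields
# at most n.max breaks, and return the matching label format with the breaks.
choose_breaks <- function(business.dates, n.max = 5) {
  business.dates <- sort(business.dates)
  # Grouping key and label format for each candidate granularity, finest first.
  candidates <- list(
    months = list(key = "%Y-%m", label = "%b '%y"),
    years  = list(key = "%Y",    label = "%Y")
  )
  for (cand in candidates) {
    keys   <- format(business.dates, cand$key)
    breaks <- business.dates[!duplicated(keys)]
    if (length(breaks) <= n.max) {
      return(list(breaks = breaks, format = cand$label))
    }
  }
  # Fall back to the coarsest candidate if nothing fits under n.max.
  list(breaks = breaks, format = cand$label)
}

# Every weekday over two years (holidays ignored to keep the example small).
days          <- seq(as.Date("2014-01-01"), as.Date("2015-12-31"), by = "day")
weekdays_only <- days[!format(days, "%u") %in% c("6", "7")]

chosen <- choose_breaks(weekdays_only, n.max = 5)
chosen$format                         # "%Y": 24 monthly breaks exceed n.max, 2 yearly ones fit
format(chosen$breaks, chosen$format)  # labels produced with the stored format
```

Returning the format string next to the breaks keeps the two decisions in one place, so a labelling function never has to guess which granularity was picked.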


the-tourist- commented 8 years ago

If I can, I'll try to look at it. I have quite a lot of things I'm working on simultaneously, but at some point I'll try to understand how it works, which I admit I didn't after quickly looking through it.
