Open MichaelChirico opened 8 years ago
Just brilliant!
No idea about 3 and 5 (as to what they mean).
I think a PR for 6 would be nice (seems straightforward from what Jan wrote there). Perhaps `?print.data.table` is the time consuming part? Do you think you'd be up for this, @MichaelChirico ?
No idea as to what 7 means either..
8 is another great idea. PR would be great!
It'd be really nice if Github would allow assigning tasks to people who aren't necessarily project members :-(.
There's also https://github.com/Rdatatable/data.table/issues/1497
@arunsrinivasan should I try and PR this one issue at a time? Or in one fell swoop? I've got 8 basically taken care of, just need to add tests.
Michael, separate PRs.
Very nice! Sorry to get back to you late on this, but Arun provided a nice example. It is just a nice convenience when interactively looking at tables with lots of columns so your console isn't engulfed by a huge data dump when you take a look at the head. I'll close that other one.
It'd be also nice to print:
primary key:
Also, I think this is better output for:
print(DT, class=TRUE)
   <char> <int> <num>
     site  date     x
1:      A     1    10
2:      A     2    20
3:      A     3    30
4:      B     1    10
5:      B     2    20
6:      B     3    30
It's easier to copy/paste the data.table without the classes in the way. If we can do that, we can turn on printing classes by default.
Thoughts?
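To make the class-row idea concrete, here is a minimal sketch of how such a header could be derived. `class_header` and the `abbs` lookup are hypothetical names used only for illustration, and the abbreviations are assumptions rather than whatever `print.data.table` settles on.

```r
# Illustrative abbreviations only (assumption): the final set may differ.
abbs <- c(character = "char", integer = "int", numeric = "num",
          logical = "lgcl", factor = "fctr")

# Hypothetical helper: one "<abb>" label per column, based on its first class.
class_header <- function(DF) {
  vapply(DF, function(col) {
    cl <- class(col)[1L]
    abb <- if (cl %in% names(abbs)) abbs[[cl]] else cl  # fall back to the class name
    paste0("<", abb, ">")
  }, character(1L), USE.NAMES = FALSE)
}

DF <- data.frame(site = c("A", "B"), date = 1:2, x = c(10, 20),
                 stringsAsFactors = FALSE)
class_header(DF)
```

Keeping the labels in a separate row, rather than merged into the column names, is what keeps the rest of the output easy to copy-paste.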
@arunsrinivasan about printing keys: `tables()`? (though TBH I almost never use this function) BTW `tables`, to the extent that it's useful, could go for an update to add a `secondary_indices` column...
About `class`:
This can be done, but will require a step of wrangling -- basically `toprint <- rbind(rownames(toprint), toprint); rownames(toprint) <- abbs`. Which is fine, I'm just curious why you're thinking of easier copy-pasting as a clear advantage? Not sure the cost of including `class` info in copy-pasted output. Happy to hear feedback.
About `class`: -- copy pasting from SO, for example to provide input to `fread()`. I also find it easier without the separation between column name and value (just used to seeing it).
On printing keys:
primary key: <a, b>
clearly tells the first key column is "a", then "b"..
Does this clarify things a bit?
I agree `tables()` could use an update.
@arunsrinivasan OK, I think I can get on board with that. Can ditch point # 7 then. I agree distinguishing key order at a glance was going to be tough. So how about:
If a table has a key, say `c("key1", "key2")`, print the following above the output of `print.data.table`:
keys: key1, key2
If there is no key, print:
keys: <unkeyed>
Secondary index printing is optional, but if activated will come below `keys` a la:
Secondary indices: key2.1, key2.2, ... key3.1, key3.2, ...
Lastly, I propose sending this output through `message` to help distinguish it from the `data.table` itself visually.
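A rough sketch of this proposal, assuming the header is built by a separate helper. `key_header` and `print_with_keys` are hypothetical names and not part of data.table; `key()` and `indices()` are the existing accessors.

```r
library(data.table)

# Hypothetical helper: build the header lines shown above the table.
key_header <- function(DT, show_indices = FALSE) {
  k <- key(DT)  # NULL when the table is unkeyed
  lines <- paste0("keys: ", if (is.null(k)) "<unkeyed>" else paste(k, collapse = ", "))
  if (show_indices) {
    idx <- indices(DT)  # NULL when there are no secondary indices
    if (!is.null(idx)) {
      lines <- c(lines, paste0("Secondary indices: ", paste(idx, collapse = ", ")))
    }
  }
  lines
}

# Hypothetical wrapper: header via message() (stderr), table via print() (stdout).
print_with_keys <- function(DT, ...) {
  message(paste(key_header(DT, ...), collapse = "\n"))
  print(DT)
  invisible(DT)
}

DT <- data.table(key1 = c("b", "a"), key2 = 2:1, x = c(10, 20))
setkey(DT, key1, key2)
print_with_keys(DT)
```

Because `message()` writes to stderr, the header disappears wherever messages are suppressed or stderr is not captured, which is worth keeping in mind when choosing between `message` and `cat`.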
My suggestion would be this:
Keys: <col1, col2> (only one)
Secondary Indices: key2()
.
I don't mind "<>" being replaced with "" if that'd be more aesthetically pleasing.. e.g., "col1,col2", "col1" etc..
Last proposal: seems nice, but I wonder if it might create issues with knitr when people suppress 'messages' in chunk.. and print the output?
It'd be great to have this and class=TRUE default for v1.9.8 already.. we'll see.
One other thought:
Many people use "numeric" type when an integer type would suffice, and when "integer64" would fit the bill better. How about marking those columns somehow while printing?
instead of
OR "!num!"? There's a function `isReallyReal` that checks this. But this'll perhaps be too time consuming to run on all rows every time..
@arunsrinivasan Hmm I think it's definitely not something to be used as a part of `print.data.table` default.
Some initial musings:
[...] (`check_num_cols` or the like) which runs this on an input table and spits out the candidate columns.
[...] `data.table` in memory which we use to trigger the evaluation
[...] (`verbose`) output of `fread` (since I imagine that's where most `data.table`s are created in general. I guess `setDT` is the other big source.
Are you thinking of pushing 1.9.8 soon?
Oh, one more thing, what do you think about porting `print.data.table` to its own .R file?
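For reference, a standalone checker along the lines of the `check_num_cols` musing could look roughly like this. It is a base-R sketch under my own assumptions (it does not call `isReallyReal`), and it scans every row, which is exactly the cost concern raised above.

```r
# Hypothetical helper: names of plain double columns whose non-missing values
# are all whole numbers that fit in R's integer range.
check_num_cols <- function(DT) {
  is_candidate <- vapply(DT, function(col) {
    # Plain doubles only: skip classed doubles such as Date, POSIXct, integer64.
    is.double(col) && !is.object(col) &&
      all(is.na(col) | (is.finite(col) & col == trunc(col) &
                          abs(col) <= .Machine$integer.max))
  }, logical(1L))
  names(DT)[is_candidate]
}

DF <- data.frame(a = c(1, 2, 3), b = c(1.5, 2, 3), c = 1:3,
                 d = c("x", "y", "z"), stringsAsFactors = FALSE)
check_num_cols(DF)
```

Since it only relies on list-like column access, the same function works on a `data.table` or a plain `data.frame`.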
Hm, yes, let's forget the marking of columns for now.
On pushing 1.9.8: trying to wrap up the other marked issues as quickly as possible. I'd like to work on non-equi joins for this release.
On print.data.table to separate file, sure, sounds good.
@arunsrinivasan just a heads up that setting `class = TRUE` as the default is causing 100s of errors in the tests
Okay thanks, will take a look.
@arunsrinivasan nvm, on second glance, it's a lot, but manageable. Have to fix ~ 25 tests. Working now...
Great! No hurry. Take your time.
I'm not really convinced about changing the default on printing class. I'm not finding it useful in `print`; I use `str` to see classes (in dplyr for some reason they have the glimpse function for that purpose).
Isn't it better for `print` to just print the data by default, and to use `str` to print classes and key/indexes?
I agree with @jangorecki that class=FALSE default is preferable. I value my screen real estate and usually don't need reminders about columns' classes. Ditto for keys and indices. I like these features, but would expect them to be off by default.
Thanks for your input. I do think it's useful. Unless there's a strong reason (+ vote) against this, I'd like to give it a go. Maybe a lot of others might prefer it.
Perhaps we can put the keys / indices on hold. But I don't think 1 row for class types is taking away your screen's real estate.
@MichaelChirico can we make the 'keys' argument FALSE for this release? Perhaps we can turn it on in the next one seeing how this one goes.
@arunsrinivasan sure. Will handle this after we iron out the update to `class`.
I agree with Frank that having it by default may be somewhat information overload... perhaps there's a middle ground (only print class if there's been a change in class for some column, e.g.).
Anyway happy to give setting `class = TRUE` as default a whirl.
Do we have any script that can be run to check packages that depend on data.table? Asking because potentially any package that tests output with Rout - Rout.save (or `capture.output` - I have 2 such non-CRAN pkgs) could be broken after changing the default print. It is valuable to run such tests before and after to see the impact precisely. Then it would be best to decide depending on the percentage of affected CRAN packages.
@jangorecki, good point. `class=FALSE` then for now. I'll come back to these issues later. Not important for now.
Any plans for a minimalistic version of print key with a `*` star prefix? or other nice ascii symbol? something like:
setkey(DT, site, date)
options("datatable.key.note"=TRUE)
print(DT)
# *site *date x
#1: A 1 10
#2: A 2 20
It would be my preferred one.
@jangorecki I'm fine with any way, but the resistance that cropped up with an approach like that is some people preferred to see key order as well, e.g.:
# *site **date x
In any case, if implemented, I would: set `*` as the default, and leave an option for making it whatever you want.
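As a sketch of the single-marker variant with a configurable character: `mark_key_names` is a hypothetical helper, not an existing data.table function, and the renaming happens on a copy so the real table keeps its names.

```r
library(data.table)

# Hypothetical helper: column names with key columns prefixed by `marker`.
mark_key_names <- function(DT, marker = "*") {
  nm <- names(DT)
  is_key <- nm %in% key(DT)  # all FALSE when the table is unkeyed
  nm[is_key] <- paste0(marker, nm[is_key])
  nm
}

DT <- data.table(site = c("A", "A"), date = 1:2, x = c(10, 20))
setkey(DT, site, date)

# Rename a copy for display only; DT itself is left untouched.
printDT <- setnames(copy(DT), mark_key_names(DT))
print(printDT)
```

This variant does not convey key order; doing so would need a per-column marker (repeated stars or a position number), which is the trade-off being discussed here.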
@MichaelChirico On one hand multiple stars are OK, but what if you have 20 columns in the key? Maybe a single star only if the order of key columns is the same as data columns; for me that would be ~99% of cases.
For up to 3 elements there are superscript digits:
# ¹*site ²*date x
@MichaelChirico about 3) above, one can use R global options:
width.user <- getOption("width") # remember the user's current width
options(width=as.integer(howWideIsDT)) # temporarily resize the output console
print(DT)
options(width=width.user) # reset to user's preferences
@mbacou thanks for the input!
In RStudio, at least, I don't see a difference in output having done that.
@MichaelChirico You should see a difference. Try
library(data.table)
options(width=500)
(DT = data.table(matrix(1:1e3,1)))
RStudio wraps console output and offers no option to disable this "feature"; while base R console overflows with no wrapping until options()$width. Either way you should see a difference. Try resizing your console window to see the wrapping in action.
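The save-and-restore pattern can be wrapped up so the width is always reset, even if printing fails. A minimal sketch, where `print_wide` is a hypothetical helper name:

```r
# Hypothetical helper: print `x` with a temporarily enlarged console width.
print_wide <- function(x, width = 500L, ...) {
  old <- options(width = width)      # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore on exit, including on error
  print(x, ...)
  invisible(x)
}

wide <- as.data.frame(matrix(1:60, nrow = 1))
print_wide(wide)
```

Whether the extra width is visible still depends on the front end: a console that soft-wraps long lines will wrap them regardless of `options("width")`.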
Might be useful to add an optional `format` argument similar to `knitr::kable()` or `type` in `ascii::print()` to generate markdown, pandoc, rst, textile, (etc.) and org-mode compatible table formats?
I often use snippets like these to paste results into e-mails and org or markdown documents:
print(ascii(x, digits=2), type="org")
# | | ISO3 | ADM0_NAME | ELEVATION | whea_h |
# |---+------+-----------------------------+---------------+----------|
# | 1 | TZA | United Republic of Tanzania | | 19.00 |
# | 2 | TZA | United Republic of Tanzania | (3e+02,5e+02] | 0.00 |
# | 3 | TZA | United Republic of Tanzania | (5e+02,9e+02] | 743.00 |
# | 4 | TZA | United Republic of Tanzania | (9e+02,1e+03] | 9519.00 |
# | 5 | TZA | United Republic of Tanzania | (1e+03,2e+03] | 29814.00 |
# | 6 | TZA | United Republic of Tanzania | (2e+03,5e+03] | 894.00 |
knitr::kable(x, format="markdown")
# |ISO3 |ADM0_NAME |ELEVATION | whea_h|
# |:----|:---------------------------|:-------------|------:|
# |TZA |United Republic of Tanzania |NA | 19|
# |TZA |United Republic of Tanzania |(3e+02,5e+02] | 0|
# |TZA |United Republic of Tanzania |(5e+02,9e+02] | 743|
# |TZA |United Republic of Tanzania |(9e+02,1e+03] | 9519|
# |TZA |United Republic of Tanzania |(1e+03,2e+03] | 29814|
# |TZA |United Republic of Tanzania |(2e+03,5e+03] | 894|
@mbacou not quite convinced of the utility of adding this to `print.data.table` when `ascii::print` and `knitr::kable` already seem to do a fine job...
Agreed. I'd vote for minimal output as well, but if you plan to provide more fancy printing options, then using a table format that pandoc can process would make sense.
A minor thing, but it might be a good idea to export print.data.table. I only noticed it was hidden when typing `args(print.data.table)` just now.
@franknarf1 any other reason? we have `?print.data.table` now and `args(data.table:::print.data.table)` have that covered. was just about to file the export in a PR, but stopped myself. i don't think it's uncommon for `print` methods to be hidden (see `print.lm`/`print.glm` in base, e.g.)
@MichaelChirico Nope. Not a problem unexported as you say; thanks for asking.
Another idea: an option `dput = TRUE`, that will write reproducible code (since `dput(DT)` doesn't work). Something like
dtput = function(DT){
d0 = capture.output(dput(setattr(data.table:::shallow(DT), ".internal.selfref", NULL)))
cat("data.table::alloc.col(", d0, ")\n", sep="\n")
}
# example
library(data.table)
DT = as.data.table(as.list(1:10))
dtput(DT)
# which writes...
data.table::alloc.col(
structure(list(V1 = 1L, V2 = 2L, V3 = 3L, V4 = 4L, V5 = 5L, V6 = 6L,
V7 = 7L, V8 = 8L, V9 = 9L, V10 = 10L), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"), row.names = c(NA,
-1L), class = c("data.table", "data.frame"))
)
... except less hacky and embedded in `print.data.table`. I guess if `dput = TRUE`, all the others can be ignored. Getting fancy, maybe allow `dput = "file.txt"` like `dput()` does. I figure it makes enough sense to put it in `print`, and it's not worth it to add a new function.
Another idea similar to those in #645 : turn off smart truncation of list column display: example from SO.
I see this truncation pretty frequently, and in some cases it'd be nice to see printing as if list column v was `sapply(v, toString)` instead.
@franknarf1 i think a very easy fix would be here:
paste(c(format(head(x,6), justify=justify, ...), if(length(x)>6)""),collapse=",")
change `""` to `"..."`. What do you think? I like `toString`, but should also come with a default `width` parameter, I'm not sure how to do that robustly.
actually, re-reading `toString.default`:
function (x, width = NULL, ...)
{
string <- paste(x, collapse = ", ")
if (missing(width) || is.null(width) || width == 0)
return(string)
if (width < 0)
stop("'width' must be positive")
if (nchar(string, type = "w") > width) {
width <- max(6, width)
string <- paste0(strtrim(string, width - 4), "....")
}
string
}
It seems the default way of handling `width` is similar to what's currently implemented. I think limiting output based on on-screen width rather than truncating to the first few elements is better, no?
This approach also allows better user interaction since `toString` is `S3`-registered -- we (or end users) could write/customize `toString.*` methods for any use cases that arise. Perhaps add a `colWidth` parameter to `print.data.table` which would be dropped into `width` of `toString.default`...
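A small sketch of what formatting a list column through `toString` with a width cap might look like. `format_list_col` and its `col_width` argument are hypothetical stand-ins for the proposed `colWidth` parameter, not existing arguments of `print.data.table`.

```r
# Hypothetical helper: collapse each element of a list column to one string,
# optionally capped at `col_width` characters via toString()'s `width`.
format_list_col <- function(v, col_width = NULL) {
  vapply(v, toString, character(1L), width = col_width)
}

v <- list(1:10, letters[1:3], numeric(0))
format_list_col(v)                  # full contents, comma separated
format_list_col(v, col_width = 12)  # long entries are cut off with trailing dots
```

Since `toString` is generic, elements with a class-specific `toString.*` method would pick it up automatically under this approach.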
@MichaelChirico One point in favor of the trailing `","` over a `",..."` is that it saves horizontal space. Nonetheless, that seems like a good change, since most users won't guess what the `","` means.
Rather than that change, what I was more interested in was printing a higher number of entries in place of `6` in `head(x, 6)`, like your colWidth idea.
Re methods, I'd find an argument like `formatters = list(character = function(x) toString(x), lm = function(x) x$qr$tol)` easy to use (to be used for list columns provided every element matches the named class or is NULL). Not sure if that's what you meant.
Thought I would drop a mention of #2893 here as the two seem closely related.
(Similar to my last comment...) Having a data.table like...
library(data.table)
(DT <- data.table(id = 1:2, v = numeric_version("0.0.0")))
# id v
# 1: 1 <numeric_version>
# 2: 2 <numeric_version>
I cannot really read the contents of my list column, even though there is a print method for it.
It would be nice to have a way to tell data.table how I want a list column of a certain class printed, like ...
library(magrittr)
formatters = list(numeric_version = as.character)
printDT = data.table:::shallow(DT)
left_cols = which(sapply(DT, is.list))
for (i in seq_along(formatters)){
if (length(left_cols) == 0L) break
alt_cols = left_cols[ sapply(DT[, ..left_cols], inherits, names(formatters)[i]) ]
if (length(alt_cols)){
printDT[, (alt_cols) := lapply(.SD, formatters[[i]]), .SDcols = alt_cols][]
left_cols = setdiff(left_cols, alt_cols)
}
}
print(printDT)
id v
1: 1 0.0.0
2: 2 0.0.0
Could have that list passed by the user in `options(datatable.print.formatters = formatters)`. To reduce the computational burden, I guess this would be done after filtering with `nrows=` and `topn=`.
(If I want to suggest an addition to this list, do I add it here or add it as a discrete issue?)
you can just add it here. feel free to edit initial post but also include a comment w some exposition please
Current task list:
[...] `.Rd` file for `print.data.table`
3. Ability to turn off smart table wrapping [2) from #645/R-F#1957 - Yike Lu]
[...] `by`-groupings [4) from #645/R-F#1957 - Yike Lu]
7. Demarcation of key columns [part of 5) from #645/R-F#1957 - Yike Lu]
[...] `dplyr`-like printing [see below - @MichaelChirico]
[...] `dplyr` `tbl_df` [#1497 - @nverno; #2608 - @vlulla]
[...] `data.table` [#545/R-F#5253 - @arunsrinivasan]
[...] `list`/non-atomic columns [see below - @franknarf1 via SO; also #605; handled in #2562]
[...] `POSIXct` columns with timezones should include that information in printed output [#2842 - @MichaelChirico]
[...] `print.data.table` would exceed `max.print`)
Some Notes
3 (tabled pending clarification)
As I understand it, this issue is a request to prevent the console output from wrapping around (i.e., to force all columns to appear parallel, regardless of how wide the table is).
If that's the case, this is (AFAICT) impossible, since that's something done by RStudio/R itself. I for one certainly don't know of any way to alter this behavior.
If someone does know of a way to affect this, or if they think I'm mis-interpreting, please pipe up and we can have this taken care of.
7
As I see it there are two options here. One is to treat all key columns the same; the other is to treat secondary, tertiary, etc. keys separately.
Example output:
And of course, add an option for deciding whether to demarcate with `|` or some other user's-choice character (`*`, `+`, etc.)
9 [DONE]
Some feedback from a closed PR that was a first stab at solving this:
From Arun regarding preferred options:
It would be nice to have an option to print a row under the row of column names which gives each column's stored type, as is currently (I understand) the default for the output of `dplyr` operations.
Example from `dplyr`:
Current best alternative is to do `sapply(DF, class)`, but it's nice to have a preview of the data with this extra information.
11
This seems closely related to 3. Current plan is to implement this as an alternative to 3 since it seems more tangible/doable.
Via @nverno:
and the guiding example from Arun:
12
Currently covered by @jangorecki's PR #1448; Jan, assuming #1529 is merged first, could you edit the `print.data.table` man page for your PR?