Open MichaelChirico opened 8 years ago
The fewer points defined in scope, the easier it is to merge a PR for it. It definitely makes sense to separate points that may result in a breaking change (if any) from those for which default behaviour will not change.
This won't be done in a single PR though, but rather one by one.
As an extension to @fparages' https://github.com/Rdatatable/data.table/pull/3500 (addressing the timezone display item in the OP of this issue/thread), it might be nice to also support the tz being printed in the class header, `<POSc:-07:00>` or `<POSc:PDT>`, and not in the column (to save horizontal space), e.g. when `class=tz=TRUE`.
^ related: #2842
That would be awesome!
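To make the class-header idea above concrete, here is a minimal sketch of how such a label could be built from a `POSIXct` column. `tz_header` and its `style` argument are hypothetical names used only for illustration; nothing here is part of data.table or of PR #3500. It relies only on base R's `format()` with `%Z` (abbreviation) and `%z` (numeric offset).

```R
# Hypothetical helper (not part of data.table): build a class-header label
# such as "<POSc:PDT>" or "<POSc:-07:00>" for a POSIXct vector.
tz_header <- function(x, style = c("abbrev", "offset")) {
  stopifnot(inherits(x, "POSIXct"))
  style <- match.arg(style)
  if (style == "abbrev") {
    # Time zone abbreviation; can differ between rows across a DST change
    tz <- unique(format(x, "%Z"))
  } else {
    # Numeric offset such as "-0700", rewritten as "-07:00"
    off <- unique(format(x, "%z"))
    tz <- sub("^([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", off)
  }
  sprintf("<POSc:%s>", paste(tz, collapse = "/"))
}

x <- as.POSIXct("2019-07-01 12:00:00", tz = "UTC")
tz_header(x)            # abbreviation style
tz_header(x, "offset")  # numeric-offset style
```

One open design question this sketch glosses over: a column spanning a DST change has more than one abbreviation or offset, so the header would need to show several values (joined with `/` here) or fall back to per-row display.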
Hi all, I don't know if you care, but I noticed a bug in `print.data.table(col.names="none")` when there are lots of columns. Minimal code is:
```R
library(data.table)
x <- 1:30
DT <- data.table(t(x))
print(DT, col.names="none")
```
Output on my system is:
```
th798@cmp2986 MINGW64 ~/R
$ R --vanilla < datatable-print-bug.R
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(data.table)
> x <- 1:30
> DT <- data.table(t(x))
> print(DT, col.names="none")
1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
V22 V23 V24 V25 V26 V27 V28 V29 V30
1: 22 23 24 25 26 27 28 29 30
>
```
You can see in the output above that the column names V22 through V30 are printed, but I expected they should not be. What I expected:
```
> print(DT, col.names="none")
1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1: 22 23 24 25 26 27 28 29 30
>
```
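For anyone who wants to check whether their installed version shows this behaviour, the report above can be turned into a small programmatic check. This is only a sketch: the console width is pinned to 80 so the table has to wrap, and the variable name `leaked` is just for illustration. The result depends on the data.table version, so no expected value is given.

```R
library(data.table)

old <- options(width = 80)   # pin the console width so the table must wrap
DT <- data.table(t(1:30))    # one row, columns V1..V30
out <- capture.output(print(DT, col.names = "none"))
options(old)                 # restore the previous width

# TRUE if any default column name (V1, V2, ...) appears in the printed output
leaked <- any(grepl("\\bV[0-9]+\\b", out))
leaked
```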
Is there any scope to add a `dplyr::glimpse` equivalent (I don't see it in the list)? While I most certainly can use `dplyr` for this purpose, I will need to install a bunch of dependencies to get the function. A use case is highlighted below (note: downloads ~4MB to system).
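```R
library(data.table)

# Download the weekly NPPES archive and read one CSV straight out of the zip
download.file(
  "https://download.cms.gov/nppes/NPPES_Data_Dissemination_120720_121320_Weekly.zip",
  "file.zip")
data <- fread(cmd = "unzip -cq file.zip npidata_pfile_20201207-20201213.csv")
```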
`data.table`'s default `print` method gives a very long output that is not really helpful in understanding the contents of the file.
```R
> data
NPI Entity Type Code Replacement NPI Employer Identification Number (EIN)
1: 1134691124 2 NA
```
`dplyr::glimpse` also gives a long output, but it is still very readable.
```R
> dplyr::glimpse(data)
Rows: 21,806
Columns: 330
$ NPI
```
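Until something like this exists in data.table, a rough stand-in needs nothing beyond the package itself. The sketch below is not an existing data.table function: `glimpse_dt` and its arguments `n` and `width` are hypothetical names, and the table is toy data rather than the NPPES file. Base R's `str(data)` is an existing alternative that also prints one line per column.

```R
library(data.table)

# Hypothetical helper (not part of data.table): print one line per column
# with the column name, its first class, and the first few values.
glimpse_dt <- function(DT, n = 5L, width = getOption("width")) {
  cat("Rows: ", nrow(DT), "\nColumns: ", ncol(DT), "\n", sep = "")
  for (nm in names(DT)) {
    col <- DT[[nm]]
    vals <- paste(format(head(col, n)), collapse = ", ")
    line <- sprintf("$ %s <%s> %s", nm, class(col)[1L], vals)
    cat(strtrim(line, width), "\n", sep = "")   # cut each line at `width`
  }
  invisible(DT)
}

# Toy data for illustration only
DT <- data.table(id = 1:3, score = c(0.5, 1.5, 2.5), label = c("a", "b", "c"))
glimpse_dt(DT)
```

Truncating each line to the console width is what keeps a 330-column table readable: the output grows by one line per column instead of wrapping every row.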
Current task list:

- `.Rd` file for `print.data.table`
- 3. Ability to turn off smart table wrapping [2) from #645/R-F#1957 - Yike Lu]
- `by`-groupings [4) from #645/R-F#1957 - Yike Lu]
- 7. Demarcation of key columns [part of 5) from #645/R-F#1957 - Yike Lu]
- `dplyr`-like printing [see below - @MichaelChirico]
- `dplyr` `tbl_df` [#1497 - @nverno; #2608 - @vlulla]
- `data.table` [#545/R-F#5253 - @arunsrinivasan]
- `list`/non-atomic columns [see below - @franknarf1 via SO; also #605; handled in #2562]
- `POSIXct` columns with timezones should include that information in printed output [#2842 - @MichaelChirico]
- `print.data.table` would exceed `max.print`)

Some Notes
3 (tabled pending clarification)
As I understand it, this issue is a request to prevent the console output from wrapping around (i.e., to force all columns to appear parallel, regardless of how wide the table is).
If that's the case, this is (AFAICT) impossible, since that's something done by RStudio/R itself. I for one certainly don't know of any way to alter this behavior.
If someone does know of a way to affect this, or if they think I'm mis-interpreting, please pipe up and we can have this taken care of.
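One knob that may be relevant here, assuming the wrapping in question is the one R applies when printing wide objects, is base R's `width` option. The sketch below only shows that the number of printed blocks changes with `options(width = )`. Whether that covers the original request is not clear from the thread.

```R
library(data.table)

DT <- data.table(t(1:30))   # one row, 30 columns

old <- options(width = 80)
narrow <- capture.output(print(DT))   # printed at 80 characters per line
options(width = 200)
wide <- capture.output(print(DT))     # printed at 200 characters per line
options(old)                          # restore the previous width

length(narrow)   # number of output lines at width 80
length(wide)     # number of output lines at width 200
```

Note that the terminal or IDE can still soft-wrap long lines on its own, which R cannot control.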
7
As I see it there are two options here. One is to treat all key columns the same; the other is to treat secondary, tertiary, etc. keys separately.
Example output:
And of course, add an option for deciding whether to demarcate with `|` or some other user's-choice character (`*`, `+`, etc.)

9 [DONE]
Some feedback from a closed PR that was a first stab at solving this:
From Arun regarding preferred options:
It would be nice to have an option to print a row under the row of column names which gives each column's stored type, as is currently (I understand) the default for the output of `dplyr` operations.

Example from `dplyr`:

Current best alternative is to do `sapply(DF, class)`, but it's nice to have a preview of the data with this extra information.

11
This seems closely related to 3. Current plan is to implement this as an alternative to 3 since it seems more tangible/doable.
Via @nverno:
and the guiding example from Arun:
12
Currently covered by @jangorecki's PR #1448; Jan, assuming #1529 is merged first, could you edit the `print.data.table` man page for your PR?