JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

More list printing bugs in 1.6 #40508

Open pdeffebach opened 3 years ago

pdeffebach commented 3 years ago

Looking at the CSV documentation in 1.6, I get this big block instead of a nice list:


    •  File layout options: • header=1: the header argument can be an Int, indicating the row to parse for column names; or a Range, indicating a
       span of rows to be concatenated together as column names; or an entire Vector{Symbol} or Vector{String} to use as column names; if a file
       doesn't have column names, either provide them as a Vector, or set header=0 or header=false and column names will be auto-generated (Column1,
       Column2, etc.). Note that if a row number header and comment or ignoreemtpylines are provided, the header row will be the first
       non-commented/non-empty row after the row number, meaning if the provided row number is a commented row, the header row will actually be the
       next non-commented row. • normalizenames=false: whether column names should be "normalized" into valid Julia identifier symbols; useful when
       iterating rows and accessing column values of a row via getproperty (e.g. row.col1) • datarow: an Int argument to specify the row where the
       data starts in the csv file; by default, the next row after the header row is used. If header=0, then the 1st row is assumed to be the start
       of data; providing a datarow or skipto argument does not affect the header argument. Note that if a row number datarow and comment or
       ignoreemtpylines are provided, the data row will be the first non-commented/non-empty row after the row number, meaning if the provided row
       number is a commented row, the data row will actually be the next non-commented row. • skipto::Int: identical to datarow, specifies the
       number of rows to skip before starting to read data • footerskip::Int: number of rows at the end of a file to skip parsing. Do note that
       commented rows (see the comment keyword argument) do not count towards the row number provided for footerskip, they are completely ignored by
       the parser • limit: an Int to indicate a limited number of rows to parse in a csv file; use in combination with skipto to read a specific,
       contiguous chunk within a file; note for large files when multiple threads are used for parsing, the limit argument may not result in exact
       an exact # of rows parsed; use threaded=false to ensure an exact limit if necessary • transpose::Bool: read a csv file "transposed", i.e.
       each column is parsed as a row • comment: rows that begin with this String will be skipped while parsing. Note that if a row number header or
       datarow and comment are provided, the header/data row will be the first non-commented/non-empty row after the row number, meaning if the
       provided row number is a commented row, the header/data row will actually be the next non-commented row. • ignoreemptylines::Bool=true:
       whether empty rows/lines in a file should be ignored (if false, each column will be assigned missing for that empty row) • threaded::Bool:
       whether parsing should utilize multiple threads; by default threads are used on large enough files, but isn't allowed when transpose=true;
       only available in Julia 1.3+ • tasks::Integer=Threads.nthreads(): for multithreaded parsing, this controls the number of tasks spawned to
       read a file in chunks concurrently; defaults to the # of threads Julia was started with (i.e. JULIA_NUM_THREADS environment variable) •
       lines_to_check::Integer=5: for multithreaded parsing, a file is split up into tasks # of equal chunks, then lines_to_check # of lines are
       checked to ensure parsing correctly found valid rows; for certain files with very large quoted text fields, lines_to_check may need to be
       higher (10, 30, etc.) to ensure parsing correctly finds these rows • select: an AbstractVector of Int, Symbol, String, or Bool, or a
       "selector" function of the form (i, name) -> keep::Bool; only columns in the collection or for which the selector function returns true will
       be parsed and accessible in the resulting CSV.File. Invalid values in select are ignored. • drop: inverse of select; an AbstractVector of
       Int, Symbol, String, or Bool, or a "drop" function of the form (i, name) -> drop::Bool; columns in the collection or for which the drop
       function returns true will ignored in the resulting CSV.File. Invalid values in drop are ignored.

I will have to check whether this problem also occurs on master.

KristofferC commented 3 years ago

Hopefully fixed by https://github.com/JuliaLang/julia/pull/40203. Should be in 1.6.1.

mgkuhn commented 3 years ago

Using Julia 1.6.1 on Ubuntu Linux 20.04 with CSV v0.8.4, I get the following in an 80-character-wide terminal:

julia> using CSV
help?> CSV.File
  CSV.File(source; kwargs...) => CSV.File
[...]
    •  File layout options:
       • header=1: the header argument can be an Int, indicating
       the row to parse for column names; or a Range,
       indicating a span of rows to be concatenated together as
       column names; or an entire Vector{Symbol} or
       Vector{String} to use as column names; if a file doesn't
       have column names, either provide them as a Vector, or
       set header=0 or header=false and column names will be
       auto-generated (Column1, Column2, etc.). Note that if a
       row number header and comment or ignoreemtpylines are
       provided, the header row will be the first
       non-commented/non-empty row after the row number,
       meaning if the provided row number is a commented row,
       the header row will actually be the next non-commented
       row.
       • normalizenames=false: whether column names should be
       "normalized" into valid Julia identifier symbols; useful
       when iterating rows and accessing column values of a row
       via getproperty (e.g. row.col1)
[...]

Better than in the original report, but still not correct: the text of second-level items wraps to the indentation level of first-level items. I would have expected the output to look like this:

    •  File layout options:
       • header=1: the header argument can be an Int, indicating
         the row to parse for column names; or a Range,
         indicating a span of rows to be concatenated together as
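
The expected layout is a hanging indent: continuation lines of an item align with the first character of that item's text, not with its parent's. A minimal sketch of that rule follows, written in Python with `textwrap` purely for illustration. It is not how Julia's Markdown renderer is implemented; the names `wrap_list_item` and `render_list` are hypothetical, and the bullet spacing is simplified to one space.

```python
import textwrap

BULLET = "• "  # bullet plus one space; Julia's exact spacing may differ


def wrap_list_item(text, indent, width=80):
    """Wrap one list item with a hanging indent.

    The first line starts with the bullet at column `indent`; every
    continuation line is aligned with the first character of the item text.
    """
    return textwrap.fill(
        text,
        width=width,
        initial_indent=" " * indent + BULLET,
        subsequent_indent=" " * (indent + len(BULLET)),
    )


def render_list(items, indent=4, width=80):
    """Render nested (text, children) pairs, one string per item."""
    rendered = []
    for text, children in items:
        rendered.append(wrap_list_item(text, indent, width))
        # Children start where the parent's text starts, so their own
        # continuation lines end up indented deeper than the parent's.
        rendered.extend(render_list(children, indent + len(BULLET), width))
    return rendered


doc = [
    ("File layout options:", [
        ("header=1: the header argument can be an Int, indicating the row "
         "to parse for column names; or a Range, indicating a span of rows "
         "to be concatenated together as column names", []),
    ]),
]
print("\n".join(render_list(doc, width=66)))
```

The key point is that the indentation passed down to nested items grows with each level, so the wrap column for a second-level item is never the first-level column.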

This can also be seen in other places that use nested lists, e.g.

julia> using Distributed
help?> addprocs
[...]
    •  shell: specifies the type of shell to which ssh connects on the
       workers.
       • shell=:posix: a POSIX-compatible Unix/Linux shell (bash,
       sh, etc.). The default.
       • shell=:wincmd: Microsoft Windows cmd.exe.
[...]

pdeffebach commented 3 years ago

I also see the above. Continuation lines (the second line onward) of each nested item should be indented by two more spaces.

But this is still an improvement and is readable.

vtjnash commented 2 years ago

This can be improved further, but it doesn't seem to be a bug.

KristofferC commented 2 years ago

I am pretty sure this is both a bug and a regression, since this worked fine in 1.5. It started going bad in https://github.com/JuliaLang/julia/pull/37087, and many attempts were made to fix it: https://github.com/JuliaLang/julia/pull/37235, https://github.com/JuliaLang/julia/pull/38502, https://github.com/JuliaLang/julia/pull/40203.