Inconsistent header treatment for csv tables

AndydeCleyre commented 2 years ago

Hello!

Sorry, I'm not sure exactly what's going on here, so I'll get right to it. Using Zsh:

$ rows=( Package,Version,Latest,Project 'tomli,2.0.0,2.0.1,~/Code/zpy' 'click,8.0.1,8.0.3,~/Code/archbuilder_iosevka' 'pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka' 'ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka' 'tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka' )
$ rich --csv - <<<${(F)rows}

$ rows=( 'Package,Version,Latest,Project' 'tomli,2.0.0,2.0.1,~/Code/zpy' 'click,8.0.1,8.0.3,~/Code/archbuilder_iosevka' 'pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka' 'ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka' 'tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka' )
$ rich --csv - <<<${(F)rows}

Same result as above

$ rows=( 'tomli,2.0.0,2.0.1,~/Code/zpy' 'click,8.0.1,8.0.3,~/Code/archbuilder_iosevka' 'pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka' 'ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka' 'tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka' )
$ rich --csv - <<<${(F)rows}

What determines whether the first row gets treated as a header?

Thanks for any help!

willmcgugan commented 2 years ago

It’s a heuristic used by the Python csv library, which is imperfect, as you have noticed. In the future I’ll expose a way to adjust this via an option.

AndydeCleyre commented 2 years ago

Thanks! Do you know what about the input in this case gives CSV the wrong idea, so that I can work around this?

willmcgugan commented 2 years ago

Not sure. You could have a look at the source of the csv module.

AndydeCleyre commented 2 years ago

FYI:

csv.Sniffer.has_header:

  def has_header(self, sample):
      # Creates a dictionary of types of data in each column. If any
      # column is of a single type (say, integers), *except* for the first
      # row, then the first row is presumed to be labels. If the type
      # can't be determined, it is assumed to be a string in which case
      # the length of the string is the determining factor: if all of the
      # rows except for the first are the same length, it's a header.
      # Finally, a 'vote' is taken at the end for each column, adding or
      # subtracting from the likelihood of the first row being a header.
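
To see the voting described in that comment in action, here is a minimal sketch that calls csv.Sniffer().has_header directly. The sample data and the helper name first_row_is_header are made up for illustration and are not taken from rich-cli or from the rows above. It only shows how the standard library heuristic reacts to a column whose type is consistent everywhere except the first row.

```python
import csv

# Hypothetical samples (not from this issue). The second column is numeric
# in every data row, so the sniffer has a consistent column type to vote on.
WITH_HEADER = "name,count\nalpha,10\nbeta,25\ngamma,37\n"
WITHOUT_HEADER = "alpha,10\nbeta,25\ngamma,37\n"


def first_row_is_header(sample: str) -> bool:
    """Return csv.Sniffer's guess about whether the first row is a header."""
    return csv.Sniffer().has_header(sample)


if __name__ == "__main__":
    for label, sample in [("with header", WITH_HEADER),
                          ("without header", WITHOUT_HEADER)]:
        print(f"{label}: has_header -> {first_row_is_header(sample)}")
```

In the first sample, "count" does not parse as a number while the rest of that column does, so the column votes for a header. In the second sample the first row parses like every other row, so the vote goes the other way. When no column has a consistent type or string length across the data rows, no votes are cast at all and the result defaults to "no header".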

I spotted "lexter" @ https://github.com/Textualize/rich-cli/blob/main/src/rich_cli/__main__.py#L569

Textualize / rich-cli

Inconsistent header treatment for csv tables #29