golang / go

The Go programming language
https://go.dev

encoding/csv: need more functionalities in package #33237

Open prithvipal opened 5 years ago

prithvipal commented 5 years ago

The csv package has very basic functionality, such as Read(), which reads a single record, and ReadAll(), which reads all records. Since a CSV file always has a list of column names, all records correspond to those column names.

Proposal

There should be some functionality related to columns (not expecting anything as big as Pandas) so that it becomes easier to process CSV files using the built-in package itself; a rough sketch follows the list below.

  1. Get a column by column name, so the caller can see which columns are available in the CSV
  2. Delete columns
  3. Filter by column
  4. List all columns
  5. Add a new column, etc.
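
To make the proposal more concrete, here is a minimal sketch of what such a column-aware helper built on top of encoding/csv could look like. The package name csvtable, the Table type, and the Read/Column functions are hypothetical, introduced only for illustration:

```go
package csvtable // hypothetical package name, for illustration only

import (
	"encoding/csv"
	"fmt"
	"io"
)

// Table is a hypothetical in-memory representation of a CSV file
// whose first record is a header row.
type Table struct {
	Header  []string
	Records [][]string
}

// Read loads an entire CSV stream into a Table, treating the first
// record as the list of column names.
func Read(r io.Reader) (*Table, error) {
	records, err := csv.NewReader(r).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("csvtable: empty input")
	}
	return &Table{Header: records[0], Records: records[1:]}, nil
}

// Column returns all values of the named column, in record order.
func (t *Table) Column(name string) ([]string, error) {
	idx := -1
	for i, h := range t.Header {
		if h == name {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("csvtable: no column %q", name)
	}
	values := make([]string, 0, len(t.Records))
	for _, rec := range t.Records {
		values = append(values, rec[idx])
	}
	return values, nil
}
```
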
sachi-gkp commented 5 years ago

I agree with this proposal. Currently, the encoding/csv package provides only very basic features, which is not enough to write production-grade code, and developers need to write a lot of boilerplate code in order to use this package.

nussjustin commented 5 years ago

The encoding/csv package already provides support for reading and writing CSV. Everything else can easily be built on top of this, so I don't see why any of this should be in the stdlib.

What speaks against providing this functionality in a third party package?

I've worked with many different CSV files over the last few years, but never needed any of the features you listed except the first one (which can easily be solved with a simple map[string]string and abstracted away in just a few lines of code).
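
For illustration, a minimal sketch of that map[string]string approach, reading the header once and turning each subsequent record into a map keyed by column name (the sample data is made up):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

func main() {
	// Made-up sample data; in practice the reader would wrap a file.
	r := csv.NewReader(strings.NewReader("name,age\nalice,30\nbob,25\n"))

	// Read the header record once.
	header, err := r.Read()
	if err != nil {
		log.Fatal(err)
	}

	// Turn every following record into a map keyed by column name.
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		row := make(map[string]string, len(header))
		for i, name := range header {
			row[name] = rec[i]
		}
		fmt.Println(row["name"], row["age"])
	}
}
```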

Also, you say that CSV always has column names. This is wrong. Though relatively rare, there are still plenty of CSV files without column names (often the names are just not in the CSV file but are always the same, so they get hardcoded in the application).

Even if there are good reasons for having this in the stdlib, the new functionality could still be implemented outside the stdlib first and then later adopted into the stdlib (or one of the golang.org/x repositories). That way it's much easier to iterate on the API and collect feedback from users (once something is adopted into the stdlib, we can't break backwards compatibility).

Also please try to give examples (e.g. function signatures, types, ...) instead of just listing ideas for features.

julieqiu commented 5 years ago

/cc @dsnet

dsnet commented 5 years ago

The encoding/csv package fundamentally operates on a CSV file in a streaming manner; for this reason it takes in an io.Reader and an io.Writer. However, the functionality that this issue proposes requires: 1) that the entire CSV file be loaded into memory (in stark contrast to how the encoding/csv package currently operates), and 2) that there be a new type representing the CSV file in memory (e.g., a named [][]interface{} type) on which the filter operations can be performed.
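
For reference, a sketch of that streaming model using the existing API: records flow from an io.Reader to an io.Writer one at a time, and the file as a whole is never held in memory (the filter condition and sample data are made up):

```go
package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
	"strings"
)

// A streaming "filter by column": records flow from an io.Reader to an
// io.Writer one at a time, so the file is never fully held in memory.
func main() {
	in := csv.NewReader(strings.NewReader("alice,30\nbob,25\nalice,41\n"))
	out := csv.NewWriter(os.Stdout)
	defer out.Flush()

	for {
		rec, err := in.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		if rec[0] == "alice" { // keep only rows whose first field is "alice"
			if err := out.Write(rec); err != nil {
				log.Fatal(err)
			}
		}
	}
}
```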

While I can see such filter operations being useful, I don't see 1) why it needs to be in the standard library; even if it did, encoding/csv is the wrong place for it, since this has nothing to do with encoding, it's about filtering and querying the data after serialization. Nor do I see 2) why it needs to be about CSV: you can imagine such filter/query functionality easily being expanded to operate on JSON-like data structures or really any generic Go data structure.
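
As a rough illustration of that last point, the same kind of filter can be written against plain in-memory rows with no reference to CSV (or any encoding) at all; the map-of-strings row representation here is just one arbitrary choice:

```go
package main

import "fmt"

// filterRows keeps the rows for which keep returns true. It works on any
// slice of column-name -> value maps, regardless of where the data came
// from (CSV, JSON, a database, ...).
func filterRows(rows []map[string]string, keep func(map[string]string) bool) []map[string]string {
	var out []map[string]string
	for _, r := range rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []map[string]string{
		{"name": "alice", "city": "Oslo"},
		{"name": "bob", "city": "Berlin"},
	}
	oslo := filterRows(rows, func(r map[string]string) bool { return r["city"] == "Oslo" })
	fmt.Println(oslo)
}
```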

This functionality seems like it should be explored outside the standard library.

prithvipal commented 5 years ago

encoding/csv is the wrong place for it since it actually has nothing to do with encoding

We can introduce this functionality in a golang.org/x package.

You can imagine such filter/query functionality be easily expand to operate on JSON-like data structures

I agree with this. This functionality could be implemented over any data structure, such as JSON.