cldf / csvw

CSV on the web
Apache License 2.0

Add comment writing capabilities #12

Closed Anaphory closed 6 years ago

Anaphory commented 6 years ago

I just had a use case for writing comments to a CSV file, in order to ensure that light metadata stays with the data. Given that the readers in this module can apparently read CSV with comments, it seems appropriate for the writer to have a method like

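def writecomment(self, comment):
    """Write `comment` to the file, one prefixed line per line of text."""
    # A dialect without a comment prefix cannot represent comments.
    if self.comment_prefix is None:
        raise ValueError(
            'Cannot write comments in this csv dialect')
    # Prefix every line so that multi-line comments stay comments.
    for row in comment.split("\n"):
        self.f.write(self.comment_prefix)
        self.f.write(row)
        self.f.write('\n')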

which probably still needs consideration of encoding and escaping. (And self.comment_prefix would need to be set in the constructor.)
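A minimal sketch of how such a method could sit next to a constructor that stores the prefix, built only on the standard library's `csv.writer`. The class name `CommentingWriter` and its `comment_prefix` argument are hypothetical and are not part of csvw; encoding is left to whoever opens the file, and nothing here escapes data rows that happen to start with the prefix.

```python
import csv
import io


class CommentingWriter:
    """Hypothetical wrapper around csv.writer; not the csvw API."""

    def __init__(self, f, comment_prefix=None, **fmtparams):
        self.f = f
        # Assumption: the prefix is passed in directly, not read from a dialect.
        self.comment_prefix = comment_prefix
        self.writer = csv.writer(f, **fmtparams)

    def writerow(self, row):
        self.writer.writerow(row)

    def writecomment(self, comment):
        if self.comment_prefix is None:
            raise ValueError('Cannot write comments in this csv dialect')
        # splitlines() copes with \n, \r\n and \r line endings.
        for line in comment.splitlines():
            self.f.write(self.comment_prefix + line + '\n')


buf = io.StringIO()
w = CommentingWriter(buf, comment_prefix='#', lineterminator='\n')
w.writecomment('seed: 42\nmodel: demo')
w.writerow(['id', 'value'])
w.writerow(['1', '0.5'])
output = buf.getvalue()
```

Here `output` holds the two comment lines followed by the header and the data row. A reader that does not skip lines starting with `#` would treat the comments as data, which is the under-specification risk of this approach.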

LinguList commented 6 years ago

Would be useful for cases like edictor writing its settings as comments, but also for other cases, of course. I remember we put this aside, but I do not remember in which issue we discussed it (maybe somewhere in pycldf).

Anaphory commented 6 years ago

I have started a rudimentary implementation: https://github.com/Anaphory/csvw/commit/2c659749196bcad507daeca04f7e29123ddf8800

xrotwang commented 6 years ago

Hm. I almost feel bad about having implemented more support for comments than just skipping them on read :)

Why should the metadata stay with the data - but in an idiosyncratic/under-specified way? Why not add a column? Or put the metadata into the JSON file?

I sort of see the edictor use case - but then, for global (i.e. not per-row) metadata stored as comments in the csv, it would be easy enough to read the comments in separate code, and have csvw simply strip the comments.
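Reading the comments in separate code could look roughly like the sketch below, which uses only the standard library. The helper name `split_comments` and the `#` prefix are assumptions for illustration; the sketch also assumes that no quoted multi-line field contains a line starting with the prefix, since such a line would be misclassified as a comment.

```python
import csv
import io


def split_comments(lines, comment_prefix='#'):
    """Separate comment lines from data lines (hypothetical helper)."""
    comments, data = [], []
    for line in lines:
        if line.startswith(comment_prefix):
            # Drop the prefix and surrounding whitespace/newline.
            comments.append(line[len(comment_prefix):].strip())
        else:
            data.append(line)
    return comments, data


text = '#seed: 42\n#model: demo\nid,value\n1,0.5\n'
comments, data = split_comments(io.StringIO(text))
# csv.reader accepts any iterable of strings, so the data lines can be
# handed to it (or to any comment-stripping reader) unchanged.
rows = list(csv.reader(data))
```

After this runs, `comments` holds the global metadata lines and `rows` holds only the tabular data, so the CSV reader itself never needs to interpret comments.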

LinguList commented 6 years ago

Yes, that's what edictor is doing now, and in fact, when sharing data that has comments inside it, it is probably better to strip those off anyway, at least for publication...

Anaphory commented 6 years ago

In this case, I have a lot of simulation results in different files, and I have once too often accidentally moved data files without their metadata, losing the metadata. That's why I wanted an (admittedly) quick-and-dirty way to keep metadata in the data file.

Honestly, I would probably even prefer to use dedicated code to write to the data file because in general I don't like comments in CSVs, so I should not really promote their use. However, I thought that code should ideally be aware of the UnicodeDictWriter's dialect (which my commit is barely), and the logical consequence of that was to suggest it for inclusion here.

xrotwang commented 6 years ago

@Anaphory I agree that the amount of support for reading comments would motivate exactly the kind of support for writing you have in mind. So I'm somewhat sympathetic. It's just that I think it was a mistake to begin with :)

Anaphory commented 6 years ago

Good! Let's drop this.