dfurtado / dataclass-csv

Map CSV to Data Classes
Other
195 stars 21 forks

Allow NULL values more conveniently, and fix README TypeError: non-default argument 'age' follows default argument #47

Closed: nealmcb closed this issue 2 years ago

nealmcb commented 2 years ago

CSV files and other databases commonly have fields that are NULL (missing or empty). For example, a string field in a row might be of zero length. Handling such values with dataclass-csv is currently complicated and confusing.

The library currently allows a workaround: specify a default value for the field.

The README gives this example for default values:

@dataclass
class User:
    firstname: str
    email: str = 'Not specified'
    age: int

But defining that class raises this error:

Traceback (most recent call last):
  File "example.py", line 5, in <module>
    class User:
  File "/usr/lib/python3.8/dataclasses.py", line 1019, in dataclass
    return wrap(cls)
  File "/usr/lib/python3.8/dataclasses.py", line 1011, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
  File "/usr/lib/python3.8/dataclasses.py", line 925, in _process_class
    _init_fn(flds,
  File "/usr/lib/python3.8/dataclasses.py", line 502, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'age' follows default argument

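It does work to simply put all the fields that have default values at the end of the class. Since the reader matches CSV columns to fields by header name, the order of the fields in the class doesn't need to follow the order of the columns in the file: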

@dataclass
class User:
    firstname: str
    age: int
    email: str = ''
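To illustrate why the reordering is harmless, here is a minimal sketch using only the standard library (it is not dataclass-csv's own API). It assumes columns are matched to fields by header name, as `csv.DictReader` does: the CSV lists `email` first, yet the reordered class is filled correctly, and an empty `email` cell falls back to the field default. The helper `load_users` is hypothetical.

```python
import csv
import io
from dataclasses import MISSING, dataclass, fields

@dataclass
class User:
    firstname: str
    age: int
    email: str = ''  # fields with defaults must come last in a plain dataclass

def load_users(text):
    """Hypothetical helper: build User objects from CSV text, matching columns by header name."""
    users = []
    for row in csv.DictReader(io.StringIO(text)):
        kwargs = {}
        for f in fields(User):
            raw = row.get(f.name) or ''  # a missing column or cell becomes ''
            if raw == '' and f.default is not MISSING:
                continue  # empty cell: leave it out so the dataclass default applies
            kwargs[f.name] = int(raw) if f.type is int else raw
        users.append(User(**kwargs))
    return users

# The CSV lists `email` first, but columns are matched by name, not position.
data = "email,firstname,age\n,Ann,30\nbob@example.com,Bob,41\n"
users = load_users(data)
```

An empty `age` cell would still fail in `int('')`, which mirrors treating a field without a default as required.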

A quick, partial solution would be to fix the example and note that fields with default values must be placed at the end.

But requiring that the fields be declared out of their natural order reduces the clarity of the code and is more convoluted than necessary.

I think it would also be good to automatically move the fields that need default values to the end of the generated class.

It might also be worthwhile to provide a way to specify that NULL values are OK for all fields, though that requires deciding what value to use as the "default default".
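One possible policy for a "default default" is to map every empty or missing cell to `None` and declare each field as `Optional[...] = None`, which also sidesteps the field-ordering rule because every field then has a default. This is a minimal standard-library sketch of that idea, not a dataclass-csv feature; `read_nullable`, `CONVERTERS`, and the `Contest` class are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Contest:
    # Every field defaults to None, so no field ordering constraint applies.
    contest_name: Optional[str] = None
    votes: Optional[int] = None

# Hypothetical per-field converters; anything not listed stays a string.
CONVERTERS = {"votes": int}

def read_nullable(cls, text, converters):
    """Hypothetical helper: treat every empty or missing cell as None."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        kwargs = {}
        for f in fields(cls):
            raw = row.get(f.name)
            if raw is None or raw == "":
                kwargs[f.name] = None  # NULL -> None, the "default default"
            else:
                kwargs[f.name] = converters.get(f.name, str)(raw)
        records.append(cls(**kwargs))
    return records

data = "contest_name,votes\nMayor,\n,7\n"
contests = read_nullable(Contest, data, CONVERTERS)
```

The cost of this policy is that every consumer of the records must then handle `None` for every field.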

nealmcb commented 2 years ago

One of the biggest hassles here is that the default error handling makes it hard to deal gracefully with bad input lines. You get exceptions like

dataclass_csv.exceptions.CsvValueError: The field `contest_name` is required. [CSV Line number: 2]

These exceptions break out of the for loop and make it hard to continue processing the rest of the file. Wrapper functions that allow processing to continue are possible, but they are neither straightforward nor elegant, as explained here:

How to catch an exception in the for loop iterator - Stack Overflow
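A minimal sketch of such a wrapper: it calls `next()` itself, records any listed exception, and moves on. The important caveat is that this only works when the underlying iterator can resume after raising (for example `map()` or a class-based iterator); a generator is finished once an exception escapes it. Whether dataclass-csv's `DataclassReader` iterator can resume is not established here. `skip_errors` and `parse_row` are hypothetical names, and `parse_row` stands in for the library's own row conversion.

```python
import csv
import io

def skip_errors(iterable, errors, exc_types=(ValueError,)):
    """Yield items from `iterable`, recording and skipping items whose
    retrieval raises one of `exc_types`.

    Only useful if the underlying iterator can be resumed after raising
    (e.g. map() or a class-based iterator); a generator cannot.
    """
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except StopIteration:
            return
        except exc_types as exc:
            errors.append(exc)  # remember the bad row and keep going
            continue
        yield item

def parse_row(row):
    """Hypothetical row parser that rejects a missing contest_name."""
    if not row["contest_name"]:
        raise ValueError(f"The field `contest_name` is required: {row}")
    return row["contest_name"], int(row["votes"])

data = "contest_name,votes\nMayor,10\n,5\nCouncil,7\n"
errors = []
rows = csv.DictReader(io.StringIO(data))
good = list(skip_errors(map(parse_row, rows), errors))
```

A simpler alternative, when you control the conversion step, is to put the `try`/`except` inside the loop body around the per-row conversion instead of wrapping the iterator.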

dfurtado commented 2 years ago

Hi @nealmcb, thanks for opening this issue.

Putting the fields with default values at the end is actually a requirement of the dataclasses module in the standard library, not of this lib. I can check whether there is anything I can do about it.

About the second error you posted: maybe I misunderstood, but isn't that because a value for `contest_name` is missing in the CSV file? In that case, would it help to give that field a default empty-string value? Could you provide more details? A sample CSV with a few lines and the definition of your dataclass would help me troubleshoot.