frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License
724 stars 148 forks source link

Make properties accessible eg. for CsvDialect, Detector #1014

Closed mcarans closed 2 years ago

mcarans commented 2 years ago

Overview

I am upgrading from Tabulator to Frictionless. I have written a Frictionless wrapper to try to keep my library's API broadly as it was before.

Properties are not always accessible in Frictionless classes eg. for CsvDialect I must do:

...
dialect = CsvDialect()
delimiter = kwargs.pop("delimiter", None)
if delimiter is not None:
    setattr(dialect, "delimiter", delimiter)
...

For Detector, I have to do:

...
detector = kwargs.get("detector", frictionless.Detector())
if infer_types:
    default = None
else:
    default = "string"
field_type = kwargs.pop("field_type", default)
detector._Detector__field_type = field_type
field_float_numbers = kwargs.pop("field_float_numbers", True)
detector._Detector__field_float_numbers = field_float_numbers
...

Would it be possible to make all these properties settable after creating an instance of the class?

roll commented 2 years ago

Hi @mcarans,

Metadata classes in Frictionless@4 are Python dicts so you can update them as dicts - https://framework.frictionlessdata.io/docs/guides/describing-data#transforming-metadata

So, for example, for Dialect (if I understood your intentions correctly):

dialect = CsvDialect()
dialect.delimiter = ';'
dialect['delimiter'] = ';' # both works

The Detector class is not Metadata Dict so I'm going to create an issue for adding setters for it

mcarans commented 2 years ago

Thanks for you reply @roll.

If I do dialect.line_terminator = line_terminator, PyCharm shows the error Property 'line_terminator' cannot be set. Here is the same for delimiter: Screenshot from 2022-04-04 09-08-20

If I do dialect["line_terminator"] = line_terminator, I get the error frictionless.exception.FrictionlessException: [dialect-error] Dialect is not valid: "Additional properties are not allowed ('line_terminator' was unexpected)" at "" in metadata and at "additionalProperties" in profile

BTW, I'm not sure setting line terminator is working correctly. I would expect the silly line terminator below to result in rows having one long line, but it doesn't:

dialect = CsvDialect(line_terminator="123")
resource = frictionless.Resource("https://raw.githubusercontent.com/OCHA-DAP/hdx-python-utilities/main/tests/fixtures/test_data.csv", dialect=dialect)
resource.open()
rows = resource.read_rows()
resource.close()
roll commented 2 years ago

Hi @mcarans,

I've created an issue to support setter in Detector (it's not a metadata class with autosetters atm) - https://github.com/frictionlessdata/frictionless-py/issues/1015

roll commented 2 years ago

Unfortunatelly, lineterminator is not supported on Python level - https://docs.python.org/3/library/csv.html#csv.Dialect.lineterminator

mcarans commented 2 years ago

@roll I also get "property cannot be set" for delimiter (see previous screenshot). It seems like the Frictionless code must be indicating that properties on CsvDialect (or ExcelDialect) should not be set. Maybe the intention was that properties only be passed in the constructor, but setting properties of frictionless.Layout does not produce the warning eg. layout.skip_rows = ["<blank>"] is fine.

roll commented 2 years ago

I think it's because of some complex system we use to create setters on Metadata classes - https://github.com/frictionlessdata/frictionless-py/blob/main/frictionless/plugins/csv.py#L86

This @Metadata.property decorator defines a setter alongside with caching optimization etc but it looks like analysis tools (vscode) can't figure it out. You can use dict API instead to overcome this problem: dialect['delimiter'] = ...

We consider simplifying Metadata classes in v5

roll commented 2 years ago

@mcarans I just noticed that in your example we also need to use camelCase when we work with Metadata as a dict - https://framework.frictionlessdata.io/docs/guides/quick-start#usage (as in the standards - https://specs.frictionlessdata.io/table-schema/):

dialect["line_terminator"] = line_terminator
dialect["lineTerminator"] = line_terminator
print(dialect.line_terminator)

We're currently working on https://github.com/frictionlessdata/frictionless-py/issues/1015 to enable Delimiter setters.

And I'm looking forward to simplifying Metadata classes especially Dialect - https://github.com/frictionlessdata/frictionless-py/issues/944

I'm closing this umbrella issue as I think it's now covered by the smaller ones. Please let me know if I'm missing something else