dcmoura / spyql

Query data on the command line with SQL-like SELECTs powered by Python expressions
https://spyql.readthedocs.io
MIT License
918 stars 25 forks

Universal access via a dot operator #79

Closed dcmoura closed 2 years ago

dcmoura commented 2 years ago

This PR introduces the dot operator for universal access to the fields of different file formats.

Consider the following files:

data.csv

a, b
1, 2
3, 4

data.json

{"a": 1, "b": 2}
{"a": 3, "b": 4}

Now we can use the same syntax to query from json and csv:

SELECT .a + .b FROM json
SELECT .a + .b FROM csv
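
To make the expected result concrete, here is a plain-Python sketch of what both queries compute on the sample files. It does not use SpyQL itself: the helper names `sums_from_json` and `sums_from_csv` are made up for illustration, and the explicit `int` cast is an assumption standing in for whatever type handling the tool applies to CSV columns.

```python
import csv
import io
import json

# Contents of the two sample files from this description.
JSON_DATA = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'
CSV_DATA = "a, b\n1, 2\n3, 4\n"


def sums_from_json(text):
    # One JSON object per line, as in data.json.
    return [rec["a"] + rec["b"] for rec in map(json.loads, text.splitlines())]


def sums_from_csv(text):
    # skipinitialspace drops the blank that follows each comma in data.csv.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    # CSV values arrive as strings, so they are cast before adding (assumption).
    return [int(row["a"]) + int(row["b"]) for row in reader]


print(sums_from_json(JSON_DATA))  # [3, 7]
print(sums_from_csv(CSV_DATA))    # [3, 7]
```

Both inputs yield the same two sums, which is the point of having a single syntax for field access.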

Still, the following versions are more CPU efficient and should be considered when dealing with large files, especially in the CSV case:

SELECT json->a + json->b FROM json
SELECT a + b FROM csv

In the JSON case, using a lookup operator is faster than using an attribute access operator (though this might be improved). In the CSV case, the dot operator involves creating a dictionary for each row; improving this would require changing the way we read CSVs, and it would always be slower than reading each line as a list, as we do today.
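
The extra per-row work in the CSV case can be sketched as follows. This is only an illustration of the two access styles, not SpyQL's code: the `Row` class is a hypothetical stand-in for whatever object the tool exposes as row, and the `int` cast is again an assumption.

```python
import csv
import io


class Row(dict):
    """Hypothetical dict wrapper that also allows attribute-style access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


CSV_DATA = "a, b\n1, 2\n3, 4\n"

reader = csv.reader(io.StringIO(CSV_DATA), skipinitialspace=True)
header = next(reader)  # ['a', 'b']

list_sums = []
attr_sums = []
for values in reader:
    values = [int(v) for v in values]
    # Positional access: the row stays a plain list, nothing else is built.
    list_sums.append(values[0] + values[1])
    # Attribute access: a new dictionary is created for every single row.
    row = Row(zip(header, values))
    attr_sums.append(row.a + row.b)

print(list_sums, attr_sums)
```

Both lists hold the same sums; the difference is the dictionary allocated on each iteration, which is the cost described above.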

This functionality was already available through the row keyword; the dot operator just makes it more practical and reduces clutter. Under the hood, .a is replaced by row.a before the query is processed. This is done via regex, which is tricky (there might be some corner cases that I overlooked).
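
A minimal sketch of this kind of rewrite is shown below. The pattern and the function name `expand_dot_operator` are assumptions made for illustration and are not the regex used in this PR; they only show why the substitution needs care.

```python
import re

# Match a dot that starts an identifier, unless it is preceded by a word
# character, a closing bracket, a quote, or another dot. The lookbehind
# leaves `row.a`, `f(x).a`, `'s'.upper()` and `1.5` untouched.
DOT_FIELD = re.compile(r"(?<![\w)\]}.'\"])\.(?=[A-Za-z_])")


def expand_dot_operator(query):
    # Prefix every bare `.field` with `row` (hypothetical helper).
    return DOT_FIELD.sub("row.", query)


print(expand_dot_operator("SELECT .a + .b FROM json"))
print(expand_dot_operator("SELECT 1.5 * .a FROM csv"))
print(expand_dot_operator("SELECT row.a FROM csv"))
```

One corner case this sketch does not handle is a dot inside a string literal: `SELECT 'see .a' FROM csv` would be rewritten as well, because the pattern has no notion of quoting.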

I am planning to ship this as a silent update (without documenting it in the README), since it is covered in the new documentation that should be released soon.

codecov[bot] commented 2 years ago

Codecov Report

Base: 95.63% // Head: 95.63% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (e40cba3) compared to base (09c311b). Patch coverage: 100.00% of modified lines in pull request are covered.

:exclamation: Current head e40cba3 differs from pull request most recent head e3fe384. Consider uploading reports for the commit e3fe384 to get more accurate results

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master      #79   +/-   ##
=======================================
  Coverage   95.63%   95.63%
=======================================
  Files          15       15
  Lines        1282     1284    +2
=======================================
+ Hits         1226     1228    +2
  Misses         56       56
```

| [Impacted Files](https://codecov.io/gh/dcmoura/spyql/pull/79?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Daniel+Moura) | Coverage Δ | |
|---|---|---|
| [spyql/parser.py](https://codecov.io/gh/dcmoura/spyql/pull/79/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Daniel+Moura#diff-c3B5cWwvcGFyc2VyLnB5) | `99.46% <100.00%> (+<0.01%)` | :arrow_up: |