Open ivbeg opened 2 years ago
Fantastic idea! No timeline yet on implementation, but definitely a very useful feature. I've run into this myself :)
Actually @ivbeg, would you be able to describe your ideal interface for such a feature? Would the program run the query over each json line individually, or treat the whole file as a large array?
@evinism It would be great to support both ways of processing JSON lines files, but the streaming feature would be more important, since there are huge JSON lines files out there, 100GB+ even compressed. I could provide several examples from public datasets if needed. It's nearly impossible to process such files as one large array.
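To make the streaming requirement concrete, here's a minimal Python sketch (stdlib only, not actual MistQL or undatum code) of reading a JSONL file one record at a time, so memory use stays constant no matter how large the file is:

```python
import json
from typing import IO, Any, Iterator

def iter_jsonl(stream: IO[str]) -> Iterator[Any]:
    """Yield one parsed record per line of a JSON lines stream.

    Because each line is parsed and yielded independently, memory
    use is bounded by the largest single record, not the file size.
    """
    for line in stream:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        yield json.loads(line)

# Usage: process a (possibly huge) JSONL file record by record.
# with open("data.jsonl") as f:
#     for record in iter_jsonl(f):
#         ...  # run a query against `record` here
```

This is the opposite of `json.load(f)` on a file containing one big array, which must build the entire structure in memory before any processing can start.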
I've developed a cmd tool, undatum (https://github.com/datacoon/undatum), that supports data processing and conversion of JSON lines and BSON files. BSON is a binary format used by the MongoDB NoSQL database, very similar to JSON lines. I would like to integrate a query language into undatum to use with its data processing/conversion operations. I've already used dictquery (https://github.com/cyberlis/dictquery), but it's good for filtering only.
Streaming mode for processing JSONL sounds right to me too. Not sure when I'll get to it, but it's definitely something I want to tackle.
@evinism I've added experimental support for MistQL to undatum; it's in the main https://github.com/datacoon/undatum branch as of version 1.0.13. The command is `undatum query -q <yourquery> <filename>`, where the filename can be a CSV, JSONL, or BSON file.
I hope it helps.
Adding @ilan-pinto to this thread. For now, let's work on getting this up and running in Python.
Hi, please assign it to me.
For reference, a possible interface for this feature could be as such:

```shell
tail file.log | python -m mistql.cli foo.bar --lines > processed.jsonl
```

Note that the query is performed in a streaming manner: for each JSON line in file.log, the CLI emits the query result for that line into processed.jsonl.
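A rough sketch of what such a `--lines` CLI could look like in Python. This is not MistQL's actual CLI; the `lookup` function is a hypothetical stand-in for the real query engine, resolving only a dotted path like `foo.bar`:

```python
import argparse
import json
import sys

def lookup(record, path):
    """Hypothetical stand-in for a real query engine: resolve a
    dotted path like "foo.bar" against one parsed JSON record."""
    for key in path.split("."):
        record = record[key]
    return record

def main(argv=None):
    parser = argparse.ArgumentParser(prog="jsonl-query-sketch")
    parser.add_argument("query", help="query to run against each record")
    parser.add_argument("--lines", action="store_true",
                        help="treat stdin as JSON lines, one record per line")
    args = parser.parse_args(argv)
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # One result per input line -- streaming, never buffering the file.
        print(json.dumps(lookup(record, args.query)))

if __name__ == "__main__":
    main()
```

The key design point is that records are read, queried, and written one at a time, so the pipeline above works on arbitrarily large log files.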
Please add support for JSON lines files (https://jsonlines.org/). There are a lot of such files published and in use. Sometimes they are huge and hard to convert to JSON.