IvanMathy / Boop

A scriptable scratchpad for developers. In slow yet steady progress.
https://boop.okat.best
MIT License
3.82k stars 352 forks source link

CSV to JSON is too fragile #127

Open sunnywalker opened 4 years ago

sunnywalker commented 4 years ago

The CSV to JSON script is too fragile and breaks on real-world data such as double-quoted columns to contain commas and double-double-quotes for literal double-quotes.

I had a \r in some of my data which means the script isn't standardizing to \n line endings.

Test:

Screen Shot 2020-07-20 at 15 44 11

Result:

Screen Shot 2020-07-20 at 15 44 29

Expected:

Screen Shot 2020-07-20 at 15 47 24

The JSON to CSV gets similarly confused in that it uses \" for literal double-quotes instead of the CSV convention of double-double-quotes within a quoted column.

IvanMathy commented 4 years ago

Hey there! Thanks for the report, I'll try to either fix the current code or try to find a more solid library for conversion. I have to admit this isn't a script I use a lot personally so it's hard to see shortcomings... I appreciate the details and will look into the conventions for converting between the two formats.

Flare576 commented 4 years ago

[Off-Topic, but context: I'm attempting to make a JSON->YAML converter script and noticed this Issue; @IvanMathy's comment about "more solid library" for conversions hits home with the problem I'm facing; external libraries.]

The best library for converting CSV to JSON I've found is https://www.papaparse.com/ . I've tried creating a script like this:

const Papa = require('papaparse');

/**
    {
        "api":1,
        "name":"(beta) Convert CSV To JSON",
        "description":"Changes CSV to JSON",
        "author":"flare576",
        "icon":"quote",
        "tags":"boop,state,script,csv,json,convert"
    }
**/

function main(state) {
  try {
    state.text = Papa.parse(state.text);
  }
  catch(error) {
    state.fullText = `${state.fullText}\n\n${error}`;
    state.postError("Something strange happened here...");
  }
}

But I get the error

TypeError: undefined is not an object (evaluating 'Papa.parse')

Which makes sense: it's not one of the included libraries laid out in your Modules docs here...

So, what's the suggested process here? Feel free to "RTFM" with a link :D

Flare576 commented 4 years ago

For what it's worth: I grabbed the minified version of Papa Parse and pasted it straight into my csv_to_json.js, and it did work. I did have to take out the require and fix the code to be

function main(state) {
  try {
    const { data } = Papa.parse(state.text);
    state.text = JSON.stringify(data);
  }
  catch(error) {
    state.postError("Something strange happened here...");
  }
}

If the pattern Boop wants to follow is to have all dependencies in the lib folder, then creating a new file in there for Papa (with its LICENSE, of course) is probably the cleanest way, but before I throw up a PR I wanted to double-check that this is the approach you want.

Flare576 commented 4 years ago

Put up a PR that (I THINK) is aligned with the Boop docs:

If you'd like some helper functions, consider extracting only the part you need from the library (and clealy mention where it comes from alongside the license) rather than pasting the full source in your script.

https://github.com/IvanMathy/Boop/blob/main/Boop/Documentation/CustomScripts.md#performance

The entire PapaParse library was 20k. Not sure how this compares to other "helpers", so I opted to start "Fast and easy". If this is too large, there are components (mostly around file read/write) that we don't need of the library, but removing them piecemeal isn't my idea of a great time (and probably won't save much space considering the maturity of PapaParse), so I left it for now.

Flare576 commented 4 years ago

Sorry for the double-post, but I just realized that there's a combination of an assumption and a "Limitation" at play around the "header" stuff in this script.

First, we're assuming that this data has a header (both in the original and in my fix). There's a "Limitation" (and I use that term VERY loosely because I actually see this as a feature/simplicity) in Boop that you can't pass parameters to the scripts.

So... the solutions I see are:

  1. Add another script and define one as CSV to JSON (headers), the other CSV to JSON (headerless)
  2. 🤮 Add some sort of design where the first line of state.text could be settings for a script 🤮
  3. ❓ Auto-detect header rows ❓ (I have NO idea how to do this and am pretty sure it's a fool's errand)

If we went with # 1, I'd argue that the 2nd script would just be an optional script, and the default would be headers