The CSV file format used for schedules is bad

chris-belcher commented 5 years ago

The CSV file format used in the tumbler schedule has a few problems.

It's not extensible (adding or removing fields requires you to comb the entire source to check if you need to change any list indices). Textual search doesn't help as much as it could.
It's not readable in the source code (fields are referenced by numeric indices, which tell the reader nothing about what they are). It would be better to hold schedule entries in a class, so the code reads like schedule_entry.amount_fraction or schedule_entry.counterparty_count rather than schedule_entry[4]. We have the ability to name variables for a reason. This class could also have serialize() and deserialize() methods that write and read the object to and from a file, and these could use any format.
It's not readable in the CSV file. The fields have no labels. Different field lengths can cause the columns to not line up (for example, if one amount_fraction is 0.5 and another is 0.454556756 then the latter will push the whole column to the right). Dots can be confused with commas too.
It doesn't have versioning to detect upgrades or downgrades. Adding versioning to CSV is not trivial because every line is parsed with string.split(","), so a version line becomes an edge case.
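
To make the class idea above concrete, here is a minimal sketch of what such an entry could look like. Only amount_fraction and counterparty_count come from this issue; the class name ScheduleEntry, the other field names, and the field order are assumptions for illustration and not the actual tumbler schedule layout.

```python
from dataclasses import dataclass, fields


@dataclass
class ScheduleEntry:
    # Hypothetical layout: only amount_fraction and counterparty_count are
    # named in the issue; the remaining fields are illustrative assumptions.
    mixdepth: int
    amount_fraction: float
    counterparty_count: int
    destination: str
    wait_minutes: float

    def serialize(self) -> str:
        """Write the entry as one comma-separated line, in field order."""
        return ",".join(str(getattr(self, f.name)) for f in fields(self))

    @classmethod
    def deserialize(cls, line: str) -> "ScheduleEntry":
        """Parse one comma-separated line back into a named entry."""
        parts = [p.strip() for p in line.split(",")]
        spec = fields(cls)
        if len(parts) != len(spec):
            raise ValueError(
                f"expected {len(spec)} fields, got {len(parts)}: {line!r}"
            )
        # Each dataclass field carries its annotated type (int, float, str),
        # which doubles as the converter for the raw string.
        return cls(*(f.type(p) for f, p in zip(spec, parts)))


entry = ScheduleEntry.deserialize("0, 0.5, 3, addr1, 10.0")
print(entry.amount_fraction, entry.counterparty_count)
print(entry.serialize())
```

With this shape, the on-disk format is confined to the two methods, so swapping CSV for something else would not touch the code that reads entry.amount_fraction.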

I think the CSV file format was aiming to be easier to edit than the JSON that was previously used, but CSV is still quite hard to edit for newbs. Perhaps we're better off displaying a GUI table in joinmarket-qt.py that users can edit if they want to tweak the schedule themselves. Users of the command line tumbler can probably deal with editing without a GUI.

AdamISZ commented 5 years ago

> I think the CSV file format was aiming to be easier to edit than the JSON that was previously used

Yep, that's all. I just don't think JSON is something you can ask someone to edit manually (even if a techy person would be fine doing it), whereas there's a long history of using CSV for data that is read and written by hand (see: Excel).

Having said that, nobody ever really wants to manually edit files like this, and also, this is not actually a data file, really, it's more like the most rudimentary conceivable programming/scripting system.

Re: versioning, you can easily add it here by having a special magic match on the first line for new versions (and all succeeding ones) and then just have the lack of that special string/line indicate version 0.
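
A rough sketch of that first-line check, assuming a made-up marker string (the "#schedule-version:" prefix and the split_version name are hypothetical, nothing like them exists in the codebase as far as this thread says):

```python
# Hypothetical marker; any string that cannot be a valid schedule line works.
MAGIC_PREFIX = "#schedule-version:"


def split_version(text):
    """Return (version, schedule_lines) for the contents of a schedule file.

    A file whose first line lacks the marker is treated as version 0,
    so existing unversioned files keep parsing as before.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith(MAGIC_PREFIX):
        version = int(lines[0][len(MAGIC_PREFIX):].strip())
        return version, lines[1:]
    return 0, lines


print(split_version("#schedule-version: 2\n0,0.5,3"))
print(split_version("0,0.5,3\n1,0.2,4"))
```

The per-line split(",") parsing then only ever sees the lines after the marker, so the version line stops being an edge case inside the entry parser.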

AdamISZ commented 5 years ago

> It's not readable in the CSV file. The fields have no labels. Different field lengths can cause the columns to not line up (for example, if one amount_fraction is 0.5 and another is 0.454556756 then the latter will push the whole column to the right). Dots can be confused with commas too.

I mean it's kind of a side thing, but I don't agree with this; it's handled easily by the many programs that can read CSV.

AdamISZ commented 1 year ago

Reflecting on this just now, it seems like csv <-> list in Python is bad because it's horrible for coding. So why not just have it be an object in code (a glorified dict, basically), but keep it serializable to address the question of users easily editing a plaintext CSV file?

PulpCattel commented 1 year ago

I think, if we are not already, one thing we can do is start using https://docs.python.org/3/library/csv.html and https://docs.python.org/3/library/csv.html#csv.DictReader
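
For reference, a small sketch of how csv.DictReader could give named access even when the file has no header row. The FIELDNAMES list and the read_schedule helper are assumptions for illustration (only amount_fraction and counterparty_count are named in this thread); the fieldnames and skipinitialspace arguments are part of the standard csv module.

```python
import csv
import io

# Hypothetical column order; the real schedule layout may differ.
FIELDNAMES = [
    "mixdepth",
    "amount_fraction",
    "counterparty_count",
    "destination",
    "wait_minutes",
]


def read_schedule(stream):
    """Read headerless schedule rows into dicts keyed by field name."""
    # Passing fieldnames explicitly tells DictReader not to treat the
    # first row as a header. All values come back as strings.
    reader = csv.DictReader(stream, fieldnames=FIELDNAMES, skipinitialspace=True)
    return [dict(row) for row in reader]


sample = io.StringIO("0, 0.5, 3, addr1, 10\n1, 0.454556756, 4, addr2, 20\n")
for row in read_schedule(sample):
    print(row["amount_fraction"], row["counterparty_count"])
```

Values still need converting from strings, but the code reads row["amount_fraction"] instead of row[1].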

JoinMarket-Org / joinmarket-clientserver

The CSV file format used for schedules is bad #429