ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

Change provenance file to JSON format #184

Open dbrower opened 3 years ago

dbrower commented 3 years ago

The carrel provenance file is formatted as a tab-separated values (TSV) file. This causes problems when adding or changing metadata fields. It also causes problems for carrels whose search queries contain newlines.
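A minimal sketch of the newline problem, assuming a provenance record is written as a single tab-joined line with hypothetical fields `carrel_id`, `query`, and `date` (the real file's columns may differ). A query containing a newline splits the record across two lines, so a line-oriented reader sees two malformed rows. Serializing the same record as JSON escapes the newline inside the string, so it round-trips cleanly.

```python
import json

# Hypothetical provenance fields; the actual carrel provenance file may use different ones.
record = {
    "carrel_id": "example-carrel",
    "query": "whaling\nAND ships",  # a search query containing a newline
    "date": "2021-06-21",
}

# TSV: join the values with tabs and write one record per line.
tsv_text = "\t".join(record.values()) + "\n"
tsv_rows = [line.split("\t") for line in tsv_text.splitlines()]
print(len(tsv_rows))  # → 2: the single record is read back as two broken rows

# JSON: the newline is escaped as \n inside the string, so the record survives intact.
json_text = json.dumps(record)
restored = json.loads(json_text)
print(restored == record)  # → True
```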

There are many alternative formats we could use. I suggest JSON, since it is ubiquitous and is intended as a structured way to send data from one program to another. Most languages have a library for reading JSON files, and there is a command-line tool (jq) for pulling data out of a JSON-formatted file.
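For example, assuming the provenance were stored as a single JSON object in a file named `provenance.json` with a hypothetical `query` key, the command-line lookup would be `jq -r '.query' provenance.json`. A rough equivalent in Python needs only the standard-library `json` module. Because fields are looked up by name rather than by column position, adding a new field later does not disturb existing readers. The file name, helper name, and field names below are illustrative, not the project's actual schema.

```python
import json
import tempfile
from pathlib import Path


def read_provenance_field(path, key):
    """Return one field from a JSON provenance file (hypothetical layout: one JSON object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)[key]


# Illustrative record; not the project's real schema.
sample = {"carrel_id": "example-carrel", "query": "whaling\nAND ships", "date": "2021-06-21"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "provenance.json"
    path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    print(read_provenance_field(path, "query"))  # the embedded newline is preserved

    # A later version adds a field; lookups by name keep working unchanged.
    sample["source"] = "hypothetical-source"
    path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    print(read_provenance_field(path, "date"))
```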

This ticket is to change the file to have JSON format.
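One way existing carrels might be migrated is sketched below, assuming (hypothetically) that the current TSV file has a header row of column names followed by a single row of values. `csv.QUOTE_NONE` makes the reader treat quote characters literally, as plain TSV does. The helper name `tsv_provenance_to_json` is illustrative only.

```python
import csv
import io
import json


def tsv_provenance_to_json(tsv_text):
    """Convert a TSV provenance file into a JSON string.

    Assumes a hypothetical layout: the first line holds column names and
    the second line holds the values for the single provenance record.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    records = list(reader)
    # A query with an embedded newline shows up as extra rows, so refuse to guess.
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, found {len(records)}")
    return json.dumps(records[0], indent=2, ensure_ascii=False)


old = "carrel_id\tquery\tdate\nexample-carrel\twhaling\t2021-06-21\n"
print(tsv_provenance_to_json(old))
```

Raising an error rather than silently converting means carrels already damaged by a newline in the query are flagged for manual repair instead of being carried over into the new format.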

ericleasemorgan commented 3 years ago

I’m not sure it is a good idea to implement this feature right now. Changing the format of the provenance file has many downstream consequences, and I’m reluctant to make such a change so close to the release of our version 1.0.

(Sent by email in reply to Don Brower’s post of Jun 21, 2021, at 12:43 PM.)

dbrower commented 3 years ago

I made this ticket to track the problem because I noticed some errors related to it in the web GUI's carrel scraper. The ticket does not say when (or whether) the issue should be addressed. Your concerns are noted, and this should wait until after the release.