cslarsen / wpm

Typeracer-like console app for measuring your WPM
GNU Affero General Public License v3.0

Rewrite entire quote database from scratch #26

Open · cslarsen opened this issue 6 years ago

cslarsen commented 6 years ago

I would like to go back to the original sources and write down every single quote again, from scratch.

As it is now, nearly every quote contains some kind of error, rewording or canary. Also, some of the source attributions could be more explicit. For example, for very old texts, it would be very nice to note which translation was used.

To do this, I will need help. Please use JSON, and put any additional information in new keys that WPM does not recognize. All of these extra keys should be optional, so that WPM simply ignores them.

So each entry looks like this:

{
 "author": "...",
 "title": "... (canonical title only: short and nothing else)",
 "text": "...",
 "translation": "translation info",
 "copyright": "if applicable",
 "edition": "...",
 "url": "if applicable"
}

If you don't have anything to put into the optional fields, leave them out. It would be very good to include a URL, e.g. to Google Books (see below), so that each quote can be double-checked against its source.

The required fields are author, title and text.
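To make these rules checkable before submitting, here is a minimal sketch of a validator. It assumes (this is not stated in the issue) that a quote file is a JSON array of objects; `check_quote` and `check_file` are hypothetical helper names, not part of WPM. Unknown keys are deliberately not flagged, matching the request that WPM ignore extra fields.

```python
import json

# Required and optional keys, taken from the field list in this issue.
REQUIRED = ("author", "title", "text")
OPTIONAL = ("translation", "copyright", "edition", "url")


def check_quote(entry: dict) -> list[str]:
    """Return a list of problems found in one quote entry (empty if OK)."""
    problems = []
    for key in REQUIRED:
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty required field: {key}")
    for key in OPTIONAL:
        # Optional fields should be left out rather than set to an empty value.
        if key in entry and not str(entry[key]).strip():
            problems.append(f"empty optional field (leave it out instead): {key}")
    # Any other keys are allowed and ignored, like WPM is asked to do.
    return problems


def check_file(path: str) -> dict[int, list[str]]:
    """Check every entry in a JSON file assumed to hold a list of quote objects."""
    with open(path, encoding="utf-8") as f:
        quotes = json.load(f)
    report = {}
    for index, entry in enumerate(quotes):
        problems = check_quote(entry)
        if problems:
            report[index] = problems
    return report
```

The report maps each offending entry's position in the array to its problems, so an empty dict means the file passes these checks.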

How to actually find the quotes?

Details on submitting new quotes

gauravsofat commented 6 years ago

Hey, I'd like to contribute and help resolve this issue. Where can I find the quotes currently being used that need to be rewritten? Are they the same ones as stored in this file?

cslarsen commented 6 years ago

Yep, those are the ones. Just gunzip it and you can edit the JSON file. It should be saved under a new name, though (and probably on a separate branch as well). This is a lot of work, but any help would be greatly appreciated.
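For anyone who prefers to do the "gunzip" step from Python, a minimal sketch follows. `examples.json.gz` is the file name mentioned later in this thread; the destination name and the helper `extract_database` are hypothetical, and no particular top-level structure of the JSON is assumed.

```python
import gzip
import json


def extract_database(src: str, dest: str) -> int:
    """Decompress a gzipped JSON file and write it out as plain JSON.

    Returns the number of top-level entries copied.
    """
    # gzip.open in text mode ("rt") decompresses and decodes in one step.
    with gzip.open(src, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # Write under a new name so the original compressed database stays untouched.
    with open(dest, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=1)
    return len(data)
```

For example, `extract_database("examples.json.gz", "quotes-edit.json")` would produce an editable, uncompressed copy (the output name here is only an illustration). `ensure_ascii=False` keeps accented characters readable in the output instead of escaping them.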

If needed, I can code up some tools to check which quotes are still missing from the new file (or something like that).
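One possible shape for such a tool is sketched below. It is an assumption, not an existing WPM script: old entries are taken to be `[author, title, text, text_id]` sub-arrays (the layout described later in this thread), new entries are dicts with `author` and `title` keys, and `missing_quotes` is a hypothetical name. Since rewritten texts are expected to differ from the old ones, it matches on author and title only and counts how many quotes per pair are still outstanding.

```python
from collections import Counter


def missing_quotes(old_entries, new_entries):
    """Count, per (author, title), old quotes without a rewritten counterpart.

    old_entries: sub-arrays of the form [author, title, text, text_id].
    new_entries: dicts with at least "author" and "title" keys.
    """
    # Texts change during rewriting, so match on author and title only.
    old = Counter((e[0].strip(), e[1].strip()) for e in old_entries)
    new = Counter((e["author"].strip(), e["title"].strip()) for e in new_entries)
    # Counter subtraction keeps only positive counts: quotes still to be done.
    return dict(old - new)
```

A known limitation of this choice: if a title is changed to its canonical form during the rewrite, the old and new entries will no longer match, so the counts are a rough progress indicator rather than an exact list.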

gauravsofat commented 6 years ago

Hey, as mentioned above, I have created rewritten.json in the specified format. I wrote a small script to convert the previous database to the requested format, keeping exactly the same data for now. So essentially, every sub-array in examples.json.gz of the form ["author-name", "title-name", "text-content", text-id] has been converted to:

{
 "author": "author-name",
 "title": "title-name",
 "text": "text-content",
 "text_id": text-id
}
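The actual conversion script is not shown in this thread; the following is only a hypothetical sketch of the same mapping, assuming each old entry is a four-element sub-array in the order given above. `convert` is an illustrative name.

```python
# Key names in the order of the old sub-array fields.
KEYS = ("author", "title", "text", "text_id")


def convert(entries):
    """Turn [author, title, text, text_id] sub-arrays into keyed dicts."""
    converted = []
    for entry in entries:
        # Refuse to guess if an entry does not have exactly four fields.
        if len(entry) != len(KEYS):
            raise ValueError(f"unexpected entry shape: {entry!r}")
        converted.append(dict(zip(KEYS, entry)))
    return converted
```

Raising on a malformed entry, rather than silently truncating it with `zip`, makes any irregular rows in the old database visible during conversion.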

This should serve as a good base for starting the rewriting process. If you wish, I can open a PR with this commit against the main repository, so that anyone contributing can easily see how much work is left and contribute bit by bit to the database of almost 5k entries.

Just wanted to run this by you, to make sure that this is the right way to go ahead.

cslarsen commented 6 years ago

Thanks for the effort, but I would prefer to have rewritten.json contain only the rewritten, clean quotes. This is because of merging, diffing and so on.