Strip citations from text

ajkessel commented 10 years ago

I realize this is a rather niche request, but I figured why not put it out there in the spirit of open source. I read a lot of documents with many case and statutory citations that are basically useless in speed-reader form. I suspect people in academia would have a similar issue. I would like to suggest an option to strip citations from the text as part of the parsing process. I tried to track down some regular expressions that match typical citation forms and came up with these:

http://stackoverflow.com/questions/7054764/regular-expression-for-legal-citation
http://pyparsing.wikispaces.com/share/view/10397114

The pyparsing example appears to be a good starting place but would need to be implemented in JS. Alternatively, maybe someone out there has already solved this precise problem (but I couldn't find it).

Another, lower-tech approach would be an option in the configuration interface to strip text matching any number of user-supplied patterns. This would save the user from having to tangle with the source code.
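To make the pattern-list idea concrete, here is a minimal sketch in JS. Everything in it is an assumption for illustration: `stripPatterns` and `examplePatterns` are hypothetical names, not part of jetzt, and the two regular expressions are rough guesses at common citation shapes rather than a complete citation grammar.

```javascript
// Hypothetical example patterns, stored as strings the way a
// configuration interface might save them. Not part of jetzt.
const examplePatterns = [
  // Rough match for U.S. reporter citations such as "347 U.S. 483 (1954)".
  String.raw`\b\d{1,4}\s+(?:U\.S\.|F\. ?(?:2d|3d))\s+\d{1,5}(?:\s*\(\d{4}\))?`,
  // Bracketed numeric references such as "[12]" or "[3, 4]".
  String.raw`\[\d+(?:\s*,\s*\d+)*\]`,
];

// Remove every match of every pattern, then tidy the leftover whitespace.
function stripPatterns(text, patterns) {
  let result = text;
  for (const source of patterns) {
    result = result.replace(new RegExp(source, "g"), "");
  }
  return result
    .replace(/\s+([,.;:])/g, "$1") // no space before punctuation
    .replace(/\s{2,}/g, " ")       // collapse runs of whitespace
    .trim();
}

console.log(stripPatterns("See 347 U.S. 483 (1954) for details [12].", examplePatterns));
```

Keeping the patterns as plain strings means a user could add or remove them in a settings page without touching the parser, at the cost of having to handle invalid regex input somewhere.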

nomicode commented 10 years ago

Having used jetzt to read Wikipedia a fair bit, I am :+1: on this change.

ds300 commented 10 years ago

Thanks, this is an excellent idea! I think it is very possible to incorporate custom regex filters into the new parsing stuff without impacting all the good DOM analysis.

j6k4m8 commented 10 years ago

I'm a huge fan of the idea of using this for academia. I'd love to work on an optional feature set to allow academic reading. It'd have to be able to lower the speed to 100-300 WPM, and pause longer on abbreviations and long, uncommon words.
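A rough sketch of how that timing could work, in JS. The setting names, the default values, and the abbreviation test are all assumptions made up for illustration, not anything jetzt does today. Detecting "uncommon" words would need a word-frequency list, which this sketch leaves out.

```javascript
// Hypothetical settings; names and values are illustrative assumptions.
const academicSettings = {
  wpm: 200,                // inside the 100-300 WPM range
  longWordLength: 9,       // words at least this long count as "long"
  longWordFactor: 1.5,     // multiplier applied to long words
  abbreviationFactor: 2.0, // multiplier applied to abbreviations
};

// Very rough test: all-caps tokens ("DNA") or dotted forms ("e.g.", "U.S.").
function looksLikeAbbreviation(word) {
  return /^[A-Z]{2,}$/.test(word) || /^(?:[A-Za-z]\.){2,}$/.test(word);
}

// Milliseconds to keep a word on screen: base time per word at the
// configured WPM, scaled by the largest multiplier that applies.
function displayMillis(word, settings) {
  const base = 60000 / settings.wpm;
  let factor = 1;
  if (looksLikeAbbreviation(word)) {
    factor = Math.max(factor, settings.abbreviationFactor);
  }
  if (word.length >= settings.longWordLength) {
    factor = Math.max(factor, settings.longWordFactor);
  }
  return base * factor;
}

console.log(displayMillis("electrophoresis", academicSettings));
```

Taking the largest multiplier rather than multiplying them together keeps a long abbreviation from stalling the reader for too long; whether that is the right trade-off is a judgment call.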

Any other takers?

nomicode commented 10 years ago

That'd be great. Even more reason to get #78 working?

ds300 / jetzt

Strip citations from text #98