370-Alexa-Project / CS370_Echo_Demo

CS 370 - Amazon Echo project template code and details
MIT License

Strip HTML tags from calendar fields #16

Closed ghost closed 6 years ago

ghost commented 8 years ago

Surely this is already a solved problem; we simply need to find existing code that does it. A robust solution would be non-trivial to implement from scratch.

This could be a stored procedure that runs on the database:

I will look into what language extensions can be added on RDS for making this task more approachable, but there are probably existing solutions in PL/pgSQL.

ghost commented 8 years ago

Amazon RDS for PostgreSQL supports four procedural language extensions: PL/Perl, PL/pgSQL, PL/Tcl, and PL/v8 (JavaScript via the V8 JavaScript engine).

https://aws.amazon.com/rds/postgresql/

eBucher commented 8 years ago

In addition to this, it looks like we may need to replace ampersands (&) with the word "and". Currently, any name containing an ampersand causes Alexa's response to fail.
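A minimal sketch of the ampersand replacement described above, written in Python purely for illustration. The function name `spell_out_ampersands` is hypothetical and not part of the project; the sketch also assumes that names may arrive with HTML entities such as `&amp;`, so it decodes those first.

```python
import html
import re


def spell_out_ampersands(text: str) -> str:
    """Return text with every ampersand replaced by the word 'and'.

    Hypothetical helper: decodes HTML entities first so that '&amp;'
    becomes a plain '&' before the replacement runs.
    """
    decoded = html.unescape(text)
    # Swallow any whitespace around the ampersand so the result has
    # exactly one space on each side of 'and'.
    replaced = re.sub(r"\s*&\s*", " and ", decoded)
    return replaced.strip()


if __name__ == "__main__":
    print(spell_out_ampersands("Ben & Jerry's Social"))
    print(spell_out_ampersands("R&amp;D Seminar"))
```

Running this step in the scraper rather than in the skill would keep the stored names ready to be spoken as-is.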

ghost commented 8 years ago

Thanks to input from Jessica and Aaron, the new, cleaner strategy is to perform this transformation in the scraper and discard the original descriptions from the calendars, storing only the stripped versions. This keeps latency in the Alexa skill to a minimum because no stripping happens at request time, is optimally space efficient in the database, and scales cleanly in the scraper.
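As a rough illustration of stripping tags in the scraper, here is a sketch built on Python's standard-library `html.parser`. The scraper's actual language and structure are not shown in this thread, so the names `strip_html` and `_TextExtractor` are hypothetical. The sketch keeps only text nodes, does not filter the contents of `<script>` or `<style>` elements, and joins text nodes with a space, which can split a word that is only partly wrapped in a tag.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data: str) -> None:
        self._chunks.append(data)

    def text(self) -> str:
        # Join the text nodes with spaces, then collapse whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def strip_html(description: str) -> str:
    """Return the description with HTML tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(description)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    print(strip_html("<p>Room <b>101</b></p><p>Free pizza</p>"))
```

Under the strategy above, the scraper would store only the value returned by `strip_html` and drop the original markup.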

Here's a checklist to help track the progress of this issue:

ghost commented 8 years ago

We currently have no plans to use the fields that contain HTML, so I'm benching this issue.