duckduckgo / zeroclickinfo-fathead

DuckDuckGo Instant Answers based on keyword data files
https://duckduckhack.com/

legal_docs: remove internal links from the abstract text #99

Closed jdorweiler closed 7 years ago

jdorweiler commented 10 years ago

There's a stray ( showing up at the end of the abstract text. We just need to update the parser to remove the internal links inside the ().

Here's the source data for this entry as an example.

Document description: Bylaws for Delaware Corporation is an open source legal document (<a href="http://www.docracy.com/doc/showalltagged?tag=governance">governance</a>, <a href="http://www.docracy.com/doc/showalltagged?tag=startup">startup</a>, <a href="http://www.docracy.com/doc/showalltagged?tag=delaware">delaware</a>, <a href="http://www.docracy.com/doc/showalltagged?tag=incorporation">incorporation</a>).&nbsp;<a href="http://www.docracy.com/sign/usedoc?signing=false&docId=45">[download]</a>    http://www.docracy.com/45/bylaws-for-delaware-corporation

fern4lvarez commented 10 years ago

This should be fixed at the data source, since the endpoint (http://www.docracy.com/application/duckduckgo) serves the data already parsed.

/cc @rpicard (who is the IA's author)

rpicard commented 10 years ago

I wasn't the author actually. I was an intern at DDG and worked on the Fatheads. The attribution is here: https://github.com/duckduckgo/zeroclickinfo-fathead/blob/master/lib/DDG/Fathead/LegalDocs.pm

rpicard commented 10 years ago

@megamattron

fern4lvarez commented 10 years ago

:+1:

jdorweiler commented 10 years ago

Thanks @rpicard and @fern4lvarez.
If we don't hear back from @megamattron, it would be easy enough to fix in the parser.
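
For reference, a minimal sketch of what the parser-side fix could look like. It assumes a Python parser that receives the description string in the form shown in the example entry above. The names strip_paren_links and PAREN_LINKS are hypothetical and do not come from the repository. It only drops a parenthesized group made up entirely of links and leaves everything else, including the [download] link, untouched.

```python
import re

# Matches a parenthesized, comma-separated run of <a>...</a> tags,
# together with any whitespace just before the opening parenthesis.
# Hypothetical helper, not taken from the existing legal_docs parser.
PAREN_LINKS = re.compile(
    r"\s*\(\s*(?:<a\b[^>]*>[^<]*</a>\s*,?\s*)+\)",
    re.IGNORECASE,
)


def strip_paren_links(description: str) -> str:
    """Remove "(<a ...>tag</a>, <a ...>tag</a>)" groups from a description."""
    return PAREN_LINKS.sub("", description)
```

Parentheses that contain ordinary text rather than links are not matched, so a description such as "Lease (residential)" would pass through unchanged.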