JHU-CLSP / turking-bench

Web-grounded natural language instructions
https://turkingbench.github.io
Apache License 2.0

"Reading comprehension" URLs #33

Open danyaljj opened 1 year ago

danyaljj commented 1 year ago

Consider downloading the content of the URLs in the "reading comprehension" tasks so that we don't lose them in the future.
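
As a rough illustration of what such a download step might look like, here is a minimal Python sketch that uses only the standard library. The names `snapshot_name`, `fetch_bytes`, and `archive_urls` are hypothetical and not part of the repository, and how the task URLs are collected in the first place is left out, since that depends on the task files.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def snapshot_name(url: str) -> str:
    """Build a stable, filesystem-safe file name for a URL (hypothetical scheme)."""
    host = urlparse(url).netloc.replace(":", "_") or "unknown-host"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{host}-{digest}.html"


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw response body for a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def archive_urls(urls, out_dir, fetch=fetch_bytes):
    """Save each URL's content under out_dir and return {url: Path or None}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = {}
    for url in urls:
        target = out / snapshot_name(url)
        if not target.exists():  # skip URLs already archived on an earlier run
            try:
                target.write_bytes(fetch(url))
            except OSError as err:  # URLError and timeouts are OSError subclasses
                print(f"could not archive {url}: {err}")
                saved[url] = None
                continue
        saved[url] = target
    return saved
```

The `fetch` argument is injectable so the function can be tried without network access. A URL that fails to download is recorded as `None` rather than aborting the whole run, so dead links can be listed afterwards.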

klxu03 commented 1 year ago

Would you consider saving each site as a PDF and then embedding a PDF viewer on the task page, with a note saying "If you cannot access this site, read the PDF version instead"?

If you want to keep the text for easier parsing, another option is to copy the main content of each article and paste it, mostly unformatted, into a similar "alternative access" box.
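
For the plain-text option, copying by hand could probably be replaced by a small script. The sketch below is one possible approach, assuming the pages are ordinary HTML and that dropping `script`, `style`, and common navigation elements is a good enough approximation of the "main content". The names `TextExtractor` and `html_to_text` are made up for illustration, and real article pages would likely need per-site tweaks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping scripts, styles, and page chrome."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Return a mostly unformatted text version of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting text could then be pasted or templated into the "alternative access" box next to the original link.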