data-liberation / data-liberation-resources

Liberate all kinds of data from PDFs and other unstructured formats, and make the information machine-readable and visualizable in popular tools.

Journalism news #4

Open wanghaisheng opened 6 years ago

wanghaisheng commented 6 years ago

https://reporterslab.org/tech-and-check-alerts-student-coders-duke/ Fact verification / fact-checking

wanghaisheng commented 6 years ago

https://docparser.com/blog/extract-data-from-pdf/ https://www.kofax.com/ https://xtracta.com/other-documents-capture-api/ https://pdf.zanran.com/extract-table-from-pdf

wanghaisheng commented 6 years ago

https://www.addressextractor.net https://parseur.com/ https://taskpipes.com/ https://www.simpleindex.com/Video/ https://www.helpsystems.com/products/document-and-data-capture

wanghaisheng commented 6 years ago

https://github.com/Crossref/pdfextract https://github.com/HazyResearch/pdftotree https://github.com/kermitt2/grobid https://github.com/allenai/science-parse

wanghaisheng commented 6 years ago

http://deepdive.stanford.edu

wanghaisheng commented 6 years ago

Understanding the interactions of small chemicals or drugs in the body is key for drug discovery. However, the majority of this data resides in the biomedical literature and cannot be easily accessed. The Pharmacogenomics Knowledgebase (PharmGKB, www.pharmgkb.org) is a high quality database that aims to annotate the relationships between drugs, genes, diseases, genetic variation, and pathways in the literature. With the exponential growth of the literature, manual curation requires prioritization of specific drugs or genes in order to stay up to date with current research. In collaboration with Emily Mallory (PhD candidate in the Biomedical Informatics training program) and Prof. Russ Altman at Stanford, we are developing DeepDive applications in the field of pharmacogenomics. Specifically, we use DeepDive to extract relations between genes, diseases, and drugs in order to predict novel pharmacological relationships.

wanghaisheng commented 6 years ago

https://www.pdftron.com/blog/parsing-extraction/table-extraction-and-pdf-to-xml-with-pdfgenie/#pdf-liberation-hackathon

wanghaisheng commented 6 years ago

https://techblog.gumgum.com/articles/text-extraction-using-dragnet-and-diffbot

wanghaisheng commented 5 years ago

Extracting campaign finance data from gnarly PDFs using deep learning http://jonathanstray.com/extracting-campaign-finance-data-from-gnarly-pdfs-using-deep-learning