
Building a QA System with BERT on Wikipedia | NLP for Question Answering #3

Open utterances-bot opened 4 years ago

utterances-bot commented 4 years ago

Building a QA System with BERT on Wikipedia | NLP for Question Answering

A high-level code walk-through of an IR-based QA system with PyTorch and Hugging Face.

https://qa.fastforwardlabs.com/pytorch/hugging%20face/wikipedia/bert/transformers/2020/05/19/Getting_Started_with_QA.html
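
For context, the reader step at the heart of the walk-through looks roughly like this (a minimal sketch using the Hugging Face pipeline API; the checkpoint and example context here are assumptions, not the exact setup from the post):

```python
# Minimal sketch of the extractive reader step (the checkpoint and example
# context are assumptions, not taken from the post).
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

result = qa(
    question="Why is the sky blue?",
    context=("The sky appears blue because air molecules scatter shorter "
             "(blue) wavelengths of sunlight more strongly than longer "
             "(red) ones, an effect known as Rayleigh scattering."),
)
print(result["answer"], result["score"])
```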

pbisaria007 commented 3 years ago

How can I train BERT on my own dataset, predict answers, and then evaluate the model?

imranq commented 3 years ago

This was really helpful for my own project. Is there a way to know where the answer is located in the Wikipedia page? Thanks!

melaniebeck commented 3 years ago

@imranq Great to hear that you found this useful! In our main question answering repo we built WikiQA, a visual interface with Streamlit that includes answer highlighting on the Wiki page. You can check it out here and spin up a version of your own, if you like.
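
The core loop is roughly this shape (a minimal sketch for illustration, not the actual WikiQA code; the model checkpoint and the `wikipedia` package usage are assumptions):

```python
# Hypothetical sketch of a Streamlit QA interface, not WikiQA itself:
# take a question, retrieve a Wikipedia page, run a reader model, and
# render the page text with the predicted answer span highlighted.
import streamlit as st
import wikipedia
from transformers import pipeline

@st.cache_resource
def load_reader():
    # Checkpoint is an assumption, not the one WikiQA ships with
    return pipeline("question-answering",
                    model="distilbert-base-cased-distilled-squad")

question = st.text_input("Ask a question", "Why is the sky blue?")
if question:
    page = wikipedia.page(wikipedia.search(question)[0], auto_suggest=False)
    context = page.content[:5000]  # truncate to keep the demo fast
    result = load_reader()(question=question, context=context)
    start, end = result["start"], result["end"]
    st.markdown(context[:start] + "**" + context[start:end] + "**" + context[end:])
```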

imranq commented 3 years ago

Thank you so much for the quick response! That looks like a nice repo to learn more.

On my initial trial, for some reason, after the initial loading of "Why is the sky blue?", no other text input seemed to generate results. Also, looking at the text-highlighting function in wikiqa.py, I think it would only highlight the first time the answer appears in the text, not necessarily the span where the model actually found it.
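
For illustration, here is a sketch of the difference (hypothetical code, not the actual wikiqa.py function):

```python
# Hypothetical sketch, not the actual wikiqa.py code: string search always
# highlights the first occurrence of the answer text, while the character
# offsets returned by the model pin down the exact span it selected.
def highlight_first_occurrence(context: str, answer: str) -> str:
    i = context.find(answer)  # first match only, even if the model chose a later one
    return context[:i] + "<mark>" + answer + "</mark>" + context[i + len(answer):]

def highlight_model_span(context: str, start: int, end: int) -> str:
    # start/end come from the model, e.g. result["start"], result["end"]
    # in a transformers question-answering pipeline result
    return context[:start] + "<mark>" + context[start:end] + "</mark>" + context[end:]
```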

AzitaMalek commented 3 years ago

Hi, thanks for the step-by-step explanation. I am trying to run the Colab version. However, when I get to "!python run_squad.py \" I get the error "File "run_squad.py", line 1 404: Not Found". I have already run the cell containing "!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/question-answering/run_squad.py", so I am not sure what I should do to overcome this error. Would you please help with that?

melaniebeck commented 3 years ago

Hi @AzitaMalek -- good catch here! This notebook uses an older version of Hugging Face Transformers, and the correct version was not pinned in some cells. These cells have been updated, including the correct link to a version-compatible run_squad.py (https://github.com/huggingface/transformers/blob/b90745c5901809faef3136ed09a689e7d733526c/examples/run_squad.py).
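
For anyone who wants to fetch the pinned script directly, a sketch of a Colab cell (the raw URL is derived from the blob link above):

```python
# Sketch of a Colab cell that fetches run_squad.py from the pinned commit.
# You also need a Transformers release compatible with that commit; the
# exact version pin is in the updated notebook cells, not shown here.
import urllib.request

RUN_SQUAD_URL = (
    "https://raw.githubusercontent.com/huggingface/transformers/"
    "b90745c5901809faef3136ed09a689e7d733526c/examples/run_squad.py"
)
urllib.request.urlretrieve(RUN_SQUAD_URL, "run_squad.py")
```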

littlethumb123 commented 3 years ago

Thanks for sharing the experience! Very helpful and informative!

XingYiBao commented 2 years ago

Very good article! May I know whether the code can run under Python 3.7.6 + PyTorch 1.9 + Transformers 4.10 + CUDA 11.1? Many thanks!

youngsunjang commented 2 years ago

Hi @AzitaMalek and @melaniebeck, now it seems that link is also broken. I think we could use the link below instead: https://raw.githubusercontent.com/huggingface/transformers/master/examples/legacy/question-answering/run_squad.py What do you think?

XuJianzhi commented 2 years ago

Nice!

vifirsanova commented 1 year ago

Thank you! This tutorial (along with all the other publications in this blog btw) is amazing!

Unfortunately, training BERT on SQuAD with free Colab is hard due to RAM limits (which is mentioned here), but we can shorten SQuAD and use DistilBERT instead; this helps if we want to test run_squad.py ourselves.
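
For example, a sketch of one way to shrink the training set (the file names and the 50-article cutoff are illustrative assumptions):

```python
# Sketch: keep only the first few articles of SQuAD 2.0 so that
# run_squad.py fits within free Colab limits. File names and the
# 50-article cutoff are assumptions, not recommended values.
import json

with open("train-v2.0.json") as f:
    squad = json.load(f)

squad["data"] = squad["data"][:50]  # keep a small subset of articles

with open("train-small-v2.0.json", "w") as f:
    json.dump(squad, f)

# Then point run_squad.py at the smaller file and a lighter model, e.g.
#   --model_type distilbert --model_name_or_path distilbert-base-uncased \
#   --train_file train-small-v2.0.json
```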