allenai / scifact

Data and models for the SciFact verification task.

The rationale_roberta_large_scifact does not work #2

Closed EdwardZH closed 4 years ago

EdwardZH commented 4 years ago

I followed the pipeline and got errors when using rationale_roberta_large_scifact. I think the checkpoint is not in PyTorch format and is missing the vocab file.

PeterL1n commented 4 years ago

Are you using our provided script or are you trying to load the model yourself?

EdwardZH commented 4 years ago

I am using the provided scripts. Only the rationale selection step with RoBERTa Large fails; when I replace it with SciBERT, it works.

PeterL1n commented 4 years ago

I have double-checked that the script runs fine for RoBERTa Large on my side. What GPU are you using? Maybe the GPU runs out of VRAM for the RoBERTa Large model? If the problem persists, I will need the exact error message it produces.

EdwardZH commented 4 years ago

The error is "Unable to load weights from pytorch checkpoint file." Full output:

OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 INTERNAL ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:348, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /pytorch/c10/util/intrusive_ptr.h:348)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f846cc53193 in /home/liuzhenghao/.conda/envs/fever/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x18cd59f (0x7f83a00c259f in /home/liuzhenghao/.conda/envs/fever/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: THStorage_free + 0x17 (0x7f83a088aba7 in /home/liuzhenghao/.conda/envs/fever/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: + 0x55d4dd (0x7f8473cf94dd in /home/liuzhenghao/.conda/envs/fever/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #21: __libc_start_main + 0xf0 (0x7f8477fdb830 in /lib/x86_64-linux-gnu/libc.so.6)

PeterL1n commented 4 years ago

I have no idea why you would get that error.

I created this temporary Colab notebook to show how it is working on my side. https://colab.research.google.com/drive/1QXS901zB65pGO5cF4QvDs8O1fH4x10Kk?usp=sharing

EdwardZH commented 4 years ago

I downloaded the checkpoint again and it works now. Thank you for your help.
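
Since a fresh download resolved the problem, the first copy of the checkpoint was most likely incomplete or corrupted. For anyone hitting the same "Unable to load weights" error, a quick sanity check before re-downloading might look like the sketch below. It uses only the standard library. The file names in EXPECTED_FILES are an assumption based on how Hugging Face transformers typically saves a RoBERTa model and tokenizer; the actual SciFact archive layout may differ, and both helper functions are hypothetical, not part of the SciFact scripts.

```python
import hashlib
from pathlib import Path

# Assumed file names for a RoBERTa checkpoint saved by Hugging Face transformers.
# The real SciFact archive may use a different layout; adjust as needed.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "vocab.json", "merges.txt")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_checkpoint_dir(directory, expected_files=EXPECTED_FILES):
    """Return a list of problems found; an empty list means nothing obviously wrong."""
    directory = Path(directory)
    problems = []
    for name in expected_files:
        path = directory / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {name}")
    return problems
```

Calling check_checkpoint_dir on the extracted model directory lists missing or empty files. This does not detect a truncated weight file that is still non-empty; for that, comparing the sha256_of output of two separate downloads of the same file is one way to tell whether they differ.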