Closed griff4692 closed 1 year ago
Hi,
I'm sorry for the delay in responding. I've added a section to the README covering this; see here: https://github.com/dwadden/multivers#making-predictions-for-new-datasets.
Let me know if you've still got questions!
Dave
Re your question about feeding generated abstracts line-by-line, this is an interesting question. It seems like there are two obvious ways to do this:
I can imagine a couple ways to maybe make the "split-then-verify" approach a bit better.
Let me know if you end up trying any of this, I'm curious to see if it works!
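For the "split-then-verify" idea discussed above, a minimal sketch might look like the following. This is an assumption about how one could do the splitting, not code from the repo; a real pipeline would probably want a scientific-text-aware sentence splitter (e.g. from scispacy) rather than a regex.

```python
import re

def split_into_sentences(abstract: str) -> list[str]:
    """Naively split a generated abstract into sentences so that each
    sentence can be verified independently, SciFact-claim style.
    The regex splits on sentence-ending punctuation followed by
    whitespace -- good enough for a sketch, not for production."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]

abstract = ("BERT improves results on GLUE. "
            "It is pretrained on BooksCorpus and Wikipedia.")
claims = split_into_sentences(abstract)
# Each element of `claims` can now be checked against the article body.
```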
@griff4692 are you good for me to close this?
Yes - sorry for not replying to your comment.
(If interested, I can open a pull request at some point with a simple Python class I wrote that allows single-sentence scoring)
No worries, thanks for closing. And sure, I'd gladly accept a PR to do single sentence scoring!
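The single-sentence scoring class mentioned above might be sketched as follows. Everything here is hypothetical: `verify_fn` stands in for whatever actually runs MultiVerS on a (claim, evidence) pair, and is not part of the repo's API.

```python
from typing import Callable

class SingleSentenceScorer:
    """Hypothetical sketch: score each sentence of a generated abstract
    against a source document. `verify_fn` is an assumed callable that
    maps (claim, document) to a label string."""

    def __init__(self, verify_fn: Callable[[str, str], str]):
        self.verify_fn = verify_fn

    def score(self, sentences: list[str], document: str) -> list[dict]:
        # One verification call per sentence; collect labels.
        return [
            {"sentence": s, "label": self.verify_fn(s, document)}
            for s in sentences
        ]

# Usage with a dummy verifier that just checks lexical overlap.
def dummy_verify(claim: str, doc: str) -> str:
    return "SUPPORT" if claim.split()[0].lower() in doc.lower() else "NEI"

scorer = SingleSentenceScorer(dummy_verify)
results = scorer.score(["Aspirin reduces fever."], "aspirin is an antipyretic")
```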
Hey David -
Thanks for this repo (the code is clean and easy to read - I also like PLightning!)
If I want to quickly predict a label for text A given text B, is there a script to get raw text data into the right format? It seems the predict script relies on specific input files for each dataset, but I just want to use it on new text pairs. (The use case is factuality verification of generated scientific abstracts, given the body of the article.) I might have to feed each abstract sentence-by-sentence to make it more in line with the SciFact hypotheses.
Make predictions using the convenience wrapper script [script/predict.sh](https://github.com/dwadden/multivers/blob/main/script/predict.sh). This script accepts a dataset name as an argument, and makes predictions **using the correct input files** and model checkpoints for that dataset.
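To get new text pairs into a shape the predict script can consume, one option is to write them out in a SciFact-style layout: a `claims.jsonl` with `{"id", "claim", "doc_ids"}` records and a `corpus.jsonl` with `{"doc_id", "title", "abstract"}` records. The field names below follow the public SciFact release; check the README section linked above before relying on them.

```python
import json

def write_scifact_style_inputs(pairs, claims_path, corpus_path):
    """Write (claim, abstract_sentences) pairs as SciFact-style jsonl.
    Each pair gets its own claim record pointing at its own one-document
    corpus entry. `abstract` is a list of sentences, per SciFact."""
    with open(claims_path, "w") as cf, open(corpus_path, "w") as pf:
        for i, (claim, abstract_sents) in enumerate(pairs):
            cf.write(json.dumps(
                {"id": i, "claim": claim, "doc_ids": [i]}) + "\n")
            pf.write(json.dumps(
                {"doc_id": i, "title": "", "abstract": abstract_sents}) + "\n")
```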