Closed by ca16 2 years ago
Overall, it looks good to me. Just curious, and maybe I missed this part: has any testing been done to make sure that a missing "paper_id" key will not crash anything downstream? Maybe it is covered by the pytest tests?
@Mosqidiot I put some notes about some testing I did around this for the old reader/scholarphi reader here: https://github.com/allenai/scholar/issues/31778#issuecomment-1104382940. Things seemed okay to me (though @kyleclo thinks we might not even necessarily care about that now that we have the new reader so maybe that doesn't matter at all).
For the new reader, making this change alone will not affect what the frontend gets from s2airs. I've split up the remaining work into roughly two parts:
FYI @kyleclo @andrewhead
Related to https://github.com/allenai/scholar/issues/31778, builds on https://github.com/allenai/scholarphi/pull/351.
The idea is to include output for bib entries that we failed to match to S2 papers in the output of the citations pipeline, so that we can still identify the corresponding mention bounding boxes (and possibly show the bib entry text in them instead).
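To make the intended shape concrete, here is a minimal sketch of the idea, not the actual pipeline code. The names `BibEntry`, `build_citation_records`, `label_for`, `bib_key`, and `bib_text` are hypothetical and chosen only for illustration. The one assumption taken from this thread is that unmatched entries come through without a `"paper_id"` key.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BibEntry:
    # Hypothetical stand-in for a bib entry extracted from the paper.
    key: str   # bibitem key
    text: str  # raw bib entry text


def build_citation_records(
    entries: List[BibEntry], matches: Dict[str, str]
) -> List[dict]:
    """Emit one record per bib entry, matched to an S2 paper or not.

    `matches` maps bib keys to S2 paper IDs. Entries without a match are
    still emitted, just without a "paper_id" key (assumed output shape).
    """
    records = []
    for entry in entries:
        record = {"bib_key": entry.key, "bib_text": entry.text}
        paper_id = matches.get(entry.key)
        if paper_id is not None:
            record["paper_id"] = paper_id
        records.append(record)
    return records


def label_for(record: dict) -> str:
    """Downstream consumer that tolerates a missing "paper_id" key."""
    paper_id = record.get("paper_id")
    if paper_id is not None:
        return f"S2:{paper_id}"
    # Fall back to the bib entry text for unmatched entries.
    return record["bib_text"]
```

With this shape, a consumer that reads `record.get("paper_id")` rather than `record["paper_id"]` keeps working for unmatched entries and can fall back to the bib entry text.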
Testing
Running examples
I tested this out by running a couple of papers through (same as the ones for #351).
A paper missing matches, without these changes
Output file: 1611.07004v3-current-2.txt
Logs:
Mentions in output:
A paper missing matches, with these changes
Output file: 1611.07004v3-with-missing-matches-2.txt
Logs:
Mentions in output:
A paper not missing matches, without these changes
Output file: 2009.12303v4-current-2.txt
Logs:
Mentions in output:
A paper not missing matches, with these changes
Output file: 2009.12303v4-with-missing-matches-2.txt
Logs:
Mentions in output:
Automated tests
I also ran the tests described here: