Open holub008 opened 3 months ago
This is trivially fixed by making the space following the - after each tag optional, as follows:
import rispy

class Issue62Override(rispy.RisParser):
    # Make the space after the hyphen optional.
    PATTERN = r"^[A-Z][A-Z0-9]  - ?|^ER  -\s*$"

out = rispy.loads(test_ris_str, implementation=Issue62Override)
out[0]['number']  # '9'
So the question is really whether there's interest in this being universal in RisParser. For what it's worth, we maintain an internal test suite of ~30 files from various providers, and this change broke none of our assertions while correcting the issue. I haven't run it against rispy's test suite though.
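To illustrate what the optional space changes, here is a minimal sketch using only the standard `re` module. It assumes the conventional two-space RIS delimiter (`TY  - `); the names `STRICT` and `RELAXED` and the sample lines are made up for illustration and are not part of rispy or of the original report.

```python
import re

# Assumption: tag lines use the conventional RIS delimiter "XX  - ".
# STRICT requires a space after the hyphen; RELAXED makes it optional.
STRICT = re.compile(r"^[A-Z][A-Z0-9]  - |^ER  -\s*$")
RELAXED = re.compile(r"^[A-Z][A-Z0-9]  - ?|^ER  -\s*$")

# Hypothetical lines: a normal tag, a tag with nothing after the hyphen,
# and the end-of-record marker.
lines = ["IS  - 9", "AB  -", "ER  -"]

strict_hits = [bool(STRICT.match(line)) for line in lines]
relaxed_hits = [bool(RELAXED.match(line)) for line in lines]

print(strict_hits)   # [True, False, True]
print(relaxed_hits)  # [True, True, True]
```

Under the strict pattern the second line is not recognized as a tag at all, so a parser that treats non-tag lines as continuations would fold it into the previous field.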
I like this idea, would accept a PR if you have time @holub008!
Back with another spec corner case. The truncated example below comes from our friends at Embase:
As you can see, the empty SP - tag is detected as a wrap of the IS tag, which is not what the RIS writer intended.

Any thoughts on recognizing (and most probably discarding) empty tags like SP here?

It's difficult because detecting and keeping line wraps is extremely useful (see, in this same record, the abstract in N2 being wrapped), and it's possible, though relatively unlikely, that a legitimate wrapped line could conflict with the RIS tag format.
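One possible direction, sketched below as a pre-processing step outside rispy: drop lines that consist of nothing but a tag and a hyphen before the text reaches the parser. Everything here is an assumption for illustration: the `EMPTY_TAG` pattern, the `drop_empty_tags` helper, and the sample record are hypothetical, and the exact shape of the empty tag in the Embase export may differ.

```python
import re

# Hypothetical pattern: two tag characters, whitespace, a hyphen, nothing else.
EMPTY_TAG = re.compile(r"^[A-Z][A-Z0-9]\s+-\s*$")

def drop_empty_tags(text: str) -> str:
    """Remove content-free tag lines, keeping the ER end-of-record marker."""
    kept = []
    for line in text.splitlines():
        if EMPTY_TAG.match(line) and not line.startswith("ER"):
            continue  # empty tag such as "SP  -": discard it
        kept.append(line)
    return "\n".join(kept)

# Made-up record with an empty SP tag and a wrapped N2 abstract.
sample = "\n".join([
    "TY  - JOUR",
    "IS  - 9",
    "SP  -",
    "N2  - First line of abstract",
    "continues on a wrapped line",
    "ER  -",
])

cleaned = drop_empty_tags(sample)
print(cleaned)
```

The wrapped abstract line survives because it does not look like a tag, but the caveat above still applies: a legitimate wrapped line that happened to match the pattern would be discarded too.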