Open emanuil-tolev opened 10 years ago
Here's an example of one potentially problematic case:
<a href="http://pubs.acs.org/page/policy/authorchoice/index.html" title="Learn more about ACS AuthorChoice">
And here's an example of something going wrong: http://oag.cottagelabs.com/lookup/10.1210/en.2012-1913
10.1210/en.2012-1913 ++ Free to Read (free-to-read)
License decided by scraping the resource at http://press.endocrine.org/doi/abs/10.1210/en.2012-1913 and looking for the following license statement: "".
BY: null. NC: null. SA: null. ND: null. OKD compliant? undefined. OSI compliant? undefined
Learn more about this license at undefined
We retrieved this information from http://press.endocrine.org/doi/abs/10.1210/en.2012-1913.
Last checked on 2014-04-13T21:01:53Z.
License detected by generic_string_matcher 0.1 plugin
In the short term I've just deleted all of these from the flat file of licenses along with all the free-to-read statements. That seems to clear up most of the issues.
This should, in theory, be enough for statements that contain only HTML. It won't help much when a statement contains just a couple of letters or a single word, though. So two additional checks could be added: a minimum length check (10 chars?) and a check that rejects statements of only one or two words. Both would apply only to the HTML-stripped statement, which is only used when an exact match fails.
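A rough sketch of what those two checks might look like, in Python. The function names, the 10-character threshold, and the 3-word minimum are all assumptions for illustration; none of this is taken from the existing generic_string_matcher plugin, and the stripping here uses the standard library's `html.parser` rather than whatever the plugin does today.

```python
from html.parser import HTMLParser

# Hypothetical thresholds; the issue only suggests "10 chars?" and
# "1 or 2 words is unacceptable", so treat both as tunable.
MIN_CHARS = 10
MIN_WORDS = 3


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(statement):
    """Return the statement with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(statement)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())


def is_usable_stripped_statement(statement, min_chars=MIN_CHARS, min_words=MIN_WORDS):
    """Decide whether an HTML-stripped statement is long enough to match on.

    Intended to run only on the fallback path, i.e. after an exact
    match on the original statement has already failed.
    """
    text = strip_html(statement)
    if len(text) < min_chars:
        return False  # empty or only a few letters
    if len(text.split()) < min_words:
        return False  # only one or two words
    return True
```

With these thresholds, the ACS anchor tag quoted above strips down to an empty string and is rejected, as is something like `<b>Open Access</b>` (two words), while a full sentence-length license statement passes. A three-word statement of 10 or more characters would still pass, so the word minimum may need raising if short phrases keep causing false matches.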