CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Add more complete testing to the spanner extract test suite #104

Open BryanCutler opened 4 years ago

BryanCutler commented 4 years ago

Currently only a simple test case exists. As per comments at https://github.com/CODAIT/text-extensions-for-pandas/pull/83#discussion_r474330942, more tests need to be added to exercise the function completely.

Fred's comments on text_extract_dict:

I'd recommend that you remove the last three lines of the current file and replace "file_text" below with a string that exercises the major cases of dictionary extraction:

You'll also want to exercise case-insensitivity of the dictionary matching.

I think the location of this file is an anachronism. Would you mind moving it to test_data/spanner?
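As a rough illustration of the dictionary cases mentioned above, here is a minimal stdlib-only sketch of a target string and the matches one might expect from it. It does not call the library's own dictionary extraction; `tokenize`, `dict_matches`, `target_text`, and `entries` are hypothetical names, and the whitespace/word tokenizer is an assumption made only so the example runs on its own.

```python
import re


def tokenize(text):
    """Return (begin, end) character offsets of word tokens (hypothetical tokenizer)."""
    return [m.span() for m in re.finditer(r"\w+", text)]


def dict_matches(text, entries):
    """Return character spans of case-insensitive dictionary matches.

    Entries may span several tokens; matching is done on whole tokens only.
    """
    tokens = tokenize(text)
    # Normalize each entry to a tuple of lowercase tokens.
    normalized = {tuple(e.lower().split()) for e in entries}
    max_len = max(len(e) for e in normalized)
    spans = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            window = tuple(text[b:e].lower() for b, e in tokens[i:i + n])
            if window in normalized:
                spans.append((tokens[i][0], tokens[i + n - 1][1]))
    return spans


# Hypothetical target string: matches at the beginning, middle, and end,
# mixed case, a multi-token entry, and a near miss ("alphabet").
target_text = "Alpha beta ALPHA alphabet New York alpha"
entries = ["alpha", "new york"]

matched = [target_text[b:e] for b, e in dict_matches(target_text, entries)]
print(matched)
```

A real test would pass a string like this to the function under test and compare the returned spans against the hand-written expectation, including the case where "alphabet" must not match "alpha".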

Fred's comments on test_extract_regex_tok:

As with the dictionary test, it would be useful to have a target string that contains the main types of regex match -- matches at the beginning, middle, or end of the string; partial matches; substrings that would be matches except they don't start or end on a token boundary.
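The match types listed above could be covered with a target string along these lines. This is only a sketch using the standard `re` module, not the library's token-based regex extraction: `token_spans` and `regex_matches_on_tokens` are hypothetical helpers, and the simplification here is to run the regex over the raw text and keep only matches whose begin and end both fall on token boundaries.

```python
import re


def token_spans(text):
    """Return (begin, end) offsets of tokens (hypothetical word/punctuation tokenizer)."""
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def regex_matches_on_tokens(text, pattern):
    """Return spans of regex matches that start and end on token boundaries."""
    tokens = token_spans(text)
    begins = {b for b, _ in tokens}
    ends = {e for _, e in tokens}
    return [
        m.span()
        for m in re.finditer(pattern, text)
        if m.start() in begins and m.end() in ends
    ]


# Hypothetical target string:
#   "foo" at the beginning, middle, and end should match;
#   "foobar" starts on a token boundary but does not end on one;
#   "barfoo" ends on a token boundary but does not start on one.
target_text = "foo bar foobar foo barfoo foo"

spans = regex_matches_on_tokens(target_text, r"foo")
print(spans)
```

The near misses ("foobar", "barfoo") are the interesting part: a test that only contains clean matches would pass even if the token-boundary check were missing.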