DeanLight / spannerlib

https://deanlight.github.io/spannerlib/
Apache License 2.0
2 stars 0 forks source link

Add tests #116

Closed asmaamagdoub closed 9 months ago

asmaamagdoub commented 9 months ago

In this PR, we have added unit tests and e2e tests under nbs/tests/ . There are two test files, test_corenlp and test_regex, that are related to nlp and rust_spanner_regex, so we have deactivated them. Also we have made some changes:

  1. Corrected assumptions about span_start and span_end: Previously, we assumed 0 <= span_start < span_end, but this assumption was incorrect.

  2. Changed some file paths: As the tests now run under nbs/tests/, we had to adjust paths to certain files. These changes include the paths to the stanford-corenlp zip file and grammar.lark and the path to enum_spanner_regex folder in the following files: session, utils, nlp, and rust_spanner_regex.

review-notebook-app[bot] commented 9 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

DeanLight commented 9 months ago

tests should not be in their own folder, this violates the concept of locality. If possible, without hurting readability, tests should be written as close to the code they test.

please look at how nbdev2 explains where to write tests in their site (and how tests can be used as examples).

asmaamagdoub commented 9 months ago

I disagree with the suggestion for these reasons:

In our case, we have about 150 cells of tests. If we want to place them between the main code cells, it would create significant gaps between different parts and classes of the project code, making it harder to read.

We also have so many e2e tests that don't fit well under a specific code!

Also there are tests designed to evaluate specific parts but require the use of the session to be run. But in the file being tested, there is no access to the session.

Plus nbdev themself have a separate folder for tests, which you can find here: https://github.com/fastai/nbdev/tree/master/tests.

Adding to that, this will make solving the problem of the huge cells much more difficult, because the solution is to separate class methods into different cells. For instance, when applying this to abstract classes, we can't insert the tests in between the cells because the tests require an instance of the class, which can't be created before all the functions are patched. As result the code would look like this:

Class A

Patched func

Patched func

Patched func

Patched func

Patched func

Test1

Test2

Test3

Test4

Which also doesn't support the locality principle.

DeanLight commented 9 months ago

You raise some good points, alot of which dont necessarily conflict with my suggestion. Ill reply to them bellow:

In our case, we have about 150 cells of tests. If we want to place them between the main code cells, it would create significant gaps between different parts and classes of the project code, making it harder to read.

A good way to overcome this, is to put tests under their own headers in jupyter notebooks you can collapse headers and there fore enjoy both views, with functions next to each other when test headers are folded and with tests next to what they are testing when unfolded.

We also have so many e2e tests that don't fit well under a specific code!

True, these can go into their own notebooks that only contain tests that require all the session code etc... Its fine to have e2e or integration "test notebooks" but unit tests etc should still be local.

Also there are tests designed to evaluate specific parts but require the use of the session to be run. But in the file being tested, there is no access to the session.

This means that they are not unittest, but rather integration tests/e2e tests. In some cases this is necessary and they should remain e2e tests, in some cases it might be because this tests were written in a quick and dirty way but we could actually test the specific parts in isolation, in which case it would be better to have the tests be isolated. I live it to your judgement which is which.

Plus nbdev themself have a separate folder for tests, which you can find here: https://github.com/fastai/nbdev/tree/master/tests.

True, however the nbdev codebase itself is not an idiomatic example of how to use nbdev since they have alot of bootstrapping considerations that affect code placement. Look at fastai as a more idiomatic example of how nbdev was intended to be used.

Adding to that, this will make solving the problem of the huge cells much more difficult, because the solution is to separate class methods into different cells. For instance, when applying this to abstract classes, we can't insert the tests in between the cells because the tests require an instance of the class, which can't be created before all the functions are patched. As result the code would look like this:


Class A

Patched func

Patched func

Patched func

Patched func

Patched func

Test1

Test2

Test3

Test4

Which also doesn't support the locality principle.


This example is actually fine, it is not the most local possible (due to the constraints you raised) but its still much more local than different files.

Also, if there are some cases in which classes are huge unnecessarily, and some logic can be factored out and tested seperately and then used in the class via composition that is usually preferable. If you want we can look for some examples in huge classes in our codebase together.

Here is an example:

_helper_func

_helper_func_test

_helper_func2

_helper_func2_test

class shorter_func(): ... _helper_func ... shorter_func2(): ...

class_test1 (assumes that all helper funcs are correct and doesnt test them)

class_test2



I do agree that there are a-lot of subtleties in these kinds of decisions and its good that you are raising these concerns. Let try to converge on these points in light of specific areas of the codebase in our next meeting.