Create simple unit tests for API

ghost commented 12 years ago

We need a set of unit tests to check whether all API calls work. It's impossible to keep testing all of them by hand each time there is a change, and it can be rather annoying if an API call fails.

ghost commented 12 years ago

I'm very happy with twill (http://twill.idyll.org/), though I haven't used its form-testing features yet.

twill is a high-level framework for testing websites. Pros: doesn't need a browser. Cons: can't test JavaScript.


    $ twill-sh
    >> go http://tabasco.upf.edu
    ==> at http://tabasco.upf.edu
    current page: http://tabasco.upf.edu
    >> code 200
    current page: http://tabasco.upf.edu
    >> code 401
    ERROR: code is 200 != 401
    current page: http://tabasco.upf.edu
    >> 
    current page: http://tabasco.upf.edu
    >> find ''
    current page: http://tabasco.upf.edu
    >> notfind 'Page not found'
    current page: http://tabasco.upf.edu
    >> showforms
    Form #1
    ## ## __Name__________________ __Type___ __ID________ __Value__________________
    1     q                        text      (None)        
    2  1  None                     submit    search_s ...

Looks like it can be run against a Django site via WSGI. That means you don't need to set up a web server to run your tests. (See http://blogs.translucentcode.org/mick/2006/02/26/basic-twill-intercept-testing-django/).

ghost commented 12 years ago

For the API I don't need to test JavaScript, right? twill sounds good to me.

ghost commented 12 years ago

Django's unit testing framework also has a Client class, now that I think about it, and it's already baked into Django, so you don't need another dependency:

http://docs.djangoproject.com/en/dev/topics/testing/?from=olddocs#module-django.test.client

ghost commented 12 years ago

I just started working on the unit testing ticket. I tried the Django testing framework and it seems to work fine for what I need. It returns the API data as one single string, but I suppose I'll be able to use a JSON decoder to manage the data easily. I'll write a script with tests for all the functionality of the API. Where should I put it? I suppose that inside the 'utils' folder I could create an 'api' folder and place a file like 'test_api.py' there, or something like that... is that ok?

ghost commented 12 years ago

http://docs.djangoproject.com/en/dev/topics/testing/#writing-unit-tests

For a given Django application, the test runner looks for unit tests in two places:

* The models.py file. The test runner looks for any subclass of unittest.TestCase in this module.
* A file called tests.py in the application directory -- i.e., the directory that holds models.py. Again, the test runner looks for any subclass of unittest.TestCase in this module.

ghost commented 12 years ago

Please use the second option, i.e. a tests.py file. This keeps the models file cleaner...

btw, django includes a JSON parser: from django.utils import simplejson
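A minimal sketch of decoding such a response string. It uses the standard library's `json` module rather than `django.utils.simplejson` so that it runs without Django; both expose a `loads` call. The payload and its field names (`num_results`, `sounds`) are made up for illustration and are not the actual API output.

```python
import json

# Hypothetical response body, as a test client would hand it back in one string.
raw_body = '{"num_results": 2, "sounds": [{"id": 1, "tags": ["rain"]}, {"id": 2, "tags": []}]}'

# json.loads turns the string into plain dicts and lists.
data = json.loads(raw_body)

# From here on the response can be inspected like any Python structure.
sound_ids = [sound["id"] for sound in data["sounds"]]
print(data["num_results"], sound_ids)
```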

ghost commented 12 years ago

ok, I'll create the tests.py file in the root directory (where manage.py is). From what I've seen so far, the Django testing framework creates a test db just for running the tests (it is destroyed afterwards). The problem I have is that this db is empty, so when I perform requests there are no API keys registered and no sounds/users/packs to retrieve... I'll investigate a bit more. I suppose there should be a way to query the normal db, although it would be good to have a special db for testing, so the content never changes and we can easily compare results... maybe a subset of our main db could be extracted to create a test db or something... anyway, I'll investigate and tell you about my findings!

ghost commented 12 years ago

sorry, the tests.py file goes in the app directory, not in the root ;) working on it...

ghost commented 12 years ago

ok, I've been reading forums and investigating a bit and it looks more complicated than it seemed.

The Django testing framework doesn't allow testing against the production db. It always creates an empty db (which is destroyed afterwards) when running tests.
The way Django offers to load some data into this test database is by using fixtures (JSON serializations). Fixtures can be extracted from the production db with the command manage.py dumpdata. The problem is that there is no way to extract only a subset of the db, and trying to create a fixture of everything is crazy (it runs out of memory, and it would take ages to load it in each test...).
So, to have a test db we have to build it manually. Again, the problem is that we cannot use the freesound interface because not everything is implemented (so we cannot upload or tag sounds...). The only option is to edit the db directly with the Django /admin/ page... this is quite a lot of work, plus we will then have trouble with Solr because it doesn't have the information for the "invented" sounds, right?

Possible solutions:

1. Fill a db with the Django admin and export it as a fixture (I still don't know what would happen with Solr). A lot of work...
2. Do not use the Django testing environment and create an "external" script that performs a request, gets a response and analyzes the content. In that case the content analysis should be independent of the content of the db (checking the status code, returned fields and things like that, but not actual data...). This script could use twill or even the testing tools offered by Django (http://docs.djangoproject.com/en/dev/topics/testing/#testing-tools).
3. Come up with a way to intelligently retrieve a subset of the production db (without missing references and other unthinkable weird things...)... I don't know how to do that...
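The content-independent analysis from the second option could look roughly like the sketch below. It only checks the status code and the presence of fields, never their values. The function name `check_api_response` and the field names are hypothetical, and the response is passed in as plain values so the sketch does not depend on twill or Django.

```python
import json


def check_api_response(status_code, body, required_fields):
    """Return a list of problems found; an empty list means the check passed."""
    if status_code != 200:
        return ["unexpected status code: %d" % status_code]
    try:
        data = json.loads(body)
    except ValueError:
        return ["body is not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Only the shape is checked, so the result does not depend on db content.
    return ["missing field: %s" % name for name in required_fields if name not in data]


# Hypothetical response for a single sound; "user" is deliberately absent.
problems = check_api_response(200, '{"id": 1, "tags": []}', ["id", "tags", "user"])
print(problems)
```

Because only the shape of the response is inspected, the same checks pass against an empty test db, a fixture, or the production data.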

The second option is not that bad right now because the API is read-only, but once we implement uploading and want to test it, we will need a separate testing db so that we don't write to the production db...

ideas? should I send this to the mailing list?

ghost commented 12 years ago

I think a stable test-DB is not such a bad idea to be honest... It would also facilitate testing and installation on different systems while still having some data to test with. The problem is of course finding -as you say- a good test-set.

Perhaps: almost all "models" / "database tables" have a "created" date. The easiest would be to truncate the database to "1 year of freesound" and use that? Then again this would probably result in an uneven distribution of used features. There might not be a lot of tags / ...

Actually, I think unit testing for anything is problematic in case of not having a fixed test DB. I'll ask Gerard to have a look at this too, perhaps he has some more ideas!

ghost commented 12 years ago

( By the way, I think it would be good to have both tests that depend and do not depend on content - like you say too... )

ghost commented 12 years ago

Well, I'm absolutely no expert, but for me the point of unit tests is that the test is isolated from the rest of the functionality, and you can perfectly predict what the outcome will be. I would say that using production data is actually another kind of test, so using fixtures with artificial data tailored for each test wouldn't necessarily be bad ...

ghost commented 12 years ago

When I say testing db I mean a fixture with some data to load at the moment of the test. What I can do for the moment is to artificially create a fixture with something like 10 users, 40 sounds (so we test paginated results), 10 packs, some tags... this could be rather easy (right now I have one fixture with 1 user and 1 sound ;) and I could test the basic functionality of the API... however, if there was a way to gather a bigger fixture with real data, that would be much better. Anyway, the problem with Solr would still be the same, because there would be no sounds indexed and searches couldn't be performed. What could we do to solve that? Should we have a specific Solr for testing purposes (I mean another installation with indexed data from the testing fixture)?
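An artificial fixture of that size could be generated instead of typed by hand. The sketch below builds objects in the layout Django fixtures use (`model`, `pk`, `fields`) with 10 users and 40 sounds. The model label `sounds.sound` and all field names are assumptions; a real fixture would need every required field of the actual models.

```python
import json


def build_fixture(num_users=10, num_sounds=40):
    """Build a list of fixture objects: users first, then sounds owned by them."""
    objects = []
    for pk in range(1, num_users + 1):
        objects.append({
            "model": "auth.user",
            "pk": pk,
            "fields": {"username": "testuser%d" % pk, "email": "user%d@example.com" % pk},
        })
    for pk in range(1, num_sounds + 1):
        objects.append({
            "model": "sounds.sound",  # assumed app/model label
            "pk": pk,
            "fields": {
                "user": (pk - 1) % num_users + 1,  # spread sounds evenly over users
                "original_filename": "sound%d.wav" % pk,  # hypothetical field
            },
        })
    return objects


fixture = build_fixture()
# The serialized text is what would be saved as a .json fixture file.
fixture_json = json.dumps(fixture, indent=2)
print(len(fixture), "objects,", len(fixture_json), "characters")
```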

The workflow that comes to mind for creating this framework is:

1. Create a dump of the production freesound database with information from the first year (for example). I don't really know how to do that, but Bram is right that most of the models have a "created" field, so there should be a way to filter...
2. Run a local freesound installation with this smaller db and another Solr installation (easy)
3. Add all sounds to Solr (easy)
4. Dump the db contents into a JSON fixture with the command "python manage.py dumpdata" (easy)

Then, to perform testing we would only need to create a script for running "python manage.py test" with the testing Solr installation and loading the created fixture (the initial 1 year dump would only be needed to create the fixture and index files in the testing Solr)

wow, not sure this would work, but it is the only way I can think of to have comprehensive testing...

ghost commented 12 years ago

As I understand a unit test, you could generate the test data specifically for each API call you want to test, creating extreme values, empty fields ... any data you suspect could make it fail, in a systematic way. The same code could add the data to Solr, then run the test and then delete everything. It may seem like a fair amount of work, but actually once you have the framework it's probably much faster and easier than dealing with production data. If we find a problem with valid production data that was missed by the test, then it's a bug in the unit test...

ghost commented 12 years ago

Frederic: is setUp called after or before the syncing of the fixtures? if it is called after, you can easily do addAllSoundsToSolr in there and remove them all after the test has run ( ? )

let me ask on the django mailing list...

ghost commented 12 years ago

https://groups.google.com/forum/#!topic/django-users/SagbCc0rcXs

ghost commented 12 years ago

I think it depends on whether you want particular fixtures for every test case or not. I accidentally found out that the created test db automatically loads the fixtures it finds in a file called initial_data.json in the root directory. I swapped this file (which already existed and contains the sound licenses) for another one that also has one user, one sound and some tags (just for trying things out, I'm not going to really change this file) and it looks like it is automatically loaded right after the creation of the test db (so before setUp).

Anyway, it seems like a good idea, but maybe we will have trouble with the sounds of the production db that are already indexed. To explain better: when I test a search request with my empty test db, I don't get an empty result set but a status code 500. I think this might be because Solr delivers some results from the sounds it has already indexed, but they cannot be found (of course) in the testing db... so we need Solr to have indexed only the sounds from the testing db...

ghost commented 12 years ago

hey, yes, the licenses are in there. Fixtures can be used for two things: just loading data in your DB from the start ("constants" of the database) or for testing.

please don't add anything to the current fixtures. don't confuse one thing with the other!

did you see if loading TEST-fixtures is done before or after setUp?

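If it is done before, this is what a test should look like:

    from django.test import TestCase

    class SearchTest(TestCase):
        # names of the test fixtures to load into the test db
        fixtures = ["your", "test", "fixtures"]

        def setUp(self):
            # clear the Solr index!
            # then add all sounds loaded from the fixtures to the Solr index
            pass

        def test_search(self):
            pass  # ...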

Deleting all documents indexed in solr can be done in two ways:

http://borort.wordpress.com/2008/07/13/selectdelete-all-items-in-solr/
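One common way to empty an index is to post a delete-by-query message that matches every document to Solr's XML update handler, followed by a commit. The sketch below only builds the two requests and does not send them; the helper name `build_clear_index_requests` and the URL are assumptions for a local test instance.

```python
from urllib.request import Request


def build_clear_index_requests(solr_url):
    """Build (but do not send) the two update requests that empty a Solr index."""
    update_url = solr_url.rstrip("/") + "/update"
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    bodies = [
        b"<delete><query>*:*</query></delete>",  # match and delete every document
        b"<commit/>",  # make the deletion visible to searches
    ]
    return [Request(update_url, data=body, headers=headers) for body in bodies]


# Assumed URL of a local Solr instance used only for testing.
clear_requests = build_clear_index_requests("http://localhost:8984/solr/")
for request in clear_requests:
    print(request.get_method(), request.full_url, request.data)
```

Passing each request to `urllib.request.urlopen` would actually send it, so that step should only ever point at an index that is safe to wipe.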

The idea of unit tests is that you start from zero every time. Right now you cleared out your database, but you forgot to clear out your Solr index!

ghost commented 12 years ago

ok, I finally managed to build a fixture from a real sample of freesound that works just fine (for the moment...). I did all the steps I proposed some comments ago and used the created fields to delete rows in the db after a certain date. I had some trouble with foreign key constraints, but in the end I managed to obtain a consistent small version of the db (or at least I haven't found missing references etc. yet...). I was going to do freesound-1-year, but I finally did freesound-1-month (it already has 502 sounds, 323 users, downloads, comments, tags...). The only things missing are geotags (there were no geotags at that time!); I could artificially create some. I also created an API key. If someone is interested, I'll put my bad and very unorganized SQL code below (I had to learn SQL...)

Then I ran the local freesound with that installation and dumped the data into a JSON fixture. The fixture is 2.7 MB. Now when I run the test it loads this fixture from /api/fixtures (along with the initial_data.json in the root, which I left unchanged) and I can perform tests with controlled data. It takes 15 seconds to load the fixture, but I think we can afford that...

What is still missing is the indexing in Solr. I'm quite sure it can be done in setUp(), but I have to try. However, if I clear the Solr index I will lose the sounds from my normal local installation of freesound, no? And it takes a lot of time to reload them again... I'll keep you posted...

now the very-bad sql code:


    -- 1. Filter downloads table
    DELETE FROM sounds_download WHERE created > '2005-04-9 00:00:00+01';
    -- 2. Filter sound sources
    -- Obtain the highest id of a sound created before the date (534)
    SELECT MAX(sounds_sound.id) FROM sounds_sound WHERE created < '2005-04-9 00:00:00+01';
    DELETE FROM sounds_sound_sources WHERE from_sound_id > 534 OR to_sound_id > 534;
    -- 3. Filter comments
    DELETE FROM comments_comment WHERE created > '2005-04-9 00:00:00+01';
    -- 5. Filter messages
    ALTER TABLE messages_message DROP CONSTRAINT messages_message_body_id_fkey;
    DELETE FROM messages_message WHERE created > '2005-04-9 00:00:00+01';
    DELETE FROM messages_messagebody WHERE id > (SELECT MAX(messages_message.body_id) FROM messages_message WHERE created < '2005-04-9 00:00:00+01');
    -- 6. Filter forum things
    ALTER TABLE forum_thread DROP CONSTRAINT last_post_id_refs_id_1acae61f;
    ALTER TABLE forum_forum DROP CONSTRAINT last_post_id_refs_id_ab18713b;
    DELETE FROM forum_post WHERE created > '2005-04-9 00:00:00+01';
    DELETE FROM forum_thread WHERE created > '2005-04-9 00:00:00+01';
    UPDATE forum_thread SET last_post_id = NULL WHERE last_post_id NOT IN (SELECT forum_post.id FROM forum_post WHERE forum_post.created < '2005-04-9 00:00:00+01');
    UPDATE forum_forum SET last_post_id = NULL WHERE last_post_id NOT IN (SELECT forum_post.id FROM forum_post WHERE forum_post.created < '2005-04-9 00:00:00+01');
    -- 7. Filter user things
    DELETE FROM accounts_profile WHERE accounts_profile.user_id IN (SELECT auth_user.id FROM auth_user WHERE auth_user.date_joined > '2005-04-9 00:00:00+01');
    ALTER TABLE geotags_geotag DROP CONSTRAINT geotags_geotag_user_id_fkey;
    ALTER TABLE api_apikey DROP CONSTRAINT api_apikey_user_id_fkey;
    ALTER TABLE ratings_rating DROP CONSTRAINT ratings_rating_user_id_fkey;
    ALTER TABLE sounds_pack DROP CONSTRAINT sounds_pack_user_id_fkey;
    ALTER TABLE sounds_sound DROP CONSTRAINT sounds_sound_user_id_fkey;
    ALTER TABLE tags_taggeditem DROP CONSTRAINT tags_taggeditem_user_id_fkey;
    DELETE FROM auth_user WHERE auth_user.date_joined > '2005-04-9 00:00:00+01';
    -- 8. Filter ratings
    DELETE FROM ratings_rating WHERE created > '2005-04-9 00:00:00+01';
    -- 9. Filter geotags
    ALTER TABLE sounds_sound DROP CONSTRAINT sounds_sound_geotag_id_fkey;
    DELETE FROM geotags_geotag WHERE created > '2005-04-9 00:00:00+01';
    -- 10. Filter packs
    ALTER TABLE sounds_sound DROP CONSTRAINT pack_id_refs_id_ede47a0a;
    DELETE FROM sounds_pack WHERE created > '2005-04-9 00:00:00+01';
    -- 11. Filter taggeditem
    DELETE FROM tags_taggeditem WHERE created > '2005-04-9 00:00:00+01';
    -- 12. Filter sounds
    DELETE FROM sounds_sound WHERE created > '2005-04-9 00:00:00+01';
    -- 13. Fix missing references from sounds to packs and geotags that have been deleted
    UPDATE sounds_sound SET pack_id = NULL WHERE pack_id NOT IN (SELECT id FROM sounds_pack);
    UPDATE sounds_sound SET geotag_id = NULL WHERE geotag_id NOT IN (SELECT id FROM geotags_geotag);
    -- 14. Filter unused tags
    DELETE FROM tags_tag WHERE id NOT IN (SELECT tag_id FROM tags_taggeditem);
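The core pattern of the SQL above (delete rows created after a cutoff, then null out references that now point nowhere) can be tried on a toy database. This is a sketch using an in-memory SQLite db from the Python standard library, not the PostgreSQL setup used here; the two tables are cut down to the columns the pattern needs, and the rows are invented.

```python
import sqlite3

CUTOFF = "2005-04-09"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sounds_pack (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE sounds_sound (id INTEGER PRIMARY KEY, created TEXT, pack_id INTEGER);
    INSERT INTO sounds_pack VALUES (1, '2005-03-20'), (2, '2005-05-01');
    INSERT INTO sounds_sound VALUES
        (1, '2005-03-21', 1),
        (2, '2005-04-01', 2),
        (3, '2005-06-15', 2);
""")

# Drop everything created after the cutoff (ISO dates compare correctly as text).
conn.execute("DELETE FROM sounds_pack WHERE created > ?", (CUTOFF,))
conn.execute("DELETE FROM sounds_sound WHERE created > ?", (CUTOFF,))

# Null out references to packs that no longer exist.
conn.execute(
    "UPDATE sounds_sound SET pack_id = NULL "
    "WHERE pack_id NOT IN (SELECT id FROM sounds_pack)"
)

rows = conn.execute("SELECT id, pack_id FROM sounds_sound ORDER BY id").fetchall()
print(rows)
```

SQLite does not enforce foreign keys by default, so the toy version needs no `DROP CONSTRAINT` steps.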

ghost commented 12 years ago

Hey Frederic, that all sounds pretty good to me! I guess you could also remove the users who don't have any sounds attached to them?

I think we should add the fixture to the git repository. However, make sure it does not contain any sensitive data, i.e. reset the password for all those users to a certain password (let's say... "password") and change all their email addresses to something nonexistent ("longrandomstring@example.com") before you add it to the repository!
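A rough sketch of that clean-up step, applied to a fixture that has already been loaded as a list of objects. The helper name `anonymize_users` is hypothetical. Django stores password hashes rather than plain text, so a shared login password would first have to be hashed by Django itself; the default value used here is only a placeholder.

```python
import copy


def anonymize_users(fixture, password_hash="placeholder-not-a-real-hash"):
    """Return a copy of the fixture with personal data of auth.user objects replaced."""
    cleaned = copy.deepcopy(fixture)  # leave the caller's data untouched
    for obj in cleaned:
        if obj["model"] != "auth.user":
            continue
        fields = obj["fields"]
        fields["email"] = "user%d@example.com" % obj["pk"]
        fields["first_name"] = ""
        fields["last_name"] = ""
        fields["password"] = password_hash
    return cleaned


# Invented sample data with one user and one unrelated object.
sample = [
    {"model": "auth.user", "pk": 7,
     "fields": {"username": "someone", "email": "real@address.org",
                "first_name": "Real", "last_name": "Name", "password": "sha1$abc$def"}},
    {"model": "sounds.sound", "pk": 1, "fields": {"user": 7}},
]
cleaned = anonymize_users(sample)
print(cleaned[0]["fields"])
```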

The SQL should perhaps also go inside your test directory with some extra information about what it does. The good thing about having this test data is that we can just keep adding or removing data from the fixture instead of having to rebuild that "small database" each and every time.

For Solr, perhaps the easiest way would be to create a new empty index? I'm not sure how you run multiple indices in Solr... Otherwise, yes, you will lose your index.

ghost commented 12 years ago

Hey again, I've been looking into how to set up multiple indexes in Solr. Although it is possible, it looks much more complicated than just running another Solr for testing (at least in my local setup, I only have to duplicate the "example" folder which has Solr). So what I did was write a script that runs the alternate Solr and "python manage.py test api", and that is all. After the test db is created and the fixture is loaded, the Solr index is cleared and all sounds in the fixture are reindexed. It takes less than 10 seconds, and if we change something in the fixture we don't have to worry about the Solr index because it's updated at the beginning... The only problem I see with this way of working is that if I forget the original Solr is running when I run the test, it will erase all indexed sounds (from the production db), so then I'll have to reset it... One option is to run testing-solr on a different port (so we prevent possible loss of indexed data in production-solr), but then I should modify the "add_all_sounds_to_solr()" function to allow passing the port as a variable (and by default using settings.SOLR_URL). Does that sound good?

I know the best option would probably be to dig more into how to run multiple Solr indexes in parallel... but at least for the moment I don't think it's a bad solution...
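A sketch of what that optional argument could look like. The `Settings` class stands in for `django.conf.settings` so the example runs without Django, and the signature and body given to `add_all_sounds_to_solr` here are assumptions, not the project's actual function.

```python
class Settings:
    """Stand-in for django.conf.settings so the sketch runs without Django."""
    SOLR_URL = "http://localhost:8983/solr/"


settings = Settings()


def add_all_sounds_to_solr(sounds, solr_url=None):
    """Pretend to index the sounds; return the URL used and the number of sounds."""
    # Resolve the default at call time, so later changes to settings are honoured.
    target = solr_url if solr_url is not None else settings.SOLR_URL
    # The real function would send the documents to `target` here.
    return target, len(sounds)


print(add_all_sounds_to_solr(["a", "b"]))
print(add_all_sounds_to_solr(["a", "b"], solr_url="http://localhost:8984/solr/"))
```

Using `None` as the default rather than `solr_url=settings.SOLR_URL` matters: a default written in the signature is evaluated once when the function is defined, so a test that changes the setting afterwards would be ignored.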

ghost commented 12 years ago

I was talking with Frederic about this. The default behaviour of the test must be non-destructive, just to be sure that we won't delete the indexes if we run the tests on real Freesound.org.

Using another port sounds very sensible.

ghost commented 12 years ago

I added a line at the beginning of the test to change the settings.SOLR_URL to the testing solr url (which for me is 'http://localhost:8984/solr/'). That way clearing and indexing of sounds is done on 8984 (also the searching during the test) and we cannot erase indexed information in the "production-solr" because it works on a different port.
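To make the non-destructive default explicit, the test could also refuse to clear any index that is not clearly a test instance. A minimal sketch, assuming production Solr listens on port 8983; the helper name `is_safe_to_clear` is hypothetical.

```python
from urllib.parse import urlparse


def is_safe_to_clear(solr_url, protected_ports=(8983,)):
    """Allow clearing only when the URL names an explicit, non-protected port."""
    port = urlparse(solr_url).port
    # A URL without an explicit port is treated as unsafe rather than guessed at.
    return port is not None and port not in protected_ports


print(is_safe_to_clear("http://localhost:8984/solr/"))
print(is_safe_to_clear("http://localhost:8983/solr/"))
```

Calling such a check before the clearing step would make the test fail loudly instead of wiping an index when the settings override is ever removed by mistake.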

ghost commented 12 years ago

that sounds like a great solution!

ghost commented 12 years ago

Guys, please see this page on the wiki for extracting fixtures from the database:

http://www.assembla.com/wiki/show/freesound/makefixtures_module

With all the changes of the hack weekend the unit tests are more current than ever, what's the status on these?

ghost commented 12 years ago

I created a new fixture for testing consistent with the last changes in freesound and the db. For the moment I have this fixture and a very simple api/tests.py that checks that the response code of a couple of calls is OK.

I haven't committed anything because I don't know if we can upload the JSON fixture with info from users. Actually, all the emails are set to "anon@xmpl.org" and the first_name/last_name fields to "", so it is somewhat "anonymized". However, there are the comments, messages and forum posts... I guess I could create another fixture excluding this information, which is not used right now in the API (at least messages and forum posts), but maybe sound comments could be retrieved through the API in the future... what do you think?

Once I have this new fixture I will commit. Aside from the fixture, to run the tests we also need another Solr instance (as explained some comments above). In my case the installation was pretty easy: I just had to duplicate the "example" folder inside Solr and configure it to run on port 8984 (instead of 8983). Then I have a script that starts this Solr instance and runs "manage.py test api". The test itself takes care of clearing the Solr index and adding all sounds from the fixture each time it is executed.

So that's it, with the fixture and the "testing solr" it is easy to create new tests...

ghost commented 12 years ago

Comments etc are also visible in the public views.

The only thing NOT visible in the public views are the private messages. Are there private messages in your fixtures as well?

Anonymised email > good. Invisible first and last names > good.
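If private messages do turn out to be in the fixture, they could be filtered out of the loaded JSON before committing it. A small sketch; the model labels `messages.message` and `messages.messagebody` are guessed from the table names in the SQL above and may not match the real app labels.

```python
def drop_models(fixture, excluded_models):
    """Return the fixture objects whose model label is not in excluded_models."""
    excluded = set(excluded_models)
    return [obj for obj in fixture if obj["model"] not in excluded]


# Invented sample data: one public comment and one private message with its body.
sample = [
    {"model": "comments.comment", "pk": 1, "fields": {"comment": "nice sound"}},
    {"model": "messages.message", "pk": 1, "fields": {"body": 1}},
    {"model": "messages.messagebody", "pk": 1, "fields": {"body": "private text"}},
]
public_only = drop_models(sample, ["messages.message", "messages.messagebody"])
print([obj["model"] for obj in public_only])
```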

MTG / freesound

Create simple unit tests for API #70