IQSS / dataverse

Open source research data repository software
http://dataverse.org

Running integration tests for more than just API #5068

Closed poikilotherm closed 9 months ago

poikilotherm commented 6 years ago

Dear devs,

while looking into #4690, I noticed that there are no unit or integration tests present for the S3 file storage.

I would feel very uncomfortable changing the existing driver without proper tests to check that nothing breaks. After looking at some issues about integration tests and how to run them (#4726, #4773, #4960, #4896), I was wondering whether I should start working on this first, before trying to fix #4690.

I have some experience with this from a stale project of mine (www.github.com/peerpub/peerpub).

Since #5062 and #5065 have popped up, this seems like a good opportunity to take testing to a more advanced level...

(Oh and I forgot: there is a TODO left in the pom.xml for running this stuff...!)

What do you guys think?

pdurbin commented 6 years ago

@poikilotherm hi! We are definitely interested in more automated testing in general.

The terms "unit tests" and "integration tests" are not always well defined so I hope you've already looked at how we are defining them at http://guides.dataverse.org/en/4.9.2/developers/testing.html

If you can add unit tests for S3, I'd say you should go ahead. Integration tests would be great but we'd need to think about having an environment available that's configured with S3 in order to test. The main test environment I look after, the "phoenix" server described in the "testing" page of the dev guide above, uses a regular filesystem rather than S3 for storage.

PeerPub looks interesting. The "coverage report" and "pipeline status" links are intriguing but a login is required.

poikilotherm commented 6 years ago

Hi @pdurbin! :smiley:

Don't be fooled, PeerPub never reached a working state... Maybe some day I'll find more people to work with me on it. The reports are generated on a private GitLab CI; maybe I'll move this to Travis to make it public... For now, it is a potential source of knowledge, as it was started with JUnit 5 plus OpenClover right away... :wink:

About S3 integration tests: in my little world, I would just create a Travis job that later runs the integration tests for the different StorageIO drivers anyway. To realize that, I would use different JUnit categories/profiles, enabled via Maven calls.

For those I would of course opt not to use the Surefire plugin but the Failsafe plugin, which was designed with integration tests in mind.
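To sketch the idea (a minimal sketch with made-up class names, not actual Dataverse code): Failsafe binds to the integration-test and verify phases and by default only picks up classes ending in IT, so the unit tests stay with Surefire; a JUnit category could then mark the driver-specific tests so a Maven profile can select them:

```java
// Hypothetical sketch: a storage-driver integration test that Failsafe would pick up
// during `mvn verify` because of the *IT suffix, while Surefire keeps ignoring it.
// The category could be selected via Failsafe's <groups> configuration in a Maven profile.
import org.junit.Test;
import org.junit.experimental.categories.Category;

import static org.junit.Assert.assertTrue;

// Marker interface used as a JUnit category; purely illustrative.
interface S3IntegrationTests {}

@Category(S3IntegrationTests.class)
public class S3StorageDriverIT {

    @Test
    public void canRoundTripAFile() {
        // ... exercise the S3 StorageIO driver against a test bucket here ...
        assertTrue(true); // placeholder assertion for the sketch
    }
}
```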

Does that sound reasonable?

pdurbin commented 6 years ago

@poikilotherm every morning at standup we gather together and talk about pull requests on our kanban board at https://waffle.io/IQSS/dataverse and see red and green from Travis CI, like this:

(screenshot: the waffle.io kanban board showing red and green Travis CI statuses)

Every once in a while someone will ask, "Should I be concerned that this pull request is red?" Unfortunately, the answer right now is often "no" because we haven't invested much time in continuous integration.

Sometimes the pull request is red because code coverage is down slightly, such as -0.006% for pull request #4998 in the screenshot below:

(screenshot: a pull request marked red because code coverage dropped by 0.006%)

Sometimes the pull request is red because the Maven repo at http://maven.geotoolkit.org is down. Here's a screenshot from https://travis-ci.org/IQSS/dataverse/builds/429095040 for pull request #5047:

(screenshot: a Travis CI build failing because maven.geotoolkit.org is unreachable)

Right now Travis is only running the unit tests. Travis does not report on our integration tests at all. If you can help us with any of the above it would be greatly appreciated! We are definitely willing to switch to a different Maven plugin or add different or better tools.

poikilotherm commented 6 years ago

Hi @pdurbin and @pameyer,

re-reading your definition of integration tests actually makes me scratch my head. Don't get me wrong, the definition is fine, but the "further action" part at the bottom of the page shouts at me "we want to do integration testing", which for now would be excluded by that definition, since it is used solely for API near-end-to-end testing...

Anyway - there is a lot of testable stuff in between, like my S3 efforts (#4690) and more. The procedure for running the integration (near-end-to-end?) tests also involves quite some heavy lifting with Docker and everything.

Did you guys look into mvn verify and maven-failsafe in the past? Are you aware of the possibility of using an embedded Postgres and Solr (while working on PeerPub I used an embedded MongoDB for running the integration tests)? This could be much faster for all of the integration tests (which should ideally be runnable not only on CI but also on a dev laptop) and quicker to use (just call mvn verify).
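To make the embedded-database idea concrete, here is a minimal sketch; it assumes the io.zonky.test:embedded-postgres library as one possible choice, which is not a Dataverse dependency today:

```java
// Minimal sketch of spinning up a throwaway Postgres for an integration test.
// Assumes the io.zonky.test:embedded-postgres library; not part of the Dataverse build.
import io.zonky.test.db.postgres.embedded.EmbeddedPostgres;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class EmbeddedPostgresSketchIT {

    @Test
    public void canTalkToAnEmbeddedPostgres() throws Exception {
        // Starts a private Postgres instance on a random free port; no local install needed.
        try (EmbeddedPostgres pg = EmbeddedPostgres.start()) {
            DataSource ds = pg.getPostgresDatabase();
            try (Connection conn = ds.getConnection();
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT 1")) {
                rs.next();
                assertEquals(1, rs.getInt(1));
            }
        } // The database and its data directory are torn down here.
    }
}
```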

Also, the documentation does not state how to update the WAR file within the container after a rebuild in order to re-run the integration tests...?

pdurbin commented 6 years ago

@poikilotherm this is exactly why I don't use Docker for day to day development. Our current process is to delete the container and to build it all over again. I'm working away on pull request #5063 right now and @matthew-a-dunlap and I have talked about making it easier to deploy new WARs ("latest code from same branch" under "phase 3"): https://github.com/IQSS/dataverse/issues/4990#issuecomment-422464237 . The context is spinning up arbitrary branches on AWS but hopefully automating this could be applied in various places. In practice I use asadmin deploy --force path/to/dataverse.war a lot, if that helps.

I haven't heard of maven-failsafe. I like what you were saying earlier about how it was designed with integration tests in mind.

I'm aware that there's some sort of embedded Glassfish but I've never gotten it to work. It stands to reason that there would be embedded Postgres and Solr as well. Please keep the ideas coming!

pameyer commented 6 years ago

Hi @poikilotherm, my unsophisticated interpretation of "integration tests" is "stuff that needs a functional set of servers to execute". With the current setup, this doesn't require Docker, although that's the only approach I personally was able to get working - IQSS Jenkins, and I believe most of the IQSS dev team, use different approaches (ones that appeared to me to require more manual steps).

I've used mvn verify before, but not for integration tests. Updating a WAR inside a running container is doable and would be faster, but it also increases the probability of a mismatch between the code in the repo and the deployed code (not a reason not to do it, but a problem I've hit before). Do you think it's worth doing?

poikilotherm commented 6 years ago

While working on #5072, #4690, and #5059, this is becoming more crucial.

I can write unit tests for S3, but only with mocks. To really test things, I need running services. That is always heavy lifting compared to unit tests, so it would be a good idea to hand it over to integration testing.
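To illustrate the limitation, a mocked test only verifies the interaction with the S3 client, not any real bucket behavior - roughly along these lines (bucket and key are made up, and in a real test the mocked client would have to be injected into the actual storage driver):

```java
// Hypothetical sketch of a mocked S3 unit test with Mockito and the AWS SDK v1;
// class name, bucket, and key are illustrative, not the actual S3 storage driver API.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.S3Object;
import org.junit.Test;

import static org.junit.Assert.assertNotNull;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

public class S3AccessMockedTest {

    @Test
    public void getObjectIsDelegatedToTheClient() {
        AmazonS3 s3 = mock(AmazonS3.class);
        when(s3.getObject("my-bucket", "doi-10-5072/FK2/ABCDEF"))
            .thenReturn(new S3Object());

        // Only the interaction with the client can be checked here; nothing touches real AWS.
        S3Object result = s3.getObject("my-bucket", "doi-10-5072/FK2/ABCDEF");

        assertNotNull(result);
        verify(s3).getObject("my-bucket", "doi-10-5072/FK2/ABCDEF");
    }
}
```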

I'll go ahead and try to come up with something, OK?

pdurbin commented 6 years ago

@poikilotherm sure, we very much welcome ideas about how to improve our automated testing. We know we have a long way to go. Thanks!

poikilotherm commented 6 years ago

While working on PeerPub, I was using Spring Boot, which made things easy. With dataverse, we have to deal with EJB, more dependencies, services, etc.

I saw on IRC (and in other places?) that you struggle with non-clean test environments. If you can agree on using Docker for integration testing, maybe Arquillian plus Arquillian Cube (and maybe Arquillian APE, including DBUnit, and the Arquillian Suite Extension) could help a lot with this...

With Arquillian you can control the container's whole lifecycle right from within the test and/or configuration, which is a great benefit for integration testing. Even testing with different servlet containers is easy that way - this may help with proceeding on #4172 and others.

Of course Arquillian can be used WITHOUT the Cube and Docker dependency. But as you rely on Postgres + Solr + R + ..., it might be much easier to use separate containers than to try to embed all this stuff or to use the all-in-one (AIO) container.
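As a rough illustration of how such a test reads, here is a minimal Arquillian-style sketch; the deployed bean and the asserted behavior are made up and not part of Dataverse:

```java
// Minimal Arquillian sketch, NOT actual Dataverse code: GreetingService is a stand-in
// CDI bean defined below only to keep the example self-contained.
import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.arquillian.junit.Arquillian;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.asset.EmptyAsset;
import org.jboss.shrinkwrap.api.spec.WebArchive;
import org.junit.Test;
import org.junit.runner.RunWith;

import javax.inject.Inject;

import static org.junit.Assert.assertEquals;

@RunWith(Arquillian.class)
public class GreetingServiceIT {

    // Arquillian builds this micro-deployment and deploys it into the container it
    // manages (or into a Docker container when combined with Arquillian Cube).
    @Deployment
    public static WebArchive createDeployment() {
        return ShrinkWrap.create(WebArchive.class, "greeting-it.war")
                .addClass(GreetingService.class)
                .addAsWebInfResource(EmptyAsset.INSTANCE, "beans.xml");
    }

    @Inject
    GreetingService greetingService;

    @Test
    public void greetsByName() {
        // The test runs inside the deployed application, so real CDI/EJB wiring applies.
        assertEquals("Hello, Dataverse!", greetingService.greet("Dataverse"));
    }
}

// Stand-in bean, only here so the sketch compiles on its own.
class GreetingService {
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}
```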

IMHO this sounds like some very interesting technology that fits in with your current approach. If no one shouts "NOOOOOO", I'll give it a try.

pdurbin commented 6 years ago

@poikilotherm the technology choices sound fancy and fine but should we back up a level and define a little more what we want? I could start a Google doc for this if you want.

poikilotherm commented 6 years ago

Uhm... Maybe? Unsure. Is this a topic that needs to be discussed more broadly at @IQSS? If it helps to get people into the same boat and keep things running smoothly, go ahead. I'm just a guy from overseas :wink:

pdurbin commented 9 months ago

Closing as a duplicate of this issue: