getodk / collect

ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨
https://docs.getodk.org/collect-intro
Other
715 stars 1.37k forks source link

Rework `FormDownloader` #3538

Closed seadowg closed 3 years ago

seadowg commented 4 years ago

We recently added tests to the FormDownloader (#3537) and doing so revealved some awkwardness in it's design. I think we should look at using the things we don't like about the tests to drive out a refactor here:

  1. This object shoudn't require Robolectric to be tested. To remove it we'd most likely need to remove/invert it's dependencies on the Collect.getInstance() singlton.
  2. We shouldn't need to mock a method (using spy) on the object under test. We might need to break out another object or be passing a mock/fake of one of the object's dependencies.
breakbusyloop commented 4 years ago

I like to follow this rough pattern for refactoring code to expand test coverage:

  1. Split out the obvious 'helper' snippets of code, especially code that is free of side effects but is adjacent to code with side effects. E.g. downloadXform needs helpers for sanitizeFormName (the regex), generateFilenameFromFormName (with file I/O, but still testable), and proposeFilename (helper to generateFilenameFromFormName, but without I/O).
  2. Separate the decisions from the actions. Pull out fragments of code that examine variables to make a decision about what action to perform. Write tests for the decision logic. E.g. downloadManifestAndMediaFiles should be split to have the parsing logic isolated from the download action so we can test parsing support for a malformed manifest document.
  3. Locate dependencies that can be passed in instead of accessed as global singletons or member variables representing complex objects. Hoist up those dependencies to make them the responsibility of the caller. E.g. installEverything could take some kind of simple wrapper Resolver class that handles the calls to findExistingOrCreateNewUri and Collect.getInstance().getContentResolver().delete. Under test we would call installEverything with a test double of Resolver that would not require access to singletons.
  4. Relocate code to a separate file if it is not closely related to the rest of the logic. E.g. downloadFile is the only part of FormDownloader.java that operates on byte[], file streams and direct access to the HTTP input stream.

This process concentrates the integration points to a few methods. We would still need mock testing to handle HTTP requests, form DAO method calls, singleton access, etc. But it keeps our integration tests isolated to just the code that has integration points, while unit tests cover the rest of the code. Eventually we could refactor to inject all dependencies, using test doubles to provide implementations that don't access singletons or perform side effects. That final stage is much more work than the other stages and provides less value, so it is likely less important than continuing the steps above for other files in the code base.

breakbusyloop commented 4 years ago

Here's a great book of techniques for reworking legacy code: https://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers

There are some PDFs of the book floating around

lognaturel commented 4 years ago

Thanks for the great writeups, @seadowg, @OpenDataNerd!

I have read Working Effectively with Legacy Code and referred back to it. I agree it has a lot of great strategies. I have wondered what Feathers would say about the Android framework itself and its particular hostility to separation of concerns.

There are some PDFs of the book floating around

Since this is a copyrighted work and something that the author makes a living with, I think it's the right thing to do to obtain a copy by legal means! 😊

seadowg commented 4 years ago

I have wondered what Feathers would say about the Android framework itself

Nothing good!

breakbusyloop commented 4 years ago

I'm curious to learn what the conflict is. Some glaring big issues or a thousand small ones?

lognaturel commented 4 years ago

The code style that you see in Collect with an Application singleton and God Activity classes that manage a ton of state was all that the Android team showed in their samples for a long time. Clearly that's what the framework designers had in mind initially and that structure doesn't lend itself well to any kind of isolation testing. The framework itself has a lot of singletons and classes full of static methods. And inherently there are a lot of dependencies to manage in a mobile context (databases, network, services, etc).

I think the biggest challenge has been that a lot of what an app does is driven by the Activity lifecycle and that introducing abstractions that are not lifecycle-aware leads to a lot of potential problems. So that's part of the reason that Activity (and later Fragment) classes just grew and grew.

Of course, devs have always had the option of introducing their own abstractions and decoupling from the framework but without good examples, I don't think it was very common for a while. The past handful of years have led to more concrete examples of SOLID-type codebases and idioms for separating concerns and that has helped a lot, I think. But it's still been pretty chaotic. From what I saw/understood, the Android community was experimenting with model-view-presenter and model-view-viewmodel patterns and probably rallying more behind MVP when Android came out with their "architecture components" which push something like MVVM and reactive patterns. That includes ViewModel which is lifecycle-aware and Lifecycle which makes it possible to write lifecycle-aware components.

lognaturel commented 4 years ago

Oh, I forgot another big one. The Android framework calls the default, no-args constructor on Activity and Fragment creation. That means constructor dependency injection is impossible and dependencies are typically built in onCreate. Alternately, folks use dependency injection frameworks like Dagger which you'll see some tentative use of in the code base.

lognaturel commented 4 years ago

Unfortunately we've had to make some more changes to FormDowloader in #3639 so the pain has been felt again. To recap how we ended up in this code that had been mostly untouched since the beginning of time:

  1. We wanted to pull out the XPath path of the first geopoint not in repeat in support of the instance mapping feature.
  2. We decided to replace custom XML parsing code with JavaRosa for this because we would have had to duplicate a lot of its parsing code otherwise to identify whether a point is in a repeat.
  3. Building a JavaRosa FormDef requires access to all external secondary instances. This is not great but we haven't yet figured out an alternative (https://github.com/opendatakit/javarosa/issues/444).
  4. Getting access to those external secondary instances requires configuring ReferenceManager to resolve jr:// URIs.
  5. 3523 did this but it tried to do it with a directory that didn't exist for forms without media so it broke downloads of forms without media.

  6. 3537 only set up the ReferenceManager when media was downloaded with a form. However, it had two problems. First, form definitions with last-saved references implicitly use an external secondary instance that's not downloaded with the form. Second, it turns out the ReferenceManager was not correctly set because it was still trying to use the ODK root rather than the temporary directory.

  7. 3639 hopefully addresses those two issues.

As I've fixed those issues, I've added tests to document the issues and hopefully provide some coverage for future work there. But as @seadowg notes in #3639:

I think it's good to have tests around this stuff obivously but I'm worried if we keep going like this we're really just locking in the bad design and making harder to unpick.

That's absolutely fair. I'm of two minds about this. On one hand, the "design" of FormDownloader leaves a lot to be desired. On the other, it works and I'm not sure that a better design or more tests would have helped with the issues we've run into. I think the problems we've run into are that ReferenceManager is really confusing (how do root session translators and reference factories interact, why are we even using all these different prefixes if we just map them to the same place, etc) and that the form download and form push to device code paths are subtly different. In particular, the fact that the download path uses a temporary directory makes things quite different when it comes to dealing with media. These are conceptual issues. I didn't have to fight the FormDownloader at all when making fixes.

All this to say that I'm not convinced that investing more in FormDownloader is the right response in the short term. I'll spend a little more time seeing what we could do safely to remove the reliance on the mocked methods in the existing tests but don't want to take on a lot of risk.

The tests currently call FormDownloader.downloadForms and verify the messages that come out which is what actual clients do. Currently, downloadForms calls a bunch of chained methods to go through the whole form download pipeline with each form. This includes the XForm download, the manifest download, the download of any media attachments to a temp directory, parsing the XForm, inserting the form details into the database, moving temp files, etc. Some of the tests don't need to or can't go through that whole pipeline so those are the methods that are mocked. My thinking is that if we do split some of that functionality out, the methods of FormDownloader that are currently mocked can be replaced by calls on mocked objects. Better code structure but I actually think the tests will read about the same.