chanzuckerberg / idseq-workflows

Portable WDL workflows for IDseq production pipelines
https://idseq.net/
MIT License
31 stars 12 forks source link

Phylo Tree #123

Closed morsecodist closed 3 years ago

morsecodist commented 3 years ago

OK this is the big one. The test files make it look bigger than it is though. This updates the phylo tree algorithm and brings it in line with what the front end expects. It also has a test with realistic data and assertions for each output file. Ignore the failing consensus genome test. This PR doesn't touch that and it was branched before my selective github actions PR.

Advised approach to review:

I would start by looking at the main wdl (which is collapsed unfortunately): phylotree-ng/run.wdl. This will give you an overview of what is going on.

phylotree-ng/test/test_wdl.py will give you a good indication of the functionality and expected behavior.

From there you can look at the step implementations in phylotree-ng/python_steps/*

Then to wrap up it may be helpful to look in the phylotree-ng/Dockerfile

Ignore all of the test data in phylotree-ng/test/full_zika_test_data/*

morsecodist commented 3 years ago

@katrinakalantar re: expected values in unit tests. This would be great. I had considered even asserting that the files were exactly the same but I think meaningful assertions would be better for readers of the tests. Also What do you use to parse newick files? I could do that to make better assertions. Let me know of any assertions you think of and I can include them.

katrinakalantar commented 3 years ago

@katrinakalantar re: expected values in unit tests. This would be great. I had considered even asserting that the files were exactly the same but I think meaningful assertions would be better for readers of the tests. Also What do you use to parse newick files? I could do that to make better assertions. Let me know of any assertions you think of and I can include them.

Biopython has some newick parsing functionality (https://biopython.org/docs/1.75/api/Bio.Phylo.NewickIO.html) which could work well for this. I think we can hold off on adding these tests with this PR and I can get the assertions / values for the test files verified so that we can expand the tests in a subsequent PR.