Closed jonn-smith closed 4 years ago
@winni2k @kvg - OK, here's my CLI update for Tesserae.
I haven't written in python like this in a while, so I have probably done some things that aren't very "pythonic". I tried to include tests for everything I added, and I prefer to be verbose with comments and variable names for posterity (and so I don't get confused when I look at my code again in the future).
I updated the unit tests to reuse the same code against another interface I made to the Tesserae object, and I added in a few minor things to it as well.
This CLI will produce a .bam file from two given .fasta/.fastq files. The .bam contains the partial alignments from each target relative to the master query sequence. It closely relates to a use case I have for the CAR-T tool. Right now it does not perform the alignment with multiple "reference" sequences - just the first one in the "query" fasta/q file.
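To illustrate the "first query only" behaviour described above, here is a minimal sketch. It is not the Tesserae code: `read_fasta` and `first_record` are hypothetical helpers written for this example, it handles FASTA only (not FASTQ), and the sequences are made-up toy data.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {sequence name: sequence}, in file order.

    Hypothetical helper for illustration; duplicate names would overwrite
    earlier entries.
    """
    chunks_by_name = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks_by_name[name] = []
        elif name is not None:
            chunks_by_name[name].append(line)
    return {n: "".join(chunks) for n, chunks in chunks_by_name.items()}


def first_record(sequences):
    """Return (name, sequence) for the first record; later records are ignored."""
    name = next(iter(sequences))
    return name, sequences[name]


# Toy inputs standing in for the "query" and "target" files.
query_fasta = io.StringIO(">query1 master\nACGTACGT\nTTGA\n>query2\nGGGG\n")
target_fasta = io.StringIO(">targetA\nACGT\n>targetB\nTTGA\n")

query_name, query_seq = first_record(read_fasta(query_fasta))
targets = read_fasta(target_fasta)  # names are kept as dictionary keys
```

Keeping the targets in a name-keyed dictionary is one way to carry sequence names through to the output records; only `query1` is used as the query here, and `query2` is dropped.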
I also squashed all my commits together into one master commit for ease of merging / reviewing. This is what I'm used to doing in the GATK repo, but let me know if we want to preserve the complete history and I'll leave the details in next time.
I am a fan of squashing commits for a feature and rebasing them onto master. That keeps things pretty and easy to understand.
@winni2k OK - my updates from round 1 are in!
Some fun nighttime reading while I feed the baby.
We are getting closer!
This is going to be a nightmare to merge, isn't it...
I think we are at a place to rebase, squash, and merge this PR (with or without the changes from jts_CLI_2_wwk). @jonn-smith, is there anything I can do to help things along?
@winni2k OK - I've rebased on master so it will merge cleanly. I also moved over the logging and capfd changes you made in the jts_CLI_2_wwk branch.
I purposely did not include the more substantive changes to the CLI test to read the files into memory. As time goes on we will undoubtedly start using more complex BAM files for testing and we shouldn't be in the habit of reading them into memory to do the comparison - for large scale tests it will be intractable.
Instead I created a method to compare two BAM/SAM files on disk, record by record. I think this will be better in the long run and is usable now.
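A rough sketch of what a streaming, record-by-record comparison can look like, for anyone following along. This is not the method from the PR: `sam_records` and `first_difference` are hypothetical names, the example reads plain-text SAM only, and ignoring header lines is an assumption made here. For BAM, the same comparison should work over records from a BAM reader such as pysam's `AlignmentFile`.

```python
import io
from itertools import zip_longest


def sam_records(handle):
    """Yield alignment records from SAM text, skipping '@' header lines.

    Assumption for this sketch: headers are not part of the comparison.
    """
    for line in handle:
        line = line.rstrip("\n")
        if line and not line.startswith("@"):
            yield tuple(line.split("\t"))


def first_difference(records_a, records_b):
    """Return (index, record_a, record_b) for the first mismatch, else None.

    Both inputs are consumed lazily, so only one record per file is held in
    memory at a time. A side that ran out of records is reported as None.
    """
    missing = object()
    pairs = zip_longest(records_a, records_b, fillvalue=missing)
    for index, (a, b) in enumerate(pairs):
        if a != b:
            return (
                index,
                None if a is missing else a,
                None if b is missing else b,
            )
    return None


# Toy SAM text standing in for files on disk.
record = "read1\t0\tquery1\t1\t60\t4M\t*\t0\t0\tACGT\t*\n"
expected = io.StringIO("@HD\tVN:1.6\n" + record)
actual = io.StringIO("@HD\tVN:1.6\tSO:unsorted\n" + record)
same = first_difference(sam_records(expected), sam_records(actual))

shifted = io.StringIO(record.replace("\t1\t60", "\t5\t60"))
diff = first_difference(sam_records(io.StringIO(record)), sam_records(shifted))
```

With real files, the `io.StringIO` objects would be replaced by `open(path)` handles. Returning the first differing record, and not just a boolean, tends to make test failures easier to read.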
Let me know what you think.
This all looks really good.
Thanks also for fixing the pep8 issues in tesserae.py. I was hoping I could convince @kvg to do it, but I guess he lucked out this time.
Tesserae.align_from_fastx to align data from FASTX files.
Tesserae.align with a dictionary to preserve sequence names.
Tesserae.align.
CLI.
Fixes #2