castcollab / tesserae2

Tesserae2: Fast recombination-aware global and local alignment.

Added CLI component to tesserae. #6

Closed jonn-smith closed 4 years ago

jonn-smith commented 4 years ago

Fixes #2

jonn-smith commented 4 years ago

@winni2k @kvg - OK, here's my CLI update for Tesserae.

I haven't written Python like this in a while, so I have probably done some things that aren't very "pythonic". I tried to include tests for everything I added, and I prefer to be verbose with comments and variable names for posterity (and so I don't get confused when I look at my code again in the future).

I updated the unit tests to reuse the same code against another interface I made to the Tesserae object, and I added in a few minor things to it as well.

This CLI will produce a .bam file from two given .fasta/.fastq files. The .bam file contains the partial alignments from each target relative to the master query sequence. It closely relates to a use case I have for the CAR-T tool. Right now it does not perform the alignment with multiple "reference" sequences - just the first one in the "query" fasta/q file.

jonn-smith commented 4 years ago

I also squashed all my commits together into one master commit for ease of merging / reviewing. This is what I'm used to doing in the GATK repo, but let me know if we want to preserve the complete history and I'll leave the details in next time.

winni2k commented 4 years ago

> I also squashed all my commits together into one master commit for ease of merging / reviewing. This is what I'm used to doing in the GATK repo, but let me know if we want to preserve the complete history and I'll leave the details in next time.

I am a fan of squashing commits for a feature and rebasing them onto master. That keeps things pretty and easy to understand.

jonn-smith commented 4 years ago

@winni2k OK - my updates from round 1 are in!

winni2k commented 4 years ago

Some fun nighttime reading while I feed the baby.

winni2k commented 4 years ago

We are getting closer!

This is going to be a nightmare to merge, isn't it...

winni2k commented 4 years ago

I think we are at a place to rebase, squash, and merge this PR (with or without the changes from jts_CLI_2_wwk). @jonn-smith, is there anything I can do to help things along?

jonn-smith commented 4 years ago

@winni2k OK - I've rebased on master so it will merge cleanly. I also moved over the logging and capfd changes you made in the jts_CLI_2_wwk branch.

I purposely did not include the more substantive changes to the CLI test that read the files into memory. As time goes on we will undoubtedly start using more complex BAM files for testing, and we shouldn't be in the habit of reading them into memory to do the comparison - for large-scale tests it will be intractable.

Instead I created a method to compare two BAM/SAM files on disk, record by record. I think this will be better in the long run and is usable now.

Let me know what you think.
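
The comparison method itself is not shown in this thread. As a rough illustration of the record-by-record idea, here is a minimal sketch that streams two SAM text files in lockstep and stops at the first mismatch, so neither file is ever held in memory. The function names `iter_sam_records` and `first_sam_difference` are hypothetical and are not taken from the PR. The sketch covers plain-text SAM only and ignores header lines; a BAM comparison would need a BAM-aware reader, which is assumed rather than shown.

```python
from itertools import zip_longest


def iter_sam_records(path):
    """Lazily yield alignment records from a SAM text file.

    Header lines start with '@' and are skipped. The SAM spec does not
    allow '@' in a read name, so no alignment line begins with it.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("@"):
                continue
            record = line.rstrip("\r\n")
            if record:
                yield record


def first_sam_difference(path_a, path_b):
    """Compare two SAM files record by record (hypothetical helper).

    Returns None when every record matches. Otherwise returns a tuple of
    (1-based record number, record from path_a, record from path_b); a
    missing record is reported as None when one file is shorter.
    """
    pairs = zip_longest(iter_sam_records(path_a), iter_sam_records(path_b))
    for number, (record_a, record_b) in enumerate(pairs, start=1):
        if record_a != record_b:
            return number, record_a, record_b
    return None
```

In a test, something like `assert first_sam_difference(actual_path, expected_path) is None` would fail on the first differing record. Because both files are read one line at a time, memory use stays flat regardless of file size.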

winni2k commented 4 years ago

This all looks really good.

Thanks also for fixing the pep8 issues in tesserae.py. I was hoping I could convince @kvg to do it, but I guess he lucked out this time.