ARUP-NGS / BMFtools

Barcoded Molecular Families
MIT License

Dev #15

Closed dnbaker closed 9 years ago

dnbaker commented 9 years ago

Parallelized bmftools dmp, refactored, and rewrote the inliners. Added a new BAM sort and a new piped analysis from the fastq stage to a flattened/mpa'd/sorted BAM.

dnbaker commented 9 years ago

I think that, before this gets merged in, we need a few more things.

  1. Metal:
    1. bmftools dmp re-working Estimate: 2-3 days
      1. Finish the Fisher Flattening. The new merging style should happen before a new "release" is made.
      2. GATK Compatibility - provide options for mean quality score or max quality score for merged quality rather than the Fisher work.
    2. Finish the BAM rescue merging.
      1. 1-2 days
    3. Change the PV/FA/DR/DG tags to use array buffers, now that pysam has been expanded to be able to work with binary array tags. [COMPLETED]
  2. Functionality:
    1. bmftools dmp subcommand
    2. bmftools fully-piped dmp fastq's to tagged/mpa'd bams subcommand. (Code written, with a small bug fix in the works for non-HG37 datasets.) [COMPLETED]
      1. bug fix. (~2 hours)
  3. Testing:
    1. Bam tagging and mpa unit tests.
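
For readers unfamiliar with the "Fisher Flattening" item above, here is a minimal sketch of Fisher's method applied to the quality scores of agreeing bases in one family. It assumes that "flattening" means combining per-read error probabilities into a single merged Phred score; the function names are hypothetical and are not taken from the bmftools source.

```python
import math


def phred_to_p(q):
    """Convert a Phred quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def p_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10.0 * math.log10(p)


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method.

    The statistic x = -2 * sum(ln p_i) follows a chi-squared distribution
    with 2k degrees of freedom. For even degrees of freedom the survival
    function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)^i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total


def merge_agreeing_quals(quals):
    """Hypothetical helper: merged Phred score for agreeing base calls."""
    return p_to_phred(fisher_combine([phred_to_p(q) for q in quals]))
```

Two agreeing Q30 calls merge to a score well above 30 under this scheme. Note that this naive form computes `exp(-x/2)` directly, which underflows to zero for very large inputs; a log-space variant would be needed there.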

BrettKennedy commented 9 years ago

Sounds good to me. I'll work on the bam tagging unit test.

Brett

dnbaker commented 9 years ago

Before I forget, I think this is also the release that could use the global sample metrics and the LockedDictionary.

Here's the updated chart:

  1. Metal:
    1. bmftools dmp re-working Estimate: 2-3 days
      1. Finish the Fisher Flattening. The new merging style should happen before a new "release" is made.
      2. GATK Compatibility - provide options for mean quality score or max quality score for merged quality rather than the Fisher work.
    2. Finish the BAM rescue merging.
      1. 1-2 days
    3. Change the PV/FA/DR/DG tags to use array buffers, now that pysam has been expanded to be able to work with binary array tags. [COMPLETED]
  2. Functionality:
    1. bmftools dmp subcommand
    2. bmftools fully-piped dmp fastq's to tagged/mpa'd bams subcommand. (Code written, with a small bug fix in the works for non-HG37 datasets.) [COMPLETED]
      1. bug fix. (~2 hours)
    3. Global reporting
      1. Store global QC metrics information in the SampleMetrics dictionary, inherited from LockedDictionary.
      2. Permit control over which files should be copied for a reviewdir with the ReviewDirComponents LockedDictionary.
  3. Testing:
    1. Bam tagging and mpa unit tests.
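
As a rough illustration of the Global reporting item, the sketch below shows one way a LockedDictionary and a SampleMetrics subclass could look. It assumes "locked" means the set of keys is fixed at construction, so a typo in a metric name fails loudly instead of silently adding a key. The metric names are invented for the example and are not from the bmftools source.

```python
from collections.abc import MutableMapping


class LockedDictionary(MutableMapping):
    """Mapping whose keys are fixed at construction (assumed semantics)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Existing keys may be updated; new keys are rejected.
        if key not in self._data:
            raise KeyError("cannot add new key %r to a locked dictionary" % (key,))
        self._data[key] = value

    def __delitem__(self, key):
        raise TypeError("keys cannot be removed from a locked dictionary")

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class SampleMetrics(LockedDictionary):
    """Global QC metrics for one sample; field names are hypothetical."""

    def __init__(self):
        super().__init__(total_reads=0, families=0, mean_family_size=0.0)
```

Building on `MutableMapping` rather than subclassing `dict` means `update()`, `setdefault()`, and `pop()` all route through the guarded `__setitem__` and `__delitem__` methods.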

dnbaker commented 9 years ago

Here's the updated chart:

  1. Metal:
    1. bmftools dmp re-working Estimate: 2-3 days
      1. ~~Finish the Fisher Flattening. The new merging style should happen before a new "release" is made.~~
      2. ~~GATK Compatibility - provide options for mean quality score or max quality score for merged quality rather than the Fisher work.~~ You can now use the command-line/config option "cap". Setting this value to a character caps the merged quality at that character, an int at that int, and
    2. Finish the BAM rescue merging.
      1. 1-2 days
    3. Change the PV/FA/DR/DG tags to use array buffers, now that pysam has been expanded to be able to work with binary array tags. [COMPLETED]
  2. Functionality:
    1. bmftools dmp subcommand
    2. bmftools fully-piped dmp fastq's to tagged/mpa'd bams subcommand. (Code written, with a small bug fix in the works for non-HG37 datasets.) [COMPLETED]
      1. bug fix. (~2 hours)
    3. Global reporting
      1. Store global QC metrics information in the SampleMetrics dictionary, inherited from LockedDictionary.
      2. Permit control over which files should be copied for a reviewdir with the ReviewDirComponents LockedDictionary.
  3. Testing:
    1. Bam tagging and mpa unit tests.
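
To make the GATK-compatibility item concrete, here is a minimal sketch of mean/max merging with a cap. It covers only the two cap forms described above (a character or an int), and it assumes the character is a Phred+33 quality character. The function and parameter names are hypothetical, not the actual bmftools options.

```python
def resolve_cap(cap):
    """Turn a cap given as an int or a single quality character into an int.

    Assumes Phred+33 encoding for the character form.
    """
    if cap is None:
        return None
    if isinstance(cap, int):
        return cap
    if isinstance(cap, str) and len(cap) == 1:
        return ord(cap) - 33
    raise ValueError("cap must be None, an int, or a single character")


def merged_quality(quals, mode="mean", cap=None):
    """Merge per-read Phred scores by mean or max, then apply the cap."""
    if not quals:
        raise ValueError("need at least one quality score")
    if mode == "mean":
        merged = int(round(sum(quals) / len(quals)))
    elif mode == "max":
        merged = max(quals)
    else:
        raise ValueError("mode must be 'mean' or 'max'")
    limit = resolve_cap(cap)
    return merged if limit is None else min(merged, limit)
```

For example, `merged_quality([30, 40, 41], mode="max", cap="I")` caps the result at the value of `I`, which is 40 under Phred+33.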

So that leaves us with this.

Testing:

  1. MPA and fastq unit tests

Tabled:

  1. Quality score rescaling
  2. SampleMetrics/ReviewDir
  3. Bam rescue merging

dnbaker commented 9 years ago

Okay, I have to fix that overflow error for the large families. Whoops.