broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons
Other
31 stars 4 forks source link

Gsaksena mmd2 #67

Open gsaksena opened 6 years ago

gsaksena commented 6 years ago

Pull Request Template

Describe your changes here.

Style Checklist

Please ensure that your pull request meets the following standards for quality. Code should not be merged into the master branch until all of these criteria have been satisfied.

Comments

Style/Execution

gsaksena commented 6 years ago

Items in the flow chart that are not in the code: 1) doing the atomic file writes for the key metadata files, and renaming them to .error if needed. 2) adding download/dice date to metadata file.

gsaksena commented 6 years ago

I've checked in more code to address some of the comments.

I also fixed new issue #68, adding new MAF datatypes, and added a better error message to make adding new datatypes easier in the future

In addition, there is now some test coverage for the new features.

gsaksena commented 6 years ago

Issue#47 and issue#53 are being addressed in this branch

dheiman commented 6 years ago

Items in the flow chart that are not in the code:

  1. doing the atomic file writes for the key metadata files, and renaming them to .error if needed.
  2. adding download/dice date to metadata file.

Are these going to be put in the code?

gsaksena commented 6 years ago

Items in the flow chart that are not in the code:

  1. doing the atomic file writes for the key metadata files, and renaming them to .error if needed.
  2. adding download/dice date to metadata file.

Are these going to be put in the code?

No for #2. #1 is a separable feature that I'll get to or log as an issue depending on how other things go.

gsaksena commented 6 years ago

It would be better to file the .error renaming etc stuff as a new, separate issue. It has the potential to disrupt people's workflows.

noblem commented 6 years ago

Gentlemen, what's the status of this branch and PR? I still see a handful or 2 of "TODO" statements in the code, some of which have Gordon signature. I'm revisiting GDCtools because I'd like to add the counts data that it generates (after mirror,dicing,etc) into a database.

gsaksena commented 6 years ago

Last I knew it was ready to go but approval/merge was stalled due to gdac time/attention being shifted to the gtex fire. Gordon

On Jan 16, 2018 3:48 PM, "noblem" notifications@github.com wrote:

Gentlemen, what's the status of this branch and PR? I still see a handful or 2 of "TODO" statements in the code, some of which have Gordon signature. I'm revisiting GDCtools because I'd like to add the counts data that it generates (after mirror,dicing,etc) into a database.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/gdctools/pull/67#issuecomment-358099843, or mute the thread https://github.com/notifications/unsubscribe-auth/AFK12f_H84JGjsssfi7Z4Y3CfDdFWfTmks5tLQsAgaJpZM4QJLVG .

noblem commented 6 years ago

I see one issue:

Test that dicing throws error when mirror is corrupted

gdc_mirror.py: error: argument --config: can't open 'corrupt_mirror.cfg': [Errno 2] No such file or directory: 'corrupt_mirror.cfg'

Looks like the new cfg file was not checked in to your branch?

noblem commented 6 years ago

While you're at it: the pool1.cfg file is also missing

noblem commented 6 years ago

Gordon, would you mind throwing those 2 files "over the fence" and into the repo? Thanks!

noblem commented 6 years ago

Alright, I've cleaned this up a wee bit and verified that the new tests indeed pass. As a final step I'd like to meet with you at 1pm tomorrow, David (as the primary reviewer) to discuss a few points before giving this a thumbs up for merging to master.

gsaksena commented 6 years ago

I have a meeting tomorrow at 1. I can meet at 12 or after 2.

On Wed, Jan 17, 2018 at 5:07 PM, noblem notifications@github.com wrote:

Alright, I've cleaned this up a wee bit and verified that the new tests indeed pass. As a final step I'd like to meet with you at 1pm tomorrow, David (as the primary reviewer) to discuss a few points before giving this a thumbs up for merging to master.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/gdctools/pull/67#issuecomment-358464436, or mute the thread https://github.com/notifications/unsubscribe-auth/AFK12ZFG23Y7zgVUQt7fWdVmLVJsoI9eks5tLm8VgaJpZM4QJLVG .

noblem commented 6 years ago

Thanks Gordon, I chose the the 1pm time to fit with David's schedule ... I suspect we'll be fine if you cannot attend.

gsaksena commented 6 years ago

Downstream steps look only for the main output file, not .partial, .error, etc. This respects encapsulation. A .partial is currently being used just to ensure that downstream steps never get a file that has been control-C'ed in the middle of.

On Thu, Jan 18, 2018 at 3:41 PM, noblem notifications@github.com wrote:

@noblem commented on this pull request.

In gdctools/gdc_dice.py https://github.com/broadinstitute/gdctools/pull/67#discussion_r162463387 :

                              diced_project_root, mfw,

dry_run=self.options.dry_run, force=self.force)

  • prog_status_tally[dice_one_status] += 1
  • move completed metadata file to final location

  • note it may include diced files that had errors

  • os.rename(diced_meta_file_partial, diced_meta_file)

Thanks for explaining, but I don't understand the logic/idea: a .partial is supposed to indicate a failure state, and the absence of .partial represents a success state. But you seem to be saying the opposite. Downstream steps that observe upstream .partial (failure state) should fail immediately, no?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/gdctools/pull/67#discussion_r162463387, or mute the thread https://github.com/notifications/unsubscribe-auth/AFK12Re7zHnngdbj0Ym1nRyFR_NCAFmnks5tL6yMgaJpZM4QJLVG .