I've tentatively categorized the tools and they are listed in spreadsheet format at:
- `* @author Valentin Ruano-Rubio <valentin@broadinstitute.org>` placed at the top of a doc causes the javaDoc to not show. Such lines should be at the end of the javaDoc portion. @vdauwera prefers all author annotations be removed.
- Use `gatk` to invoke the launch script, not `gatk-launch`. The engine team tells me this change will be effective at the end of this month.
- `-Xmx` needs to be defined, and this should be reflected in the example command(s). Hopefully, if your tool needs it, you already know it. Otherwise, see https://github.com/broadinstitute/gatk/issues/3137.

**AMENDED** Documentation of Picard tools in the Best Practices is a priority, as is categorization of Picard tools. In the forum tool list, Picard tools will be mixed with GATK tools alphabetically, with the PICARD label coming after the tool name.

To view docs, build with `./gradlew clean gatkDoc`, then view the local `index.html` in a browser.
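To make the expected format concrete, here is a minimal sketch of a javadoc usage example that uses the `gatk` script and sets `-Xmx` (the tool name and file names are placeholders, and I'm assuming the launch script's `--java-options` pass-through for JVM flags):

```java
/**
 * <h3>Usage example</h3>
 * <pre>
 * gatk --java-options "-Xmx4g" MyExampleTool \
 *   -R reference.fasta \
 *   -I input.bam \
 *   -O output.vcf.gz
 * </pre>
 */
public final class MyExampleTool {
    // Tool body omitted; only the javadoc layout matters here.
}
```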
@vdauwera The tools are categorized and listed in the Google Spreadsheet above. It is waiting for you to assign tech leads to tools for documentation.
One thing that @chandrans brought to my attention is that for BaseRecalibrator, one of the parameters (`-bqsr`) actually causes an error. One can no longer generate the 2nd recalibration table with on-the-fly correction; instead one must run the recalibrated BAM through BaseRecalibrator to generate the 2nd recal table for plotting. This type of information is missing from the tool docs. Furthermore, updates I made to the BQSR slide deck (that showcase this `-bqsr` parameter) are based on information from a developer, and this information turns out to be incorrect now (perhaps it was correct at some point in development?). So I think it may be prudent that those responsible for tool docs test the commands on data.
Mon Nov 20 17:30:46 2017 -0500, where we upgraded htsjdk to 2.13.1: download and load the `index.html` into a web browser to click through the docs.
Geraldine says she is busy catching up this week so I think it best if tech leads assign the tools to members of their teams @droazen @cwhelan @samuelklee @ldgauthier @vruano @yfarjoun @LeeTL1220.
If we can agree on tool categorization sooner than later, this gives @cmnbroad time to engineer any changes that need engineering.
Any chance we could break off legacy CNV tools into their own group? There are many more of them than there will be in the new pipelines, and many of them are experimental, deprecated, unsupported, or for validation only, so I think it makes sense to hide them and perhaps be less stringent about their documentation requirements. Anything we can do to reduce the support burden before release would be great.
I just learned that KEBAB case is different from SNAKE case @cmnbroad. Sorry if KEBAB is offensive @cmnbroad but it is meant to clarify syntax (e.g. https://lodash.com/docs#kebabCase). To be clear, Geraldine wants KEBAB case that uses hyphens, and not SNAKE case, which uses underscores.
`--emitRefConfidence` would become `--emit-ref-confidence`. `--contamination_fraction_to_filter` would become `--contamination-fraction-to-filter`.
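A throwaway sketch (not proposed GATK code) of the conversion rule, for clarity:

```java
// camelCase or snake_case in, kebab-case out.
static String toKebabCase(final String name) {
    return name.replaceAll("([a-z0-9])([A-Z])", "$1-$2")
               .replace('_', '-')
               .toLowerCase();
}
// toKebabCase("emitRefConfidence")                -> "emit-ref-confidence"
// toKebabCase("contamination_fraction_to_filter") -> "contamination-fraction-to-filter"
```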
@vruano will describe how he uses constants to manage parameters.
Since we are going to change many of those argument names (camelCase to kebab-case), I think we should take this opportunity to use constants to specify argument names in the code, and to use those constants in our test code so that further changes in argument names don't break tests.

Take CombineReadCounts as an example; an extract is enclosed below. It might also be beneficial to add public constants for the default values.
```java
public final class CombineReadCounts extends CommandLineProgram {

    // Argument names declared as public constants, so test code can reference
    // them instead of hard-coding strings.
    public static final String READ_COUNT_FILES_SHORT_NAME = StandardArgumentDefinitions.INPUT_SHORT_NAME;
    public static final String READ_COUNT_FILES_FULL_NAME = StandardArgumentDefinitions.INPUT_LONG_NAME;
    public static final String READ_COUNT_FILE_LIST_SHORT_NAME = "inputList";
    public static final String READ_COUNT_FILE_LIST_FULL_NAME = READ_COUNT_FILE_LIST_SHORT_NAME;
    public static final String MAX_GROUP_SIZE_SHORT_NAME = "MOF";
    public static final String MAX_GROUP_SIZE_FULL_NAME = "maxOpenFiles";

    // Default values can likewise be exposed as public constants.
    public static final int DEFAULT_MAX_GROUP_SIZE = 100;

    @Argument(
            doc = "Coverage files to combine, they must contain all the targets in the input file (" +
                    TargetArgumentCollection.TARGET_FILE_LONG_NAME + ") and in the same order",
            shortName = READ_COUNT_FILE_LIST_SHORT_NAME,
            fullName = READ_COUNT_FILE_LIST_FULL_NAME,
            optional = true
    )
    protected File coverageFileList;

    @Argument(
            doc = READ_COUNT_FILES_DOCUMENTATION,
            shortName = READ_COUNT_FILES_SHORT_NAME,
            fullName = READ_COUNT_FILES_FULL_NAME,
            optional = true
    )
    protected List<File> coverageFiles = new ArrayList<>();

    @Argument(
            doc = "Maximum number of files to combine simultaneously.",
            shortName = MAX_GROUP_SIZE_SHORT_NAME,
            fullName = MAX_GROUP_SIZE_FULL_NAME,
            optional = false
    )
    protected int maxMergeSize = DEFAULT_MAX_GROUP_SIZE;

    @ArgumentCollection
    protected TargetArgumentCollection targetArguments = new TargetArgumentCollection(() ->
            // orElse(null), not orElseGet(null): orElseGet expects a Supplier and
            // would throw an NPE when the stream is empty.
            composeAndCheckInputReadCountFiles(coverageFiles, coverageFileList).stream().findFirst().orElse(null));

    @Argument(
            doc = "Output file",
            shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME,
            fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME,
            optional = false
    )
    protected File outputFile;

    // READ_COUNT_FILES_DOCUMENTATION, composeAndCheckInputReadCountFiles and the
    // rest of the tool are omitted from this extract.
}
```
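A test could then build its command line from the constants rather than from literal strings (a minimal sketch; `runCommandLine` stands in for whatever test-harness entry point the test class provides, and the file names are placeholders):

```java
import java.util.Arrays;
import java.util.List;

// If an argument is later renamed (e.g. maxOpenFiles -> max-open-files),
// only the constant's value changes and this test code is unaffected.
final List<String> arguments = Arrays.asList(
        "--" + CombineReadCounts.READ_COUNT_FILE_LIST_FULL_NAME, "read-counts.list",
        "--" + CombineReadCounts.MAX_GROUP_SIZE_FULL_NAME,
        String.valueOf(CombineReadCounts.DEFAULT_MAX_GROUP_SIZE),
        "--" + StandardArgumentDefinitions.OUTPUT_LONG_NAME, "combined.tsv");
runCommandLine(arguments); // hypothetical harness call
```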
@samuelklee Because our repo is open-source, even if we hide them from the docs, users end up asking questions about them. So no to hiding any tool that is in the repo.
Even when we deprecate a tool or feature, we give people fair warning that the tool/feature will be deprecated before literally removing it from the codebase.
Besides the `BETA` label, another option that will soon become available is the `Experimental` label for internal tools. @cmnbroad is implementing it now. It would be great to have additional categories, but @cmnbroad says that this is as much as he has time to do for us, and perhaps this is for the best because we don't want to clutter our docs with too many labels. Perhaps @vdauwera can weigh in with thoughts and options.
Fair points. I agree that legacy tools/versions that are part of a canonical or relatively widely used pipeline should have good documentation.
However, there are many of the CNV tools that are basically prototypes---they have never been part of a pipeline, have no tutorial materials, and the chances that any external users have actually used them are probably extremely low. The sooner they are deprecated, the less the overall burden on both comms and methods---I don't think comms should need to feel protective of code or tools that developers are willing to scrap wholesale!
I'd like to cordon off or hide such tools so the program group doesn't get too cluttered---if we can do this in a way that doesn't require @cmnbroad to add more categories, that would be great. For example, we will have 5 tools that one might reasonably try to use for segmentation (PerformSegmentation, ModelSegments, PerformAlleleFractionSegmentation, PerformCopyRatioSegmentation, and PerformJointSegmentation). The first two are part of the legacy and new pipelines, respectively, but the last 3 were experimental prototypes. I think it's definitely confusing to have these 3 presented in the program group, and treating them the same as the other tools in terms of documentation is just extra work for everyone.
In any case, I definitely think an additional program group to separate the legacy and new tools is warranted, since many of the updated tools in the new pipeline have very similar names to the legacy tools. If this is OK with everyone, I'll just add a "LegacyCopyNumber" program group, which I don't think should require extra work on anyone else's part.
@samuelklee To add to @sooheelee's answer, if there are any tools that you definitely want gone and already have a replacement for, I would encourage you to kill them off (ie delete from the code) before the 4.0 launch. While we're still in beta we can remove anything at the drop of a hat. Once 4.0 is out, we'll have a deprecation policy (exact details TBD) that will allow us to prune unwanted tools over time, but it will be less trivial. And as Soo Hee said, everything that's in the current code release MUST be documented. We used to hide tools/docs in the past and it caused us more headaches than not.
That being said, as part of that TBD deprecation policy it will probably make sense to make a "Deprecated" program group where tools go to die. If there are tools you plan to kill but don't want to do it before 4.0 is released for whatever reason, you could put them there. Documentation standards can be less stringent for tools in that bucket. To be clear I think the deprecation group name should be generic, ie not named to match any particular use case or functionality. That will help us avoid seeing deprecation buckets proliferate for each variant class/ use case. Does that sound like a reasonable compromise?
We're not following an external spec doc, so here are some guidelines to follow instead. Keep in mind that the main thing we're going for here is readability and consistency across tools, not absolute purity, so feel free to raise discussion on any cases where you feel the guidelines should be relaxed. Some things are more negotiable than others.
- Use a dash (`-`) as the separator, no underscores (because lots of newbies struggle to differentiate the two, and underscores take more effort to type than dashes).
- Use full words separated by dashes, i.e. `--do-this-thing` rather than `--dothisthing` (this is really important for readability, especially for non-native English speakers).
- No abbreviations, i.e. `--do-this-thing` rather than `--dtt`.
- If you find yourself writing `--really-long-argument-names-that-take-up-half-a-line`, please reach out and ask for a consult; maybe we can find a more succinct way of expressing what you need.

Sounds like a fantastic idea -- I encourage everyone to follow @vruano's lead on this one.
OK, great---I'll issue some PRs to delete some of the prototype tools soon and update the spreadsheet accordingly. A non-CNV-specific "Deprecated" program group seems reasonable to me if there is enough demand. If this is the only way to delineate the legacy CNV + ACNV pipeline from the new pipeline, I'm OK with it---but we should probably make the situation clear at any workshops, presentations, etc. between now and release that might focus on the legacy pipeline.
On a different note, are there any conventions for short names that we should follow?
I propose that we still hide the example walkers from the command line and docs. They are meant only for developers, to show how to use some kinds of walkers and to have a running tool for integration tests. Having them in the command line will lead users to run them instead of using them for development purposes...
In addition, I think that this is a good moment to also generate a sub-module structure (as I suggested in #3838) to separate artifacts for different pipelines/framework bits (e.g., engine, Spark-engine, experimental, example-code, CNV pipeline, general-tools, etc.). For the aim of this issue, this will be useful for setting documentation guidelines in each of the sub-modules: e.g., example-code should be documented for developers, but not for the final user; the experimental module should have the `@Experimental` Barclay annotation on every `@DocumentedFeature`; etc. A sketch of that labeling appears below.
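For example (a sketch, assuming the new annotation lands in Barclay's argparser package alongside the existing `BetaFeature`):

```java
import org.broadinstitute.barclay.argparser.ExperimentalFeature;
import org.broadinstitute.barclay.help.DocumentedFeature;

// Every documented tool in the experimental sub-module would carry both
// annotations, so the generated docs consistently flag it as experimental.
@DocumentedFeature
@ExperimentalFeature
public final class MyPrototypeTool {
    // ...
}
```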
A couple of comments:
To clarify the build process noted above: "view local index in browser" means open the `index.html` file at `gatk/build/docs/gatkdoc/`.
We need standard arguments to show up in the documentation for both Picard and GATK tools. Can @droazen or @cmnbroad please confirm this is happening?
The standard arguments for each tool are listed with that tool's arguments (if you look at the doc for a particular tool, you'll see an "Optional Common Arguments" heading, with the shared, common arguments listed there).
The GATK4 doc system doesn't generate a separate page for these like GATK3 did, and I think doing so would be of questionable value, since there are several classes of tools, each of which has its own set of "common" arguments (GATK Walker tools, GATK Spark tools, Picard tools, and some GATK "cowboy" tools that do their own thing).
We did discuss an alternative design a while back with @droazen and @vdauwera, but that was never implemented, and was a variant of the current design where the common args are included with each tool.
@cmnbroad and @vdauwera Barclay doesn't pull the `USAGE_DETAILS` portion of Picard tools into the gatkDocs. So Picard documentation is minimal, with just a summary description of each tool.

It doesn't seem right to duplicate the same information in a tool doc: once in the asterisked javaDoc portion and once in `USAGE_DETAILS` for whatever system creates that view, which I am to understand will go by the wayside someday in favor of Picard documentation being offered only through https://software.broadinstitute.org/gatk/.

It seems we should use the asterisked gatkDoc portion for the GATK-specific documentation we want, e.g. commands that invoke Picard tools through the gatk launch script using GATK4 syntax, and pull the rest of the documentation from the `USAGE_DETAILS` (Picard jar command example).
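To sketch what that split could look like on a Picard tool (abridged and hypothetical; the GATK4-syntax command lives in the javadoc that gatkDoc picks up, while `USAGE_DETAILS` keeps the Picard-jar example):

```java
/**
 * <h3>Usage example (GATK4 syntax)</h3>
 * <pre>
 * gatk SortSam \
 *   -I input.bam \
 *   -O sorted.bam \
 *   -SO coordinate
 * </pre>
 */
public class SortSam extends picard.cmdline.CommandLineProgram {
    static final String USAGE_DETAILS = "Example Picard-jar invocation:\n" +
            "java -jar picard.jar SortSam \\\n" +
            "     I=input.bam \\\n" +
            "     O=sorted.bam \\\n" +
            "     SORT_ORDER=coordinate\n";
    // ... rest of the tool omitted.
}
```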
I've prioritized Picard tools in a second tab of the shared Google spreadsheet towards Picard doc updates. Please let me know how we want to approach Picard tool doc updates @vdauwera.
@cmnbroad:

> We did discuss an alternative design a while back with @droazen and @vdauwera, but that was never implemented, and was a variant of the current design where the common args are included with each tool.
Sounds like we don't want separate documents for standard arguments and this was decided some time ago by @droazen and @vdauwera. So am I correctly hearing that these can be removed from the list?
Tally of tools (excludes filters/annotations):
127 GATK tools
94 Picard tools
221 total tools
Per category:
category (14) | number of tools (221) |
---|---|
Reference | 6 |
Base Calling | 7 |
Diagnostics and Quality Control | 49 |
Contamination | 7 |
Intervals Manipulation | 11 |
Read Data Manipulation | 46 |
Alignment, Duplicate flagging and BQSR | 16 |
Short Variant Discovery | 8 |
Short Variant Filtering | 7 |
Short Variant Manipulation | 17 |
Short Variant Evaluation and Refinement | 14 |
Copy Number Variant Discovery | 28 |
Structural Variant Discovery | 4 |
Other | 1 |
Per developer, assuming ~20 of us, this means ~11 tools per developer. If folks are feeling generous and will claim more, this frees up busy coworkers for other work. If I've forgotten anyone, please add them to the table. Megan is away until next year.
developer (25) | number of tools updated (claimed) |
---|---|
Yossi | 0 |
Valentin | 0 |
Ted S. | 0 |
Ted B. | 0 |
Takuto | 0 |
Sara | 0 |
Sam L. | 0 |
Steve | 0 |
Sam F. | 0 |
Marton | 0 |
Mehrtash | 0 |
Mark W. | 0 |
Maddi | 0 |
Louis | 0 |
Lee | 0 |
Jose | 0 |
Jonn | 0 |
Laura | 0 |
Mark F. | 0 |
James | 0 |
David R. | 0 |
David B. | 0 |
Chris W. | 0 |
Chris N. | 0 |
Andrey | 0 |
Folks should claim the 11-12 tools they will work on, by putting their name on the spreadsheet next to the tools. Otherwise, we will assign you tools. SOP to follow.
Call for any objections/changes to the classification scheme, as we need to start implementing the functional organization. This will change Picard tool organization @yfarjoun.
I'm not sure I agree with the "Short Variant Filtering", "Short Variant Manipulation", and "Short Variant Evaluation and Refinement" definitions. For example, FixVcfHeader has nothing to do with "short variants" and everything to do with VCFs: you could put an SV into a VCF and then fix its header with the tool. Similarly, you could filter a VCF that has "large" variants in it... Also, CountVariants has nothing to do with "small variants"...
Some of the CNV tools are miscategorized: GetHetCoverage, plus the tools in the "Intervals Manipulation" category. The latter should probably be considered CNV-specific, because they either use the target-file format (which is only used in the legacy CNV + ACNV pipeline) or perform a task that is specific to the CNV pipeline and probably not of general interest (PreprocessIntervals).
@yfarjoun Geraldine has promised to followup on the categorization discussion. @samuelklee Remember that the Best Practice Workflows and related documentation will guide folks to which tools to use for each workflow. The tool docs section is meant to categorize based on function and is purposefully workflow-agnostic.
That's fine, but my point is that these tools will almost certainly be used only in the CNV workflows due to their limited function (or reliance on CNV-specific file formats). Workflow-agnostic categorization is great for general tools that might be shared by several workflows, but I think it's somewhat misleading here.
Essentially, this is the opposite of the issue that @yfarjoun pointed out (where more general tools are assigned to a specific workflow...)
@vdauwera @ldgauthier, could either or both of you help your fellow developers out with what you consider an A+ tool doc? @mwalker174 and others have asked for this. It would help those new to GATK tool documentation immensely.
In the meanwhile, here are some from me, in order of increasing complexity, and just to start the discussion. One of them has an `Additional Notes` section with useful information; I might think of merging the last two sections somehow if I had to do it over.

@ldgauthier says CalculateGenotypePosteriors is solid, and VariantsToTable and, again, SelectVariants are good too. @vdauwera interjects saying SelectVariants has too many alternate command examples but agrees it is solid.
> Tool commands should use `gatk` to invoke the launch script

(my emphasis)

What about Picard tools? I don't think it is appropriate for them to have `gatk` in the command line... can this be clarified please?
@yfarjoun Any Picard tool involved in the Best Practices, as defined by those tools in the WDL scripts at https://github.com/gatk-workflows, should (also) have a command that uses the `gatk` script to launch the tool. This is what Geraldine conveyed previously, and I tentatively made a list of such tools in the second spreadsheet tab, `picard-tools-by-interest`, under [A]. All other Picard tools can keep the Picard jar example command.

One thing we want, if possible, is for commands that showcase a reference to either showcase GRCh38 or a nondescript reference, e.g. `reference.fasta`. We should minimize exposure of hg19/b37.
Hmmm. I doubt it will go down well with the maintainers of Picard to have "gatk" written as an example of how to use the tool... I think that this isn't a good solution, and I don't think that it will pass review in the Picard repository... I do not feel comfortable opening a PR that does this...
I hope @vdauwera's visit at the Methods meeting addressed concerns @yfarjoun.
It did, thanks!
Just a reminder tech leads @droazen @cwhelan @samuelklee @ldgauthier @vruano @yfarjoun @LeeTL1220 @cmnbroad , that the tool doc updates need to be done, reviewed and merged as soon as possible. Please assign your people tools to update if you haven't already and let us know the status of the changes in the STATUS column of the spreadsheet.
Please prioritize tools that are featured in any Best Practice workflow. The forum docs revolve mostly around Best Practice workflows.
@chandrans and I have to then take your new kebab parameters and edit the entirety of the forum documents to represent the new syntax and WE HAVE LESS THAN 9 WORKING DAYS TO DO SO as of today 12/4. @chandrans is leaving for the holidays starting December 15 and will not be back until near the release on January 9. Once she goes on holiday, I take over her forum duties, which is a full-time job. We really need to have these changes now so we can start working on updating forum docs as the updates are merged to master.
Thank you for your work towards these improvements. Again, although I cannot help in changing code, and do not understand the intricacies, I have brought in homemade cheesecake to fuel your work on these updates.
@sooheelee Some updates from CNV:
I told my team to hold off on doc updates until we can finalize tool deletions. The first round of CNV tool deletions was just merged in #3903.
Another round may be coming, pending discussions with @vdauwera and @LeeTL1220. This could potentially remove the entire old somatic workflow. If so, then the tools that we'd need to update for release would be:
PreprocessIntervals (@MartonKN), AnnotateIntervals, CollectAllelicCounts, CollectFragmentCounts (@asmirnov239), CreateReadCountPanelOfNormals, DenoiseReadCounts, ModelSegments, CallSegments (updated version), CombineSegmentBreakpoints (@LeeTL1220), DetermineGermlineContigPloidy (@mbabadi), GermlineCNVCaller (updated version) (@mbabadi)
Except where indicated, I'll be responsible for updates to these tools.
Until a final decision is made about tool deletion, CNV team will hold off on self-assigning their remaining tool quotas.
Also, note that 10 tools were deleted in #3903 and 17 could potentially be deleted in the next round, so everybody's quota should go down accordingly.
I moved the tool ParallelCopyGCSDirectoryIntoHDFSSpark to the `other` category, since it's just a data movement utility that's not really tied to any pipeline. Hope that's OK.
Does anyone have thoughts about, or examples of, what a command line example for a Spark tool should look like? In particular, I'm wondering what we should put for the Spark-cluster-specific parameters that come after the `--` separator, like `sparkRunner`, etc.
I don't think we can rename the Spark cluster arguments @cwhelan -- most of these "pass through" to the underlying `spark-submit`/`gcloud` command.
@droazen No, I wasn't suggesting renaming the args themselves; I was just wondering what kind of example values we should pass in the usage example. For example, is it OK to pretend your usage example is running on a Dataproc cluster, e.g. `-- --sparkRunner GCS --cluster my-dataproc-cluster`? Something like the sketch below, say.
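A minimal sketch of what the tool's javadoc might carry, with placeholder bucket and cluster names:

```java
/**
 * <h3>Usage example (Google Cloud Dataproc)</h3>
 * <pre>
 * gatk FlagStatSpark \
 *   -I gs://my-gcs-bucket/input.bam \
 *   -- \
 *   --sparkRunner GCS \
 *   --cluster my-dataproc-cluster
 * </pre>
 */
public final class FlagStatSpark { // javadoc sketch only; the real class extends the Spark tool base
}
```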
@sooheelee I have a suggestion regarding categories. Can we change "Contamination" to "Metagenomics" and perhaps move the "CalculateContamination" tool to the "Diagnostics and Quality Control" category?
IMO, contamination has a connotation of introducing foreign matter unintentionally. Strictly speaking, PathSeq is not just for detecting sample contaminants but also endogenous organisms in various biological sample types (like stool or saliva). I think users with metagenomic data might overlook this if they are labeled as being for "contamination."
@samuelklee Thank you for reducing the tool count. It is now 211 (~10.5 per developer, assuming 20 able bodies) and could soon be 194 (~10 per developer). I appreciate you keeping us posted and hope to hear back soon about the other tools.
@cwhelan Thanks for moving the tool to the `other` category. I definitely mis-categorized that one. As for Spark parameter examples, yours looks good and complies with GCS requirements:

`-- --sparkRunner GCS --cluster my-dataproc-cluster`
In the FlagStatSpark tutorial I note that a cluster name:

> must be all lowercase, start and end with a letter and contain no spaces nor special characters except for dashes.
P.S. This is the tutorial to note for background information on setting up Spark.
@mwalker174 Thank you for your suggestion. I will take it and incorporate it now into the spreadsheet. `Metagenomics` certainly has a more positive connotation to it than `Contamination`. If anyone objects, please let us know here (@davidbenjamin?).
@yfarjoun I have renamed three categories: `Short Variant Filtering` is now `Variant Filtering`, `Short Variant Manipulation` is now `VCF Manipulation`, and `Short Variant Evaluation and Refinement` is now `Variant Evaluation and Refinement`.
I also switched the category ordering of the last two, which the forum will reflect (the switched pair is asterisked):
...
Short Variant Discovery
Variant Filtering
Variant Evaluation and Refinement*
VCF Manipulation*
Copy Number Variant Discovery
...
Picard LiftoverVcf and RemoveNearbyIndels are not best described by `VCF Manipulation` but rather could fit under `Variant Evaluation and Refinement`; I will leave them as is.
MarkIlluminaAdapters doesn't fit so well under `Base Calling`. It could be better under `Read Data Manipulation` or `Alignment, Duplicate flagging and BQSR`. I will keep it as is, reflecting the historic `Illumina` category.
CreateSomaticPanelOfNormals is currently under `Short Variant Discovery`, as it supports Mutect2 calling. However, it could fit better under `Variant Filtering`: it sounds like filtering, but also like refining a cohort. Then again, the PoN is meant mostly for artifacts of mapping/sequencing, and so its records, although mostly germline variants, are not strictly germline variants. `Variant Evaluation and Refinement` or `VCF Manipulation` could also work, as the tool really just outputs the sites called in at least two samples. **Moved 12/7.**
Looks like someone added a new 15th category: RNA-specific Tools. The two tools under this category are ASEReadCounter and SplitNCigarReads. @vdauwera fyi, is this okay with you? These were previously under `Diagnostics and QC` and `Read Data Manipulation`, respectively.
Based on Geraldine's suggestion, I created a new category, `Coverage Analysis`, since ASEReadCounter fits under it, alongside:
ASEReadCounter
CountBases
CountBasesSpark
CountReads
CountReadsSpark
DepthOfCoverage
DiagnoseTargets
GetHetCoverage
GetPileupSummaries
Pileup
PileupSpark
I have deleted the RNA-specific Tools category and moved SplitNCigarReads back to `Read Data Manipulation`.
@cwhelan Good question about the spark arguments. Can you please post a command showing how you typically use spark args, and we can discuss how to make that as generic as possible?
@sooheelee We've decided to delete the old tools. I will issue a PR soon. Hopefully this lightens the load on everyone a bit!
Incidentally, I would be fine with adding the tools `CollectFragmentCounts` and `CollectAllelicCounts` to the `Coverage Analysis` category, as they perform relatively generic tasks that fall in that category (with the caveat that the output formats contain column headers that are specific to the new CNV workflows).
> ...and perhaps move the "CalculateContamination" tool to the "Diagnostics and Quality Control" category?
@sooheelee I like this idea from @mwalker174.
Thank you everyone for your contributions towards this documentation effort. Instructions from @vdauwera to follow at this Google doc. Favorite tool doc examples from @vdauwera are NOW in her SOP doc. Spreadsheet from @sooheelee to be posted here.