SeqSQC formal review - Githubissues

Liubuntu commented 7 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/Liubuntu/SeqSQC

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Subversion (optionally via GitHub).

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

SeqSQC is an Experiment Data package.

bioc-issue-bot commented 7 years ago

Hi @Liubuntu

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SeqSQC
Title: A bioconductor package for sample quality check with next generation sequencing data
Version: 0.99.0
Authors@R: person("Qian", "Liu", email = "qliu7@buffalo.edu", role = c("aut", "cre"))
Description: The SeqSQC is designed to identify problematic samples in NGS data,
    including samples with gender mismatch, contamination, cryptic relatedness, and
    population outlier.
biocViews: Experiment Data, Homo sapiens Data, Sequencing Data, Project1000genomes
Depends:
    R (>= 3.3.0)
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
VignetteBuilder: knitr
Imports:
    e1071,
    GenomicRanges,
    GGally,
    gdsfmt,
    ggplot2,
    IRanges,
    methods,
    rbokeh,
    RColorBrewer,
    reshape2,
    rmarkdown,
    S4Vectors,
    SNPRelate
Suggests:
    BiocStyle,
    knitr

bioc-issue-bot commented 7 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20170828161653.html

lshep commented 7 years ago

Please work on correcting the ERRORs, Warnings, and Notes from the build report. Also please ensure that a webhook is initialized so that on subsequent version bumps a new build of the package will be generated.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

9f5a740 R CMD check/BiocCheck errors and warnings.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20170829112019.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

5d17303 subscribe to bioc-devel mailing list

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20170829114042.html

Liubuntu commented 7 years ago

Hi Lori,

I have just build the "SeqSQC_0.99.2" and have passed check and BiocCheck with no error or warning. Please help me with the next step. Thank you!

Qian

lshep commented 7 years ago

Hi Qian, Some comments/questions below:

BUILD REPORT:

[ ] fix the no visible binding for global functions or variables. see ?globalVariables in R for help
[ ] add some unit tests
[ ] make accessor methods for commonly accessed results rather than accessing with @
[ ] add a NEWS files in the top directory (R, man, inst, NEWS, etc) to keep track of changes
[ ] I'm assume the dontrun in LoadVfile.Rd, RenderReport.Rd, sampleQC.Rd were do to time constraints? In your man and vignette you don't run sampleQC or LoadVfile. how long do these generally take to run?

VIGNETTE:

[ ] add a bioconductor installation and a library load section

source("https://bioconductor.org/biocLite.R")
biocLite("SeqSQC")

[ ] add a runnable example - there are other packages in Bioconductor that provide vcf files that could perhaps be used with system.file. perhaps GenomicFiles, VariantAnnotation, TVTB, genotypeeval. I can't truly evaluate the package or code without being able to test it.
[ ] Create accessor methods for your class, do not access with @

GENERAL QUESTION: Could you do a quick comparison to genotypeeval? They may not be similar at all but also does some sort of QC with VCF.

MAN:

[ ] missing package documentation
[ ] missing class documentation
[ ] missing dataset documentation

R Code:

[ ] Can your SeqSQC class extend SummarizeExperiment, we generally encourage reusing existing Bioconductor infrastructure packages like how you did with gdsfmt::gds class ? I can't give very good recommendation as I can tell what the structure of your class holds. What types of objects are stored in SeqSQC class?
[ ] As mentioned above, the class should have accessor methods instead of accessing slots through @

Please address the above issues and when you are ready for a second review please comment back here with what has been updated or why things haven't and/or some clarification on above concerns.

Cheers Lori

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

e180459 added zzz.R

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20170915232453.html

lshep commented 7 years ago

Please clean up the R directory. It looks like you uploaded some temp files into the directory (#LoadVfile.R#, .#LoadVfile.R , zzz.R~ ) .

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

9ce2d4c add test units; modified NEWS.

Liubuntu commented 7 years ago

HI Lori,

I just pushed a new version of 0.99.4. Here are answers to 2 of your review questions:

<1>. COMPARISON with "genotypeeval". The genotypeeval could evaluate sample quality with the VCF files (in batch). By loading some reference “gold” standards such as 1000 Genomes or dbSNP, it evaluates the samples for ti/tv ratio in and out of the coding regions, and the total # of calls (or specific # of homozygous/heterozygous alternative calls), and ancestry estimation, and etc.. However the package of "genotypeeval" is different from "SeqSQC" in these 2 major ways: 1. "genotypeeval" takes vcf files as input, and processes directly with the vcf files. Given that size vcf files from NGS (whole-exome / whole-genome) are in Gbs for human genome, it would be not efficient, especially when processing with batch vcf files. SeqSQC take the vcf files as input, merge it with the benchmark data we have assembled from 1000 Genomes Project, and save the merged dataset as "SeqSQCclass" for following analysis. The "SeqSQCclass" uses GDS (Genomic Data Structure) to save and access the genotype and phenotype data in a more efficient way (3~5 times comparing to vcf), and could also incorporate all QC results in the same data for a more convenient processing and management. 2. Sample quality issues are very general and could come from different reasons, for example, self-identified gender problem, sample contamination, unported sample relatedness, and self-identified ancestry issue. "genotypeeval" evaluates sample quality mostly from the general aspect of # variants comparing to the reference data to indicate sequencing error (sample quality issue from sequencing end, not sample itself). it makes ancestry estimation for samples and gives the value of proportion to each ancestry, which is not directly understandable to users. However, SeqSQC could identify problematic samples from all possible sources, including inbreeding outlier, population outliers and samples with gender mismatch, cryptic relatedness. The plots of study samples together with benchmark samples from the same population group give an direct and interactive idea of how these samples are identified as problematic. <2> R CODE: Q: Can your SeqSQCclass extend SummarizeExperiment? A: No. The SummarizeExperiment stores count data from RNAseq dataset. The SeqSQC stores and processes genotype data from whole-genome sequencing or whole-exome sequencing, which are extremely high-dimensional and requires large R space for processing. I used GDS (Genomic Data Structure) within the SeqSQCclass to achieve efficient genotype data storage and access. The other updates are included in the "NEWS" file. Please check. Thanks. Best, Qian

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171004123535.html

lawremi commented 7 years ago

No, SummarizedExperiment is not just for storing count data from RNA-seq. It's designed exactly for this use case. It would probably benefit from a GDS backend to DelayedArray, if one does not yet exist.

Liubuntu commented 7 years ago

Sorry for the uncareful phrasing... To re-answer the question (Can your SeqSQCclass extend SummarizeExperiment?), I have some core functions within my package that are based on the GDS format, but the users are not required to know the operation of GDS files in R. So I have defined a new class SeqSQCclass to store the file path of the main gds file where we store the genotype information from sequencing data, and the sample QC results, like sex check, inbreeding and IBD coefficients, so that uses could easily read out the sample QC results from the SeqSQCclass object.

Because the GDS data stores the high-dimensional genotype data from sequencing experiments, and it allows users to perform operations on it without loading the whole data matrix in memory. I was not able to figure out if the SummarizedExperiment could support the GDS data structure. If yes, I would love to update in the future version of my package extending to SummarizedExperiment.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

354a9fe debugged R CMD BiocCheck;

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171004183449.html

lawremi commented 7 years ago

It should be possible to implement DelayedArray on top of GDS; then you just stick that matrix into the SummarizedObject object. I also wonder whether you couldn't use the objects from the SeqArray package, which already wrap GDS objects, although not as a SummarizedExperiment.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

5992bf7 Installed BiocGenerics locally.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171005111346.html

Liubuntu commented 7 years ago

Hi Michael,

Thank you for your insightful suggestions and comments. I have looked at SeqArray during my development of SeqSQC and cited it in my paper (yet to submit) for SeqSQC. It is good at dealing with high-dimensional sequencing data, and could at the same time save the annotation info for both variants and samples. It is developed on top of GDS (same developer).

While the main purpose for my package SeqSQC is to implement sample quality check with the genotype information from sequencing data. Functions are written to take the vcf or plink (next version) as input and save them in an efficient data structure, and then use the new data structure as input for the later QC functions. The SeqSQCclass contains 2 slots, 1 is the filepath to the gds file where the genotype data is stored, and the other slot is a list containing sample annotation (population, gender, relationship to other samples), and QC results like sex check and etc.. This structure is easy to deal with, and the users are not supposed to do any operation on the gds file itself. All QC steps will read in the SeqSQCclass object, extract information from the gds file in the backend for QC purpose, output QC result and append it to the second slot in SeqSQCclass object.

The other reason for defining the SeqSQCclass is that, we need a strict format for the input files. For example, we need the input of sample population to be strictly within AFR, EUR, ASN, EAS, SAS. Anything beyond these would return an error. So by defining the new class, we could do the validity test for these information.

As a summary, the SeqSQCclass object serves 3 purposes, one is to protect the gds file to be intact (by only including the filepath); the other is to save the QC results in a easily readable way; last, it could do the validity test for input information.

Best, Qian

lawremi commented 7 years ago

It seems like the first concern may be satisfied by the data structures from SeqArray. Perhaps SeqSQC could have a slot for one of those, and another slot for the QC information. The population constraint seems to have more to do with the input rather than an object containing the computed QC information. Are you sure the two should be coupled? Also, it seems redundant to have a class with "class" in its name.

Liubuntu commented 7 years ago

Hi Michael,

Thank you for your suggestion for removing "class" in the name of "SeqSQCclass". I will correct it and push another version.

For the question of coupling of sample annotation and QC results in SeqSQCclass object:

SeqSQC has these plot functions corresponding to each QC step (plotMissingRate, plotSexCheck...). It take SeqSQCclass objects as input, and read both of the sample annotation and the QC result for plotting. By including the sample annotation file in the SeqSQCclass object, it could make the checking process easier for the plot functions, by only checking if the input is a valid SeqSQCclass object.

Best, Qian

lshep commented 7 years ago

Some further comments:

vignette code:

> seqfile <- sampleQC(vfile = infile, output = outfile, capture.region = cr, sample.annot = sample.annot, format.data = "NGS", format.file = "vcf", QCreport = TRUE, out.report="report.html", interactive = TRUE)
Error in sampleQC(vfile = infile, output = outfile, capture.region = cr,  : 
  object 'seqfile' not found

[ ] perhaps you changed the name of the input variable?

> seqfile <- SeqSQC(gdsfile = gfile, QCresult = QCresult(seqfile))
Error in (function (classes, fdef, mtable)  : 
unable to find an inherited method for function 'QCresult' for signature '"SeqSQCclass"'

[ ] I assume you are updating to take Michaels advice which I do agree the class is redundant - let me know when I can test your vignette and man code

MAN:

[ ] Still no package documentation - might be useful to add an overview of the package and pointing to the main functions
[ ] class - what/how are the results of each of the QC checks stored in the object? as a list? if it is a named list might be worth have accessors for the information? Maybe not I'm not sure how the results would be further interpreted or used

R CODE:

[ ] You should consider Michaels suggestion on DelayedArray for a future implementation - since this would be a major overhaul of your package I won't insist on it for now but it is something to strongly consider for a future implementation

Note: I may make comments for one R script but should be generally applied to all R scripts:

subsetGDS

[ ] have a check for input what is gds? if gds is your object, use the accessor methods you created not $ or @
[ ] the use of vapply/sapply/lapply/apply is encouraged over for loops it simplifies the code
[ ] remove commented out code

sampleQC

[ ] check the variables in the first section with your input arguments
[ ] why do you have vfile and sfile - have one input variable and check the object class to distinguish which load method OR better yet you could make it a generic S4 method that dispatches off the different input types - one could wrap to the other after the loading
[ ] You might have arguments to the function to make plotting and writing tables optional
[ ] would there ever be a case where someone would want to skip one of the QC checks and is this possible? it looks like this a wrapper around the individual that the user could run on their own so this would be a convenient way to run all which is why you expose all the checks to the user?

problemList

[ ] Are you sure for sexCheck that it will always be "female/male" not "Female/Male" or F/M etc. ...

general about plotting functions:

[ ] most of your plotting functions do the same thing with minor difference or grabbing a different subset of stored results to use - this results in a lot of repeated code - generalize the plotting function so the code isn't repeated and use wrappers to specified based on type and to do unique

mergeGDS

[ ] remove commented out code
[ ] This seems messy and again a lot of repeated code that seems like could be made into its own function

general

[ ] if you are constantly reading in and utilizing different nodes of the gdsn file have a wrapper function that stores the output and access from that

function(gfile){
obj <- list(read.gdsn(index.gdsn(gfile, "sample.id")),
read.gdsn(index.gdsn(gfile, "sample.annot")) 
obj 
}

you do this a lot in the merge and I'm seeing again in sexcheck - and im sure it lots of other places of code Like this section is repeated in several files can it be condensed into its own

gfile <- SeqOpen(seqfile, readonly=TRUE)
samples <- read.gdsn(index.gdsn(gfile, "sample.id"))
sampleanno <- read.gdsn(index.gdsn(gfile, "sample.annot"))

snp.chr <- read.gdsn(index.gdsn(gfile, "snp.chromosome"))
snp.pos <- read.gdsn(index.gdsn(gfile, "snp.position"))
snps <- read.gdsn(index.gdsn(gfile, "snp.id"))

## sample filters. (remove prespecified "remove.samples", and only keep samples within the study population)
study.pop <- unique(sampleanno[sampleanno$group == "study", "population"])
if(length(study.pop) > 1) stop("Study samples should be single population, please prepare input file accordingly.")

if(study.pop == "ASN") study.pop <- c("EAS", "SAS", "ASN")

[ ] if(study.pop == "ASN") study.pop <- c("EAS", "SAS", "ASN") what is the reasoning of doing three pop if the specified is asian - are populations limited to EAS SAS and ASN only or are there more?

LoadVfile

[ ] line 62 misc 0

IBDRemove and IBDRemoveAll

[ ] are the same exact code with maybe one line different this should be the same function with a flag like all=TRUE to disguish different code

Please make updates or address these issues in some way and comment back here with changes and when ready for another review.

Thank you ~L

Liubuntu commented 7 years ago

Hi Lori,

Thank you for the second review. I will address these issues and hopefully get back to you by tomorrow.

Best, Qian

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

cffacae 2nd review

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171015235112.html

Liubuntu commented 7 years ago

HI Lori,

I have pushed a new version of SeqSQC, with the following revisement:

Added package documentation.
Added checking for input in subsetGDS.
Removed commented code for R scripts.
SeqSQCclass renamed as SeqSQC.
Using lapply for some repeated functions like "read.gdsn(index.gdsn())" in SexCheck, Inbreeding, IBDCheck, PCACheck, mergeGDS...
sampleQC: check the class of vfile for SeqSQC file or vcf/plink file, instead of using vfile / sfile.
plotting function writing separately (not export), wrapper with plotQC().
Removed IBDRemoveAll.R. Will check IBD for all sample pairs (including benchmark samples) by adding argument "all" in "IBDRemove.R".
debugged warning messages from "LoadVfile" and using "rbokeh".

To answer some of the other questions:

sampleQC: You might have arguments to the function to make plotting and writing tables optional; would there ever be a case where someone would want to skip one of the QC checks and is this possible? it looks like this a wrapper around the individual that the user could run on their own so this would be a convenient way to run all which is why you expose all the checks to the user? problemList

A: The sampleQC is written to do the one-step sample QC to include data preparation, all QC steps and QC result summary, and to make a complete and reliable sample QC for the NGS data, which is essential for cleaning samples before any downstream analysis. There are also some separate functions for each QC step and plotting if the user want to skip or do any specific ones. There are also already 13 arguments in the "sampleQC" function, I can add some additional arguments to let the user define which one to skip, but not quite sure if they are really needed.

In SampleQC.R, are you sure for sexCheck that it will always be "female/male" not "Female/Male" or F/M etc. ...

A: Yes.. Because the result data from sex check is written to have the column as "female/male" rather than other format.

if(study.pop == "ASN") study.pop <- c("EAS", "SAS", "ASN") what is the reasoning of doing three pop if the specified is asian - are populations limited to EAS SAS and ASN only or are there more?

A: There are 4 population from our benchmark data (assembled from 1000 Genomes Project), which are EUR, AFR, EAS (east Asian) and SAS (south Asian). So the Asian population were represented by 2 detailed EAS and SAS separately. If user have samples from Asian, we will use both of the EAS and SAS benchmark samples to assist the QC steps.

Please let me know if you have any questions or concerns. Thanks a lot!

Best, Qian

lshep commented 7 years ago

I think we are close ... A few very minor vignette and man pages updates - and some optional suggestions for future implementations:

Vignette:

[ ] When I run the vignette code - all the output is placed in the current directory as it uses the dirname(outfile) and outfile ="testWrapUp" - we recommend using a tempdir() for vignette output
```
outdir = tempdir()
outfile = file.path(outdir, "testWrapUp")
```
also it seems like the argument output is the name of the file for an output object and the dirname of that is used as output directory for the rest of the files - this should be made more clear in the argument description or the output directory should be the directory location - and have a default name for the object like you do for the other txt and pdf output files
[ ] Optional future implementation - I still think there should be a flag for whether the write tables and pdfs get generated since it seems like these could be generated at any time in the future with the object - there doesn't have to be one for each step just a generic argument like plotting=TRUE, results=TRUE knowing that all the steps will follow the same argument.
[ ] GDS Class "Other information from the VCF file, including the chromosome, position, snp rs id, reference allele, alternative allele, quality score and filter can also be passed into the gds file." How? Is this stored in your object? or is just an option based on using the gds.class?
[ ] Standard workflow - I would mention that running SampleQC runs all of these steps as a convenient all inclusive method but that these are the steps and recommended ordering if run individually and not using SampleQC ?

Man: Thanks for adding the package documentation - add a seealso and/or mention the main functions of your package sampleQC

R Code Plotting functions - I like the one exposed plotting function this is good!

[ ] Optional but would be good to clean up eventually - the underlining plotting functions it calls still have a lot of similar code it would be nice if possible to generalize the code more so that there is only one or two underlying plot functions where the code that is different is determined by arguments. - onsider this for future implementation

If you fix the vignette and man pages I think this is ready for acceptance and first implementation but again consider the code improvements suggested by Michael and myself for further improvements.

When ready for a look over - please do the version bump and comment back here with updates and answers to questions.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

9d25b5e tempdir() for vignette output; updated argument 'o...

Liubuntu commented 7 years ago

Hi Lori,

Thanks a lot for your very detailed review and practical suggestions. I have addressed most of the issues including:

vignette, using a tempdir() for vignette output.
Vignette: Additional explanation added for the option to do a wrapper all-inclusive QC or specific QC steps.
documentation for argument "output" in "LoadVfile" and "sampleQC", to indicate that the dirname(output) would be used as the directory to save the other QC results in .txt and plots in .pdf. The default value is added as file.path(tempdir(), "sampleqc").
sampleQC.R: Arguments of plotting=TRUE, results=TRUE added and documented, giving the option for users to whether or not output the QC results and plots. The problem list will always in output for the simplest sample QC result and summary.
the package documentation updated, by adding a @seealso for links of main functions of SeqSQC.

To answer some of your other questions:

Question: GDS Class "Other information from the VCF file, including the chromosome, position, snp rs id, reference allele, alternative allele, quality score and filter can also be passed into the gds file." How? Is this stored in your object? or is just an option based on using the gds.class?

Answer: The meta info from vcf files are read into the gds file by \code{snpgdsVCF2GDS} within \code{LoadVfile}. They are saved in the gds file, with the filename kept as slotName \code{gdsfile} in the SeqSQC object. The reading is not optional. We used these snp information to merge the study cohort and benchmark data.

Questions: R Code for underlying plotting functions to be more general:

Answer: - the underlining plotting functions are written for different data sets generated from different QC steps. The data structure are slightly different, (except for IBD and PCA results which have more complex structure and plotting options). I will try to generalize the data structure of QC outputs and also the plotting functions in the next version, when I would possibly adopt the SeqArray or delayArray for NGS data storage and processing.

Please review the package and let me know for any further question. Thanks again!

Best, Qian

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171016123535.html

lshep commented 7 years ago

The default value is added as file.path(tempdir(), "sampleqc"). - It probably should default to the users working directory when setting the default within the function but you would want to use the tempdir() in the vignette. The tempdir creates a temporary directory that disappears after the R session is terminated - this is appropriate for the examples run in the vignette but probably not as a default for your main functions if the user forgets to specify (most assume the working directory) - so the default can just be "sampleqc"

The other changes look good.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

ba50e9b argument of 'output' in 'sampleQC' and 'LoadVfile'...

lshep commented 7 years ago

Don't forget to update the man files too - its causing the mismatch in documentation/code

Liubuntu commented 7 years ago

Hi Lori,

Thanks for your suggestion. I have just updated the default value for output to be in the working directory. It makes more sense right now.

Updated the output argument in function of sampleQC and LoadVfile, by using the default value of \code{sampleqc} in working directory. In vignette and the example code of these 2 functions, temporary directory is used for the directory to save the QC results.

Thanks. Qian

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171016143913.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

27aef94 documentation updated for updated argument 'output...

Liubuntu commented 7 years ago

Sorry for the code/documentation mismatches... updated the documentation. Thanks.

lshep commented 7 years ago

No problem. I'll wait for the build report but as long as its clean now I'll accept the package.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/SeqSQC_buildreport_20171016145525.html

bioc-issue-bot commented 7 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be sent to the maintainer email address in the next several days.

Thank you for contributing to Bioconductor!

Liubuntu commented 7 years ago

Hi Lori & Michael,

Thank you for your help and insightful comments and suggestions about SeqSQC. It works with NGS data for sample quality check quite efficiently right now. but still I am looking to update the package in the newer version to:

accommodate the PLINK data from GWAS experiment.
using SeqArray or DelayedArray instead of gds format to store and access the high-throughput sequencing data.

Thanks again and your comments and suggestions are greatly appreciated.

Best, Qian

mtmorgan commented 7 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/Liubuntu.keys is not empty), then no further steps are required. Otherwise, do the following:

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"SeqSQC\"). The package 'landing page' will be created at

https://bioconductor.org/packages/SeqSQC

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Bioconductor / Contributions

SeqSQC formal review #460