Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

SynExtend #1369

Closed npcooley closed 4 years ago

npcooley commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Hi @npcooley

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SynExtend
Type: Package
Title: Tools for working with Synteny Objects
Version: 0.99.05
Date: 2020-02-12
Authors@R: c(
    person("Nicholas", "Cooley", email = "npc19@pitt.edu", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-6029-304X")),
    person("Adelle", "Fernando", email = "Adelle.fernando@gmail.com", role = "ctb"),
    person("Erik", "Wright", email = "ESWRIGHT@pitt.edu", role = "aut"))
BiocVews: Genetics, Clustering, ComparativeGenomics
Description: Tools for working with Synteny Objects
Depends: R (>= 3.6.0), DECIPHER (>= 2.14.0), igraph (>= 1.2.4.1)
Imports: Biostrings, IRanges, S4Vectors
Suggests: knitr
License: GPL-3
Encoding: UTF-8
LazyData: true
NeedsCompilation: no
VignetteBuilder: knitr
bioc-issue-bot commented 4 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

mtmorgan commented 4 years ago

please provide a more extensive Description:, like a paper abstract describing the unique contributions your package is making.

Please mark the first code chunk of your vignette with {r, eval = FALSE} so that building the vignette does not trigger installing the package.

In the vignette, please use rtracklayer::import() to import the gff file into a standard R data representation; make sure that this representation can be used as inputs to your functions.

The assigned reviewer will provide additional comments after you have made these (and any other issues identified by the automatic checks) changes.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

9c7f526 Vignette and Description edits

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

bd79693 warnings and errors 001

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

c0b28fc version bump?

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

bc20016 address warnings?

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

ef5a5e9 added imports adjusted examples

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

96ef990 version bump?

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

6205e22 formatting and DESC adjustments

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

aa3327f vignette formatting

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

bcd7c60 vignette formatting

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

dc64ae8 vignette formatting

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

2781165 R Version Dependency

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

lshep commented 4 years ago

Build Report

README

DESCRIPTION

NAMESPACE

INST

Data

Vignette

man pages

data man pages

NucleotideOverlap

PairSummaries

gffToDataFrame

R code In general the code need clean up and compression.

PairSummaries

Would become.  
if (!(GeneCalls[[m1]][SyntenyLinks[[m1, m2]][i, "QueryGene"], "Coding"]) |
          !(GeneCalls[[m2]][SyntenyLinks[[m1, m2]][i, "SubjectGene"], "Coding"]) |
          IgnoreDefaultStringSet) {
myXStringSet = AlignSeqs(myXStringSet = Scores[[Count]][[i]], verbose = FALSE)
type = "DNA"

}else{ myXStringSet = AlignTranslation(myXStringSet = Scores[[Count]][[i]], sense = "+", direction = "5' to 3'",type = "DNAStringSet", readingFrame = 1L, verbose = FALSE) type = "AA } IDType[[Count]][i] <- type Scores[[Count]][[i]] <- 1 - DistanceMatrix(myXStringSet = myXStringSet, penalizeGapLetterMatches = GapPenalty, includeTerminalGaps = TerminalPenalty, correction = Correction, type = "matrix", verbose = FALSE)[1, 2]

This eliminates redundant code and condense the code to make it easier to read. Please apply throughout this script and NucleotideOverlap.R 
Another example 

if (PIDs) { DF <- data.frame("TotalCoverage" = unlist(TotalCoverage), "MaxCoverage" = unlist(MaxCoverage), "MinCoverage" = unlist(MinCoverage), "NormDeltaStart" = unlist(NormDeltaStart), "NormDeltaStop" = unlist(NormDeltaStop), "NormGeneDiff" = unlist(NormGeneDiff), "ExactMatch" = unlist(ExactMatch),

"LeftDeviation" = unlist(AbsStartDelta),

                 # "RightDeviation" = unlist(AbsStopDelta),
                 # "SubjectLeft" = unlist(ExactLeftS),
                 # "SubjectRight" = unlist(ExactRightS),
                 # "QueryLeft" = unlist(ExactLeftQ),
                 # "QueryRight" = unlist(ExactRightQ),
                 "PID" = unlist(Scores),
                 "PIDType" = unlist(IDType),
                 stringsAsFactors = FALSE)
rownames(DF) <- unlist(LabelNames)

} else { DF <- data.frame("TotalCoverage" = unlist(TotalCoverage), "MaxCoverage" = unlist(MaxCoverage), "MinCoverage" = unlist(MinCoverage), "NormDeltaStart" = unlist(NormDeltaStart), "NormDeltaStop" = unlist(NormDeltaStop), "NormGeneDiff" = unlist(NormGeneDiff), "ExactMatch" = unlist(ExactMatch),

"LeftDeviation" = unlist(AbsStartDelta),

                 # "RightDeviation" = unlist(AbsStopDelta),
                 # "SubjectLeft" = unlist(ExactLeftS),
                 # "SubjectRight" = unlist(ExactRightS),
                 # "QueryLeft" = unlist(ExactLeftQ),
                 # "QueryRight" = unlist(ExactRightQ),
                 stringsAsFactors = FALSE)
rownames(DF) <- unlist(LabelNames)

}

You could construct the overlapping DF first and then apply the additional columns in the conditional only. 
And again with choosing the model. The conditional should be the model difference
and then only a single call 

shortened for example

if (Model == "Global") { data("GlobalSelect", envir = environment(), package = "SynExtend") modelData = GlobalSelect else if (Model == "Local") { data("LocalSelect", envir = environment(), package = "SynExtend") modelData = LocalSelect }

KeepSet <- predict(object = modelData, DF, type = response") DF <- cbind(DF,"ModelSelect" = KeepSet >= 0.5)


- [ ] Remove commented out code

_NucleotideOverlap_

- [ ] OutputFormat doesn't change between options? Also have a check for valid
  value of OutFormat at the start of the function.

Links <- NucleotideOverlap(SyntenyObject = Syn,

  • GeneCalls = GeneCalls,
  • LimitIndex = FALSE,
  • Verbose = FALSE) Links 1 2 3 1 1683 Pairs 1760 Pairs 2 609223 Nucleotides 1781 Pairs 3 802650 Nucleotides 623239 Nucleotides
    Links <- NucleotideOverlap(SyntenyObject = Syn,GeneCalls=GeneCalls, LimitIndex=FALSE, Verbose=FALSE, OutputFormat="Comprehensive") Links 1 2 3 1 1794 Genes 1683 Pairs 1760 Pairs 2 609223 Nucleotides 2264 Genes 1781 Pairs 3 802650 Nucleotides 623239 Nucleotides 2216 Genes Links <- NucleotideOverlap(SyntenyObject = Syn,GeneCalls=GeneCalls, LimitIndex=FALSE, Verbose=FALSE, OutputFormat="Sparse") Links 1 2 3 1 1683 Pairs 1760 Pairs 2 609223 Nucleotides 1781 Pairs 3 802650 Nucleotides 623239 Nucleotides

- [ ] Check valid argument values at the start of function.  
- [ ] GeneCalls should be GRange or GRangeList
- [ ] Apply comments to avoid redundant code and to condense where possible to this script as well.

_gffToDataFrame_
- [ ] Remove this entirely and use already implemented Bioconductor function `rtracklayer::import`

Please apply the above corrections and clean up.  When ready make sure there is a new build with a valid version bump and comment back here with a summary of the updates.  

Thank you for your submission to Bioconductor
~ Lori 
npcooley commented 4 years ago

Hi Lori,

Thanks for all the reviewing! I'll get started on your comments as I have time.

I would be happy to make NucleotideOverlap flexible in that it could take in a Genomic Ranges object, but just the wholesale import of the GFF is actually not useful to us. The text parsing that the gffToDateFrame performs is just sort of speculative, but all the parent/id matching and condensing would still have to be performed on the Genomic Ranges object anyway. It made more sense for us to parse it straight from the original file.

I.e. a naive user would be just fine pulling the gene bounds from a gff generated by Prodigal or Prokka using a rtracklayer::import because those programs don't think about genes with complex bounds, but if I pull a generic refseq or genbank gff, it will need additional condensing to drop redundant bound sets (such as were the same bound is marked for "gene", "mrna", "exon", and "cds"), and to correctly think about the coding space for cases where multiple exons exist.

Keeping a condensed format is useful for me internally because it allows me to correctly reconcile CDS and gene bounds in a single place, and keep track of indices in the source FASTA file. It seems useful for a naive user as well because the data format is just simpler.

Once again I'm happy to make it flexible, and if GRanges objects can support ragged ranges for gene positions I can convert my output to that, but those are changes I would prefer to implement in a later release if possible.

Cheers, Nick

lshep commented 4 years ago

In Bioconductor the recommended and almost always required structure for data involving genomic position information is GenomicRanges in either a GRanges or GRangesList object. Users that have been using Bioconductor pipelines and workflows will expect this and at minimum expect the function to be able to handle this type of object. There is also some consideration of how your parser handles the range and range intervals. Bioconductor standards for ranges are 1-based, closed intervals and this would have to be implemented so there is no discrepancy from standard usage throughout Bioconductor; this is the reasoning for using the rtracklayer::import function as it takes all of this into consideration when doing the parsing; it is a fast and developed function that would guarantee correctness. It seems like your other manipulations (parent/id matching, etc) and additions of information can be performed on the imported GenomicRanges object and this would be preferred. The GenomicRanges package has been highly developed over the years, as a very mature package, and has many useful functionalities that could prove useful in development. Please update the package in the current version before acceptance; in our experience promises of updating in future implementations never come to fruition.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

13b5d14 man and r code comments, GRange functionality incl...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

bfa4380 version

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

18c7b85 Warnings

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

b6da027 removed R version dependency

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

402182c R version readded

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

eb70704 man edits

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

4143df3 Vignette edits

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

1c7d13a vignette edits

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

c611677 more vignette edits

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

lshep commented 4 years ago

Is the package ready for a re-review? If so please summarize changes that have been made here.

lshep commented 4 years ago

Please remember all packages must be reviewed and accepted by april 22 to be included in the 3.11 release else they will be in devel until the released in fall under bioc 3.12

npcooley commented 4 years ago

Ohp, I missed your earlier notification, sorry! Yes, the package is ready for review.

npcooley commented 4 years ago

Hi,

I know this is cuttiing it close to the deadline, but is it possible to get a review back for this package today?

Cheers, Nick

On Thu, Apr 16, 2020 at 9:51 AM lshep notifications@github.com wrote:

Please remember all packages must be reviewed and accepted by april 22 to be included in the 3.11 release else they will be in devel until the released in fall under bioc 3.12

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/1369#issuecomment-614665887, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJMES4WOEYFG2ZIGNYYEXZLRM4EMTANCNFSM4KU2LNPQ .