Bioconductor / Contributions

Contribute Packages to Bioconductor

Sscore2 submission #1190

Closed harrisgm closed 5 years ago

harrisgm commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @harrisgm

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: Sscore2
Type: Package
Title: Sscore2: an R package for microarray analysis for
        Affymetrix/Thermo Fisher arrays
Version: 1.1.0
Author: Guy M. Harris & Shahroze Abbas
Maintainer: Guy M. Harris <harrisgm@vcu.edu>
Description: For differential expression analysis of WT-style microarrays from Affymetrix/Thermo-Fisher.  Based on S-score algorithm originally described by Zhang et al 2002.
License: GPL (>=3)
Encoding: UTF-8
Depends: dplR, R.utils, methods, affxparser, graphics, stats, R (>=
        3.3), data.table (>= 1.12)
Imports:
biocViews: DifferentialExpression, Microarray, OneChannel,
        ProprietaryPlatforms, DataImport
LazyData: FALSE
NeedsCompilation: no
Packaged: 2019-07-22 18:28:28 UTC; guyharris

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

LiNk-NY commented 5 years ago

Hi Guy, @harrisgm

Thank you for your submission. Unfortunately, we are unable to accept your package in its current form. Please see the review below and respond to the pertinent comments.

Best regards, Marcel


Sscore2 #1190

DESCRIPTION

NAMESPACE

Comments before further review

LiNk-NY commented 5 years ago

Hi @harrisgm Please provide an update / response to the comments in the review; otherwise, I'll be forced to close it. Thank you.

harrisgm commented 5 years ago

Hi Marcel Ramos,

I'm very sorry for the delayed response; I somehow missed this email on August 7th.

I have been working on the failed build issues and simplifying the annotation structure of the data.

In response to the reviewer comments:

  1. I can resolve the "Description" and "Namespace" issues this evening.
  2. The Sscore2 package is designed to work efficiently with ClariomD/MTA arrays to provide exon-level and gene-level differential expression using probe-level data and GC-based background corrections. The package is entirely new in terms of code and operation; it only expands on the mathematical principles established in the first package. Analysis tools for HTA/MTA/ClariomD and ClariomS (arrays that are still sold and used) are lacking, and this package fills that void. It also allows for more complex and fine-tuned dissection of the different PSR/TC ID types included in the HTA/MTA/ClariomD arrays (see the "locus_type" discussion below and my need for additional probe-level annotation information).
  3. We could certainly consider renaming the package if necessary.
  4. The most important consideration is the use of our "probefiles", which are efficient tables generated from Affymetrix CDF (for ClariomD) or PGF (for ClariomS) files (publicly available on both Affymetrix's and Thermo Fisher's sites). We need to have the GC.count of every probe included in the probe-level data. We also use the Affymetrix na36 annotation data to get the "locus_type" information that is provided for the probesetIDs or transcriptClusterIDs, and include it in the probe-level data. This allows the user to select which PSRid/TCid types are included in the analysis (since there are more noisy "noncoding" PSRids/TCids than there are "Coding" IDs on the ClariomD arrays). We also made annotation files with Affymetrix na36 data.
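
To illustrate the "locus_type" selection described in item 4, here is a minimal sketch. It is written in Python purely for illustration (the package itself is R), and the row layout, IDs, and values are invented; they are not taken from the actual probefiles.

```python
from collections import Counter

# Hypothetical probe-level rows: (probe_id, probeset_id, locus_type, gc_count).
# The layout and every value here are assumptions made for illustration.
probe_rows = [
    (1, "PSR01", "Coding", 12),
    (2, "PSR01", "Coding", 14),
    (3, "PSR02", "NonCoding", 9),
    (4, "PSR03", "NonCoding", 11),
    (5, "PSR04", "Coding", 13),
]


def select_by_locus_type(rows, keep_types):
    """Keep only the probes whose locus_type is one of the requested types."""
    keep = set(keep_types)
    return [row for row in rows if row[2] in keep]


# How many probes fall under each locus type before filtering.
type_counts = Counter(row[2] for row in probe_rows)

# Restrict the analysis to "Coding" IDs only.
coding_only = select_by_locus_type(probe_rows, ["Coding"])
coding_probe_ids = [row[0] for row in coding_only]
```

Because the locus type travels with each probe row, the selection is a single pass over the table and needs no second annotation lookup at analysis time.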

I can easily change the annotation assignments to use the annotation.db packages provided in Bioconductor.

The probe-level data is more difficult since I do not believe there is an annotation resource on Bioconductor that includes the "locus_type" info or the "GC count" info for each individual probe.

Should I submit my Affymetrix-generated probe-level data files as annotation packages in order to solve this issue?

Sorry again for the delayed response. Thank you for your comments and possible suggestions. -Guy

harrisgm commented 5 years ago

To follow up on my earlier comments regarding the package's use of custom annotation files:

I have determined that the annotation package "pd.mta.1.0" contains the "locus_type" information that is desired. This only leaves the probe sequence information that is needed to determine each probe's GC count (for background correction).
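
For readers unfamiliar with the term, the GC count of a probe is simply the number of G and C bases in its sequence. A minimal sketch follows, in Python for illustration only (the package is R); the probe IDs and 25-mer sequences are made up, and how the counts feed into the background correction is not shown here.

```python
def gc_count(sequence: str) -> int:
    """Number of G and C bases in a probe sequence (case-insensitive)."""
    seq = sequence.upper()
    return seq.count("G") + seq.count("C")


# Made-up 25-mer probe sequences keyed by hypothetical probe IDs.
probes = {
    101: "ATGCGCGTTAACCGGATATCGCGAT",
    102: "TTTTAAAATTTTAAAATTTTAAAAT",
}

# One GC count per probe; this is the only value the sequence is needed for.
gc_by_probe = {probe_id: gc_count(seq) for probe_id, seq in probes.items()}
```

Once the counts are computed, the sequences themselves no longer need to be stored, which is why a probe-sequence source is only required at build time.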

If Bioconductor had 'probe sequence' packages for the ClariomD/S style of arrays, I would be able to run the package without the need for any outside annotation files.

I would need packages like this example for mouse gene style arrays: https://www.bioconductor.org/packages/release/data/annotation/html/mogene10stv1probe.html

I can create the probe sequence annotation data packages as described above using "AnnotationForge" and the Affymetrix "probe_tab" files.

If I create these probe sequence annotation packages, will they be easy to upload to Bioconductor's annotation data repository?

I will be able to address all package dependencies on custom annotation files if "probe sequence" annotation packages are added as well.

Thanks for your time. -Guy

I can make our package work using Bioconductor annotation packages.

LiNk-NY commented 5 years ago

Hi Guy, @harrisgm I'm no expert in probe sequences. I would rather refer you to Jim @jmacdon. He would likely be able to answer your question.

Best regards, Marcel

jmacdon commented 5 years ago

The only question I see here is whether or not @harrisgm can upload the probe files to the annotation repository. This is where all the other probe packages reside, so I don't really see a problem with doing so. An alternative would be to put the probe packages on the AnnotationForge repository. @lshep @mtmorgan @dvantwisk Any druthers as to where these should go?

LiNk-NY commented 5 years ago

I'll let others chime in..

I have one question though: Would it be possible to coordinate with the maintainer of pd.mta.1.0 and include GC counts for probes already in the data package?

-MR

jmacdon commented 5 years ago

It could be done, but the bgp file which contains the probe sequences isn't currently parsed by pdInfoBuilder. So to do that would require someone to write a parser, generate the GC counts, and then add a table to the pdInfoPackage that is unrelated to anything that oligo does.

Which seems unlikely to happen, given the incentives. I could envision a different scenario where @harrisgm generates a data package that contains the GC content for all the probes for all the Affy arrays that Sscore2 supports. That way there is no reliance on others to do things to support Sscore2, and the data package would probably be pretty light, and once built probably never need updating.

LiNk-NY commented 5 years ago

Thanks for your expert input, Jim @jmacdon. It looks like the second option is best. The question of whether to place the probe files in the annotation repository or to create a probe package with AnnotationForge still remains... @mtmorgan @lshep @dvantwisk

LiNk-NY commented 5 years ago

Hi HarrisGM @harrisgm I've spoken with Lori and Martin and it seems like AnnotationHub submission would be the ideal option. Please see the guidance here: https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/CreateAnAnnotationPackage.html

Feel free to ask questions on this thread.

Best, Marcel

harrisgm commented 5 years ago

Dear all, Thank you all very much for looking into this for me.

I began by using AnnotationForge to create a "mta10probe" package from the probe sequences in the "probe_tab" file for the MTA-1.0 arrays (https://github.com/harrisgm/mta10probe). However, the probe_tab file does not contain probe sequence information for all probes on the chip.

Much to my embarrassment, I finally found out how to access the probe sequence information contained in "pd.mta.1.0". So this package does in fact have everything I need to generate probeFiles for the Sscore2 application. I also have been able to easily add the "hta2.0" functionality to the code using the "pd.hta.2.0" package.

So, I was incorrect when I said I needed annotations outside of what Bioconductor provides. I am sorry for this mistake.

However, I do still need to generate 'probeFile' data.tables for Sscore2 from the Bioconductor sources. I have written a function to create and save these probeFiles as .rda files.

Is it acceptable to save these .rda files to the /data/ directory of the already installed R Package? The probeFiles would be generated as needed for the given array/chip type and then retroactively placed in the R package installation 'data' directory. This is easy enough when the package is installed in the user library.

OR... should I use AnnotationHub to upload these probeFiles as annotation packages on Bioconductor? All information in the probeFile is sourced exclusively from "pd.mta.1.0", "pd.hta.2.0", etc.
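
The "generate as needed, then reuse" pattern behind the first option can be sketched as below. This is not a statement of where Bioconductor expects such files to live: the sketch writes to a caller-supplied, user-writable cache directory rather than the installed package's 'data' directory, and that swap is an assumption of the example. It is written in Python for illustration only; `load_probe_file`, `fake_build`, the file naming, and the table contents are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_probe_file(chip_type, build, cache_dir):
    """Return the probe table for chip_type, building and caching it on first use."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{chip_type}.probeFile.pkl"
    if path.exists():
        # Already generated for this chip type: reuse the saved table.
        with path.open("rb") as fh:
            return pickle.load(fh)
    # First request for this chip type: build once, then save for later runs.
    table = build(chip_type)
    with path.open("wb") as fh:
        pickle.dump(table, fh)
    return table


build_calls = []


def fake_build(chip_type):
    """Stand-in for the real probeFile builder; records each invocation."""
    build_calls.append(chip_type)
    return {"chip": chip_type, "gc_count": [13, 0]}


with tempfile.TemporaryDirectory() as tmp:
    first = load_probe_file("mta-1_0", fake_build, tmp)
    second = load_probe_file("mta-1_0", fake_build, tmp)
```

The second call returns the saved table without rebuilding it, so the expensive generation step runs once per chip type.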

Thanks again for everyone's help. -Guy

harrisgm commented 5 years ago

Dear Bioconductor,

Regarding package submission #1190 (Sscore2): The package has been resubmitted, due to size issues, as submission #1274 (GCSscore).

This is Guy Harris again, and I do believe I have finally corrected the major issues with package submission #1190 (original title: "Sscore2").

Due to .git pack tracking of the previous large 'probeFiles' that I included in "extdata", I removed the originally submitted package from GitHub yesterday.

I am re-submitting my package from a fresh repository, and it is now titled "GCSscore" instead of "Sscore2".

A list of major changes:

  1. All "probeFiles" for running the GCSscore algorithm are now generated "on the fly" using only Bioconductor packages: the platform design (pd) package and the annotation .db files.
     a) Ultimately, the "probeFile" packages are created using makeProbePackage() from 'AnnotationForge', along with modified versions of getProbeDataAffy(), depending on the chip technology.
     b) The 'probeFile' packages that are generated for individual chip types could be uploaded as annotation packages as well.

  2. The issues with "exports", "imports", and "depends" have been corrected.

  3. It passes R CMD check (1 NOTE for the 11 MB package size) and R CMD BiocCheck (no ERRORs or WARNINGs).

  4. I am adding the web hook to this new package submission so that you can view my additional edits.

Hopefully, the resubmission of the package isn't a problem!

Thanks for all the advice, it really helped me improve this package. -Guy
