Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

Proposed task for Outreachy applicants: Forge BSgenome data package for UCSC genome canFam6 #38

Closed: hpages closed this issue 1 year ago

hpages commented 1 year ago

This task depends on this issue being completed first (i.e. PR accepted and merged, and issue closed). Although it's not a requirement that the 2 tasks be completed by the same applicant, it will be a more interesting learning experience if they are.

BSgenome data packages are one of the many types of annotation packages available in Bioconductor. They contain the genomic sequences (chromosome sequences and other DNA sequences) of a particular genome assembly for a given organism. For example, BSgenome.Hsapiens.UCSC.hg19 is a BSgenome data package that contains the genomic sequences of the hg19 genome from UCSC. Users can easily and efficiently access the sequences, or portions of the sequences, stored in these packages via a common API implemented in the BSgenome software package.

This task's goal is to make a new BSgenome data package for UCSC genome canFam6. The process of making such a package is documented in the "How to forge a BSgenome data package" vignette from the BSgenome software package. The landing page for the BSgenome package contains a link to this vignette.

Other useful links:

IMPORTANT NOTES TO OUTREACHY APPLICANTS:

Priceless-P commented 1 year ago

Hi @hpages, may I please work on this issue?

hpages commented 1 year ago

Hi @Priceless-P, I just assigned you. It's all yours now!

Priceless-P commented 1 year ago

@hpages I think I have been able to make it work! I finally got forgeBSgenomeDataPkg(), R CMD build, R CMD check, and R CMD INSTALL to all run with no errors. Apart from a few errors I encountered earlier, most of which I found solutions for on the Bioconductor support site, the vignette you wrote was everything I needed. I also looked at other seed files to deepen my understanding.
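For anyone following along, the sequence of commands mentioned above can be sketched roughly as below. This is only a minimal sketch, assuming R is installed, that the installed BSgenome package exports forgeBSgenomeDataPkg() (as it did in this thread), and that the seed file and FASTA files sit where the seed file expects them. The helper names `run` and `forge_and_check` and the `DRY_RUN` switch are made up for illustration; they are not part of BSgenome or R.

```shell
#!/bin/sh
# Sketch only: forge, build, check, and install a BSgenome data package.
# Assumptions: R and BSgenome are installed; paths match your seed file.

# run: print a command, then execute it unless DRY_RUN=1 (hypothetical helper).
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# forge_and_check SEED_FILE PKG_DIR (hypothetical helper)
# Runs the four steps in order and stops at the first failure.
forge_and_check() {
    seed=$1
    pkg=$2
    # 1. Generate the package source tree from the seed file.
    run Rscript -e "BSgenome::forgeBSgenomeDataPkg('$seed')" &&
    # 2. Build the source tarball (named <pkg>_<version>.tar.gz).
    run R CMD build "$pkg" &&
    # 3. Check the tarball, then 4. install it.
    run R CMD check "$pkg"_*.tar.gz &&
    run R CMD INSTALL "$pkg"_*.tar.gz
}

# Example (prints the commands without running them):
#   DRY_RUN=1 forge_and_check canFam6-seed BSgenome.Cfamiliaris.UCSC.canFam6
```

Running it first with `DRY_RUN=1` shows the commands before anything is executed, which helps because the forge and check steps can take a while on a full genome.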

So far, everything seems okay to me, but I have just one little question. I created the project on my local machine, in /Users/prisca/Desktop/. The folders generated are BSgenome.Cfamiliaris.UCSC.canFam6/ and canFam6/ (which holds one FASTA file per chromosome). The files are canFam6-seed, which is the seed file, and the tarball generated by R CMD build.

My question is: which folders and files should I upload to my fork of the repo, and to what location? I'm guessing I need to upload BSgenome.Cfamiliaris.UCSC.canFam6/ and canFam6-seed to BSgenome/inst/extdata/GentlemanLab/, but I'm not sure.

hpages commented 1 year ago

Hi @Priceless-P,

Sounds like you did a really good job at digging around and finding all the information you needed. Congrats!

If you are confident that your BSgenome data package works as expected, please add your seed file to the inst/extdata/Outreachy/ folder of the BSgenome software package. You'll need to fork the BSgenome repository for that, then add the seed file, commit, push, and submit a PR. (I just edited the IMPORTANT NOTES TO OUTREACHY APPLICANTS above to add these steps.)
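The steps above (fork, add the seed file, commit, push, open a PR) could look roughly like the sketch below. It assumes git is installed and that you have already forked Bioconductor/BSgenome on GitHub and cloned your fork. The function name `add_seed_to_fork`, the branch name, and the placeholder URL are illustrative only.

```shell
#!/bin/sh
# Sketch only: add a seed file to a local clone of your BSgenome fork.

# add_seed_to_fork REPO_DIR SEED_FILE (hypothetical helper)
# Copies the seed file into inst/extdata/Outreachy/ and commits it
# on a new branch named after the file.
add_seed_to_fork() {
    repo=$1
    seed=$2
    name=$(basename "$seed")
    dest="$repo/inst/extdata/Outreachy"
    mkdir -p "$dest" &&
    cp "$seed" "$dest/" &&
    git -C "$repo" checkout -b "add-$name" &&
    git -C "$repo" add "inst/extdata/Outreachy/$name" &&
    git -C "$repo" commit -m "Add seed file $name"
}

# Typical use (URL and paths are placeholders):
#   git clone https://github.com/<your-username>/BSgenome.git
#   add_seed_to_fork BSgenome /path/to/canFam6-seed
#   git -C BSgenome push -u origin add-canFam6-seed
# Then open a pull request against Bioconductor/BSgenome on GitHub.
```

Only the seed file goes into the repository here; the generated package folder, the FASTA files, and the tarball stay on your machine.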

Thanks, H.

PS: Surprise!! https://bioconductor.org/packages/devel/GenomeInfoDb 🥳

Priceless-P commented 1 year ago

> PS: Surprise!! https://bioconductor.org/packages/devel/GenomeInfoDb 🥳

Wow!!! I'm so honored 💃 I'm putting this up on my LinkedIn! Thank you so much @hpages

> Sounds like you did a really good job at digging around and finding all the information you needed. Congrats!

Thank you. It wasn't so hard, thanks to you. The support is amazing!

> If you are confident that your BSgenome data package works as expected, please add your seed file to the inst/extdata/Outreachy/ folder of the BSgenome software package. You'll need to fork the BSgenome repository for that, then add the seed file, commit, push, and submit a PR. (I just edited the IMPORTANT NOTES TO OUTREACHY APPLICANTS above to add these steps.)

Okay, I just created a pull request. Please take a look.

hpages commented 1 year ago

Hi @Priceless-P,

I just merged PR #46.

Don't miss my long-overdue explanation about PkgExamples: https://github.com/Bioconductor/BSgenome/pull/46#issuecomment-1291424086. Don't hesitate to ask if you have any questions.

Next task in your group is #39. It's still about Dog! 🐕 Whenever you are ready, go there and ask to be assigned.

Don't forget to record your contributions on Outreachy at https://www.outreachy.org/outreachy-december-2022-internship-round/communities/bioconductor/refactor-the-bsgenomeforge-tools/contributions/.

Priceless-P commented 1 year ago

Sure.

Thanks @hpages