Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

Proposed contribution task for Outreachy applicants: Register UCSC genome gorGor6 #60

Closed (hpages closed this issue 1 year ago)

hpages commented 2 years ago

gorGor6 is the latest UCSC genome for Gorilla (Gorilla gorilla gorilla). See "List of UCSC genome releases" at https://genome.ucsc.edu/FAQ/FAQreleases.html for all the genomes currently supported by UCSC.

Also check out the "Genome Browser Gateway" page on the UCSC website. This is the main entrance to the "UCSC Genome Browser". Find Gorilla in the UCSC species tree on the left, click on it, then make sure to select the latest Gorilla assembly (gorGor6). This will display a bunch of additional information about the gorGor6 assembly.

Note that many UCSC genomes are already registered in the GenomeInfoDb package (83 as of October 2022). The registered_UCSC_genomes() function in GenomeInfoDb returns the list of all the UCSC genomes that are currently registered in the package. An important thing to be aware of is that getChromInfoFromUCSC() still works on an unregistered genome, but in "degraded" mode: the information it returns is not guaranteed to be complete or accurate.

Registering a genome fixes that. In other words, once a genome is registered in GenomeInfoDb, the information returned by getChromInfoFromUCSC() for that genome is guaranteed to be complete and accurate.

See ?getChromInfoFromUCSC (after loading GenomeInfoDb) for more information.
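A quick way to see this in practice is sketched below. It only uses registered_UCSC_genomes() and getChromInfoFromUCSC(), both mentioned above; the `genome` column name is what I'd expect from the function's documentation, so please double-check it against your installed version. The last call needs an internet connection.

```r
library(GenomeInfoDb)

# Data frame with one row per UCSC genome registered in the installed package.
reg <- registered_UCSC_genomes()
nrow(reg)

# TRUE only if the installed GenomeInfoDb version already registers gorGor6
# (assumes the data frame has a 'genome' column).
"gorGor6" %in% reg$genome

# Fetch chromosome information from UCSC. This works whether or not the
# genome is registered, but the result is only guaranteed to be complete
# and accurate for a registered genome.
chrominfo <- getChromInfoFromUCSC("gorGor6")
head(chrominfo)
```

Comparing the output for a registered genome with the output for an unregistered one is a good way to see what registration changes.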

Registering a new UCSC genome is only a matter of adding a new file, called a "registration file", to GenomeInfoDb/inst/registered/UCSC_genomes/. Note that the folder contains a README.TXT file that provides some brief information about what a "registration file" should contain (unfortunately the registration process is not fully documented).
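If you have GenomeInfoDb installed, the existing registration files can also be browsed from R. This is just a convenience sketch: it relies on the standard R behavior that the contents of a package's inst/ folder end up at the top level of the installed package.

```r
# Locate the folder of registration files in the installed package.
# (inst/registered/UCSC_genomes/ in the source tree is installed as
# registered/UCSC_genomes/.)
reg_dir <- system.file("registered", "UCSC_genomes", package = "GenomeInfoDb")

# List a few of the registration files, plus README.TXT if present.
head(list.files(reg_dir))
file.exists(file.path(reg_dir, "README.TXT"))
```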

For gorGor6, since this is the first gorGor genome that we're going to register in GenomeInfoDb, we need to start the gorGor6.R file from scratch. However, looking at other registration files to get a feel for how things are done is always a good idea. Don't bother with the NCBI_LINKER component for now. We'll add it later, once the corresponding NCBI assembly (Kamilah_GGO_v0) is also registered (registering Kamilah_GGO_v0 is the topic of issue #61).
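To give an idea of the shape of such a file, here is a rough sketch of what gorGor6.R could look like, without the NCBI_LINKER component. The variable names follow what I'd expect from the existing registration files, and the list of assembled molecules and the circular sequence are assumptions: verify them against README.TXT and against the chromosome information that UCSC provides for gorGor6 before submitting anything.

```r
## gorGor6.R: sketch of a registration file (values to be verified)

GENOME <- "gorGor6"

ORGANISM <- "Gorilla gorilla gorilla"

## Assumption: the assembled molecules are chr1, chr2A, chr2B, chr3 to chr22,
## chrX, and chrM. Check this against the UCSC data for gorGor6.
ASSEMBLED_MOLECULES <- paste0("chr", c(1, "2A", "2B", 3:22, "X", "M"))

## Assumption: the mitochondrial chromosome is the only circular sequence.
CIRC_SEQS <- "chrM"
```

Whatever the final values are, every sequence listed in CIRC_SEQS should also appear in ASSEMBLED_MOLECULES.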

IMPORTANT NOTES TO OUTREACHY APPLICANTS:

hpages commented 2 years ago

"I have completed all the preliminary tasks"

Great!

I just assigned you to this issue. Do not hesitate to ask questions, I'll do my best to help.

Priceless-P commented 2 years ago

Hi @hpages, can I work on this issue once I'm done with this?

hpages commented 1 year ago

@kakopo PR #88 merged, thanks!